T-SQL APPLY operator made simple
Introduction with examples to the T-SQL APPLY operator
Back in 2005 Microsoft introduced in SQL Server the APPLY operator, which is basically like a join clause and it allow us to join two table expressions. It is usually used to invoke a table-valued function for each row returned by a query. The APPLY operator can be used in two ways:
- CROSS APPLY: Returns only the rows from the left table expression if it matches with the right table expression.
- OUTER APPLY: Returns all rows from the left table expression. The rows from the left table expression that match the value of the right table expression have a value and the others are NULL.
So CROSS APPLY is pretty similar to an INNER JOIN and the OUTER APPLY is pretty similar to a LEFT OUTER JOIN. Note that you can use a table-valued expression on the right part only with APPLY and not the JOIN.
For example, suppose we have a customer
table, a product
table and an order
table like follows:
USE master;
DROP DATABASE IF EXISTS apply_operator_example;
CREATE DATABASE apply_operator_example;
USE apply_operator_example;
-- DROP CREATE tables
DROP TABLE IF EXISTS dbo.customer;
CREATE TABLE dbo.customer
(
[key] INT IDENTITY(1, 1) NOT NULL
, [id] NVARCHAR(32) NOT NULL
, [name] NVARCHAR(128) NULL
, CONSTRAINT [PK_customer_key] PRIMARY KEY ([key])
, CONSTRAINT [AK_customer_id] UNIQUE ([id])
)
;
DROP TABLE IF EXISTS dbo.product;
CREATE TABLE dbo.product
(
[key] INT IDENTITY(1, 1) NOT NULL
, [id] NVARCHAR(32) NOT NULL
, [name] NVARCHAR(128) NULL
, CONSTRAINT [PK_product_key] PRIMARY KEY ([key])
, CONSTRAINT [AK_product_id] UNIQUE ([id])
)
;
DROP TABLE IF EXISTS dbo.[order];
CREATE TABLE dbo.[order]
(
[key] INT IDENTITY(1, 1) NOT NULL
, [customer_key] INT NOT NULL
, [product_key] INT NOT NULL
, [quantity] INT NULL
, [insert_timestamp] DATETIME NULL CONSTRAINT [DF_order_insert_timestamp] DEFAULT GETUTCDATE()
, CONSTRAINT [PK_order_key] PRIMARY KEY ([key])
, CONSTRAINT [FK_order_customer_key] FOREIGN KEY ([customer_key]) REFERENCES customer([key])
, CONSTRAINT [FK_order_product_key] FOREIGN KEY ([product_key]) REFERENCES product([key])
)
;
Let’s populate these tables with some data:
-- INSERT sample data
INSERT INTO dbo.customer ([id], [name]) VALUES
('c1', 'Pippo')
, ('c2', 'Pluto')
, ('c3', 'Paperino')
, ('c4', 'Topolino')
;
INSERT INTO dbo.product ([id], [name]) VALUES
('p1', 'Laser pointer')
, ('p2', 'Nutcracker')
, ('p3', 'Banana')
, ('p4', 'Book')
, ('p5', 'Lamp')
;
INSERT INTO dbo.[order] ([customer_key], [product_key], [quantity]) VALUES
(2, 1, 2)
, (3, 1, 1)
;
WAITFOR DELAY '00:00:05';
INSERT INTO dbo.[order] ([customer_key], [product_key], [quantity]) VALUES
(1, 2, 3)
, (2, 2, 6)
, (3, 2, 10)
;
WAITFOR DELAY '00:00:02';
INSERT INTO dbo.[order] ([customer_key], [product_key], [quantity]) VALUES
(1, 3, 12)
, (3, 3, 12)
, (3, 4, 13)
;
Let’s check the data:
-- Check data
SELECT *
FROM dbo.customer
;
SELECT *
FROM dbo.product
;
SELECT *
FROM dbo.[order]
;
All data together:
SELECT c.[key] AS [customer_key]
, c.[id] AS [customer_id]
, c.[name] AS [customer_name]
, o.[key] AS [order_key]
, o.[quantity]
, o.[insert_timestamp]
, p.[key] AS [product_key]
, p.[id] AS [product_id]
, p.[name] AS [product_name]
FROM dbo.customer AS c
FULL OUTER JOIN dbo.[order] AS o
ON c.[key] = o.[customer_key]
FULL OUTER JOIN dbo.product AS p
ON o.[product_key] = p.[key]
;
Now let’s query the data and see how APPLY operator works. CROSS APPLY and INNER JOIN gives us the same results:
-- INNER JOIN vs CROSS APPLY
SELECT *
FROM dbo.customer AS c
INNER JOIN dbo.[order] AS o
ON c.[key] = o.[customer_key]
;
SELECT *
FROM dbo.customer AS c
CROSS APPLY
(
SELECT *
FROM dbo.[order] AS o
WHERE c.[key] = o.[customer_key]
) A
;
Also the execution plans for these queries are the same:
As expected also OUTER APPLY and LEFT OUTER JOIN gives us the same results:
-- LEFT OUTER JOIN vs OUTER APPLY
SELECT *
FROM dbo.customer AS c
LEFT OUTER JOIN dbo.[order] AS o
ON c.[key] = o.[customer_key]
;
SELECT *
FROM dbo.customer AS c
OUTER APPLY
(
SELECT *
FROM dbo.[order] AS o
WHERE c.[key] = o.[customer_key]
) A
;
Also the execution plans for these queries are pretty similar:
Example 1
We want to retrieve the customers information along with their latest order details. Using CROSS APPLY the query is easier to read but it is slower:
-- Retrive customer information along with their latest order details
SELECT c.[id] AS [customer_id]
, c.[name] AS [customer_name]
, a.[product_key]
, a.[quantity]
, a.[insert_timestamp]
FROM dbo.customer AS c
INNER JOIN
(
SELECT o.[customer_key]
, o.[product_key]
, o.[quantity]
, o.[insert_timestamp]
, ROW_NUMBER() OVER (PARTITION BY o.customer_key ORDER BY o.insert_timestamp DESC) AS rn
FROM dbo.[order] AS o
) AS a
ON c.[key] = a.[customer_key]
AND a.rn = 1
;
SELECT c.[id] AS [customer_id]
, c.[name] AS [customer_name]
, a.[product_key]
, a.[quantity]
, a.[insert_timestamp]
FROM dbo.customer AS c
CROSS APPLY
(
SELECT TOP 1 o.[product_key]
, o.[quantity]
, o.[insert_timestamp]
FROM dbo.[order] AS o
WHERE o.customer_key = c.[key]
ORDER BY o.insert_timestamp DESC
) AS a
;
In this example, we are using CROSS APPLY to apply a subquery to each row of the customer
table. The subquery returns the latest order details for each customer, which are then combined with the customer information.
Example 2
Same as before but let’s say that we have to use a function dbo.fn_GetLastOrderFromCustomer
which takes a customer key as input and returns the latest order for that customer, based on the insert_timestamp
column:
-- DROP CREATE function to retrive latest order details for a customer
DROP FUNCTION IF EXISTS dbo.fn_GetLastOrderFromCustomer;
GO
CREATE FUNCTION dbo.fn_GetLastOrderFromCustomer (@customer_key AS INT)
RETURNS TABLE
AS
RETURN
(
SELECT TOP 1 o.[product_key]
, o.[quantity]
, o.[insert_timestamp]
FROM dbo.[order] AS o
WHERE o.[customer_key] = @customer_key
ORDER BY o.insert_timestamp DESC
)
;
GO
Example of function execution:
SELECT * FROM dbo.fn_GetLastOrderFromCustomer(1);
Of course the result are the same as before but we are leveraging the APPLY capability to use functions:
Example 3
We want to retrieve a list of all customers and their associated products, but only for the products that have been ordered at least two times. We can use the APPLY operator to apply a subquery that counts the number of time a product has been ordered:
-- Retrieve a list of all customers and their associated products, but only for the products that have at least two customers assigned to them
SELECT c.[name] AS [client_name]
, p.[name] AS [product_name]
FROM dbo.customer AS c
INNER JOIN dbo.[order] AS o
ON c.[key] = o.[customer_key]
INNER JOIN dbo.product AS p
ON o.[product_key] = p.[key]
CROSS APPLY
(
SELECT COUNT(*) AS num_orders
FROM dbo.[order] AS o2
WHERE o2.[product_key] = o.[product_key]
) AS a
WHERE a.num_orders >= 2
;
In this query, the subquery within the CROSS APPLY operator counts the number of times a product has been ordered, based on the product_key
. The main query then joins the customer
, order
, and product
tables together, and applies the subquery for each row. The WHERE clause filters out rows where the product has been ordered fewer than two times.
Note that without the APPLY operator, we would have to use a GROUP BY clause with a HAVING condition to achieve something similar but this would not return the individual customer and product rows, but instead aggregate the results by product. The APPLY operator allows us to count the customers per product while still returning individual customer and product rows.
In conclusion, APPLY is a powerful tool that can be used to join tables based on a user-defined function. It is particularly useful for complex queries that involve joining multiple tables or performing calculations on table columns. Here a super interesting blog post by Arshad Ali.
Outro
I hope the story was interesting and thank you for taking the time to read it. On my Blogspot you can find the same post in Italian. Let me know if you have any question and if you like the content that I create feel free to buy me a coffee.