Saturday, 26 August 2006
The APPLY Operator
posted by Stefan Delmarco
We've come across the APPLY operator before when we were exploring DMVs. The APPLY operator has been added to the TSQL repertoire in SQL Server 2005. It fills an important gap with User-Defined Functions (UDFs).
UDFs are less well-known than stored procedures (SPs) and are often overlooked by inexperienced SQL developers. BOL has a good write-up on how to choose between SPs and UDFs. For the context of this article, a brief overview of UDFs:
UDFs come in two flavours: those that returns scalars and those that return tables. Scalar-valued UDFs are very straightforward. They take in a bunch of parameters and return a single scalar value. These are functions in the traditional VB / C# / C++ sense. Table-valued functions are a little different. They also can take in a bunch of parameters. However, they are able to return a table instead of a scalar value. This returned table is referenced in TSQL in the same manner as a table or a view. This makes them look like a cross between functions and views. In fact, there are two types of table-valued UDFs. Those that contain a single SELECT statement (they look very much like a derived table) called 'Inline Table-Valued Functions' and those that build-up the return table using procedural logic (they look like stored procedures) called 'Multistatement Table-Values Functions'. It is the latter-type that we are interested in.
Multistatement table-valued functions are very powerful as they allow us to encapsulate complex procedural logic in a function when that logic cannot be implemented in a database view (i.e. you cannot express that desired output using a single set-based TSQL expression). You are free to construct the output rows in what ever manner you wish! The only restriction is that you need to declare the schema of the table that the UDF returns upfront as part of the UDF's declaration.
As an example, consider the following table-valued UDF:
@webServiceUrl nvarchar(1024))
returns @dnsDomains table (
domain nvarchar(1024))
as begin
-- Return no rows if NULL...
if @webServiceUrl is NULL
return;
declare @dns nvarchar(1024);
-- Remove the contents after the DNS...
declare @slashIndex int;
set @slashIndex = charindex(N'/', @webServiceUrl);
if @slashIndex > 0
set @dns = left(@webServiceUrl, @slashIndex - 1);
else
set @dns = @webServiceUrl;
-- Decompose the DNS into its hierarchical parts...
declare @dotIndex int;
set @dotIndex = charindex(N'.', @dns);
insert @dnsDomains (domain) values (@dns);
while @dotIndex > 0 begin
set @dns = right(@dns, len(@dns) - @dotIndex);
insert @dnsDomains (domain) values (@dns);
set @dotIndex = charindex(N'.', @dns);
end;
return;
end;
This UDF will decompose a URL into its hierarchical DNS domain constituents. For the sake of brevity the URL is assumed to consist of a DNS domain name followed by a URL-path (as in AdventureWorks' Purchasing.Vendor PurchasingWebServiceURL column). For example:
from dbo.udf_DecomposeDns(N'www.extranet.fotia.com/order.asmx')
| domain |
| ------------------------- |
| www.extranet.fotia.co.uk |
| extranet.fotia.co.uk |
| fotia.co.uk |
| co.uk |
| uk |
So far, all this is well within the capabilities of SQL Server 2000. Now, consider the following requirement: Generate a report that will contain the DNS composition of each Vendor's PurchasingWebServiceURL in AdventureWorks' Purchasing.Vendor table.
To achieve this we want to decompose each URL into its domain hierarchy. For each vendor we would expect multiple rows to be returned, equivalent to the number of parts in that Vendor's URL. This would not be possible to do in a single SELECT statement in SQL Server 2000. The SELECT syntax did not have an operator / syntax that would allow a table-valued UDF to be re-evaluated repeatedly for every row it was being joined to. We could only execute a table-valued UDF once in a SELECT statement. We could pass a parameter to the UDF but the parameter value could not be from any column in the resulting set. It could only be a constant value or a declared variable.
This is the gap that the APPLY operator fills in SQL Server 2005. It allows us to get the table-valued UDF to be executed for each row it is being joined to. In this manner the parameters to a table-valued UDF should be seen as the join criteria.
The APPLY operator can be used in two forms: CROSS APPLY and OUTER APPLY. To understand the difference think of the table-valued UDF as a normal table you join to. You'd either use an INNER or an OUTER join. CROSS APPLY is an INNER JOIN, and OUTER APPLY is an OUTER JOIN. If the UDF returns an empty result set for a specific row, that row will be excluded if CROSS APPLY is used. However, if OUTER APPLY is used that row will still be included. NULL values will be substituted for the missing UDF row (identical to a traditional OUTER JOIN). It's that simple.
CROSS APPLY over the Purchasing.Vendor table produces the following result:
from Purchasing.Vendor
cross apply dbo.udf_DecomposeDns(PurchasingWebServiceURL);
| Name | PurchasingWebServiceURL | domain |
| ------------------- | ------------------------------ | -------------------------- |
| A. Datum Corporation | www.adatum.com/ | www.adatum.com |
| A. Datum Corporation | www.adatum.com/ | adatum.com |
| A. Datum Corporation | www.adatum.com/ | com |
| Trey Research | www.treyresearch.net/ | www.treyresearch.net |
| Trey Research | www.treyresearch.net/ | treyresearch.net |
| Trey Research | www.treyresearch.net/ | net |
| Wide World Importers | www.wideworldimporters.com/ | www.wideworldimporters.com |
| ... |
Compare the output with OUTER APPLY. Note the NULL rows substituted for the missing rows from the UDF:
from Purchasing.Vendor
outer apply dbo.udf_DecomposeDns(PurchasingWebServiceURL);
| Name | PurchasingWebServiceURL | domain |
| ---------------------------------- | ------------------------ | -------------------- |
| International | NULL | NULL |
| Electronic Bike Repair & Supplies | NULL | NULL |
| Premier Sport, Inc. | NULL | NULL |
| Comfort Road Bicycles | NULL | NULL |
| Metro Sport Equipment | NULL | NULL |
| Green Lake Bike Company | NULL | NULL |
| Mountain Works | NULL | NULL |
| Continental Pro Cycles | NULL | NULL |
| A. Datum Corporation | www.adatum.com/ | www.adatum.com |
| A. Datum Corporation | www.adatum.com/ | adatum.com |
| A. Datum Corporation | www.adatum.com/ | com |
| Trey Research | www.treyresearch.net/ | www.treyresearch.net |
| Trey Research | www.treyresearch.net/ | treyresearch.net |
| Trey Research | www.treyresearch.net/ | net |
| Anderson's Custom Bikes | NULL | NULL |
| Compete, Inc. | NULL | NULL |
| ... |
We are able to bring the full might of TSQL to bear on table-valued UDFs. For example, aggregations can be run on the result set:
from Purchasing.Vendor
outer apply dbo.udf_DecomposeDns(PurchasingWebServiceURL)
group by domain
order by 2 desc;
| domain | CountOf |
| ----------------------------- | ----------- |
| NULL | 98 |
| com | 5 |
| litwareinc.com | 1 |
| net | 1 |
| northwindtraders.com | 1 |
| proseware.com | 1 |
| treyresearch.net | 1 |
| ... |
The APPLY operator is indeed a valuable addition to the TSQL. It allows a whole new modular and procedural approach to composing result sets.
