Fotia Ltd

Fotia Ltd

Saturday, 26 August 2006

The APPLY Operator

posted by Stefan Delmarco

We've come across the APPLY operator before when we were exploring DMVs. The APPLY operator has been added to the TSQL repertoire in SQL Server 2005. It fills an important gap with User-Defined Functions (UDFs).

UDFs are less well-known than stored procedures (SPs) and are often overlooked by inexperienced SQL developers. BOL has a good write-up on how to choose between SPs and UDFs. For the context of this article, a brief overview of UDFs:

UDFs come in two flavours: those that returns scalars and those that return tables. Scalar-valued UDFs are very straightforward. They take in a bunch of parameters and return a single scalar value. These are functions in the traditional VB / C# / C++ sense. Table-valued functions are a little different. They also can take in a bunch of parameters. However, they are able to return a table instead of a scalar value. This returned table is referenced in TSQL in the same manner as a table or a view. This makes them look like a cross between functions and views. In fact, there are two types of table-valued UDFs. Those that contain a single SELECT statement (they look very much like a derived table) called 'Inline Table-Valued Functions' and those that build-up the return table using procedural logic (they look like stored procedures) called 'Multistatement Table-Values Functions'. It is the latter-type that we are interested in.

Multistatement table-valued functions are very powerful as they allow us to encapsulate complex procedural logic in a function when that logic cannot be implemented in a database view (i.e. you cannot express that desired output using a single set-based TSQL expression). You are free to construct the output rows in what ever manner you wish! The only restriction is that you need to declare the schema of the table that the UDF returns upfront as part of the UDF's declaration.

As an example, consider the following table-valued UDF:

create function dbo.udf_DecomposeDns (
    @webServiceUrl nvarchar(1024))
returns @dnsDomains table (
    domain nvarchar(1024))
as begin
    -- Return no rows if NULL...
    if @webServiceUrl is NULL
        return;

    declare @dns nvarchar(1024);

    -- Remove the contents after the DNS...
    declare @slashIndex int;
    set @slashIndex = charindex(N'/', @webServiceUrl);
    if @slashIndex > 0
        set @dns = left(@webServiceUrl, @slashIndex - 1);
    else
        set @dns = @webServiceUrl;

    -- Decompose the DNS into its hierarchical parts...
    declare @dotIndex int;
    set @dotIndex = charindex(N'.', @dns);
    insert @dnsDomains (domain) values (@dns);

    while @dotIndex > 0 begin
        set @dns = right(@dns, len(@dns) - @dotIndex);
        insert @dnsDomains (domain) values (@dns);
        set @dotIndex = charindex(N'.', @dns);
    end;
    return;
end;

This UDF will decompose a URL into its hierarchical DNS domain constituents. For the sake of brevity the URL is assumed to consist of a DNS domain name followed by a URL-path (as in AdventureWorks' Purchasing.Vendor PurchasingWebServiceURL column). For example:

select *
from dbo.udf_DecomposeDns(N'www.extranet.fotia.com/order.asmx')

domain
-------------------------
www.extranet.fotia.co.uk
extranet.fotia.co.uk
fotia.co.uk
co.uk
uk

So far, all this is well within the capabilities of SQL Server 2000. Now, consider the following requirement: Generate a report that will contain the DNS composition of each Vendor's PurchasingWebServiceURL in AdventureWorks' Purchasing.Vendor table.

To achieve this we want to decompose each URL into its domain hierarchy. For each vendor we would expect multiple rows to be returned, equivalent to the number of parts in that Vendor's URL. This would not be possible to do in a single SELECT statement in SQL Server 2000. The SELECT syntax did not have an operator / syntax that would allow a table-valued UDF to be re-evaluated repeatedly for every row it was being joined to. We could only execute a table-valued UDF once in a SELECT statement. We could pass a parameter to the UDF but the parameter value could not be from any column in the resulting set. It could only be a constant value or a declared variable.

This is the gap that the APPLY operator fills in SQL Server 2005. It allows us to get the table-valued UDF to be executed for each row it is being joined to. In this manner the parameters to a table-valued UDF should be seen as the join criteria.

The APPLY operator can be used in two forms: CROSS APPLY and OUTER APPLY. To understand the difference think of the table-valued UDF as a normal table you join to. You'd either use an INNER or an OUTER join. CROSS APPLY is an INNER JOIN, and OUTER APPLY is an OUTER JOIN. If the UDF returns an empty result set for a specific row, that row will be excluded if CROSS APPLY is used. However, if OUTER APPLY is used that row will still be included. NULL values will be substituted for the missing UDF row (identical to a traditional OUTER JOIN). It's that simple.

CROSS APPLY over the Purchasing.Vendor table produces the following result:

select [Name], PurchasingWebServiceURL, domain
from Purchasing.Vendor
    cross apply dbo.udf_DecomposeDns(PurchasingWebServiceURL);

Name PurchasingWebServiceURL domain
------------------- ------------------------------ --------------------------
A. Datum Corporation www.adatum.com/ www.adatum.com
A. Datum Corporation www.adatum.com/ adatum.com
A. Datum Corporation www.adatum.com/ com
Trey Research www.treyresearch.net/ www.treyresearch.net
Trey Research www.treyresearch.net/ treyresearch.net
Trey Research www.treyresearch.net/ net
Wide World Importers www.wideworldimporters.com/ www.wideworldimporters.com
...

Compare the output with OUTER APPLY. Note the NULL rows substituted for the missing rows from the UDF:

select [Name], PurchasingWebServiceURL, domain
from Purchasing.Vendor
    outer apply dbo.udf_DecomposeDns(PurchasingWebServiceURL);

Name PurchasingWebServiceURL domain
---------------------------------- ------------------------ --------------------
International NULL NULL
Electronic Bike Repair & Supplies NULL NULL
Premier Sport, Inc. NULL NULL
Comfort Road Bicycles NULL NULL
Metro Sport Equipment NULL NULL
Green Lake Bike Company NULL NULL
Mountain Works NULL NULL
Continental Pro Cycles NULL NULL
A. Datum Corporation www.adatum.com/ www.adatum.com
A. Datum Corporation www.adatum.com/ adatum.com
A. Datum Corporation www.adatum.com/ com
Trey Research www.treyresearch.net/ www.treyresearch.net
Trey Research www.treyresearch.net/ treyresearch.net
Trey Research www.treyresearch.net/ net
Anderson's Custom Bikes NULL NULL
Compete, Inc. NULL NULL
...

We are able to bring the full might of TSQL to bear on table-valued UDFs. For example, aggregations can be run on the result set:

select domain, count(*) as [CountOf]
from Purchasing.Vendor
    outer apply dbo.udf_DecomposeDns(PurchasingWebServiceURL)
group by domain
order by 2 desc;

domain CountOf
----------------------------- -----------
NULL 98
com 5
litwareinc.com 1
net 1
northwindtraders.com 1
proseware.com 1
treyresearch.net 1
...

The APPLY operator is indeed a valuable addition to the TSQL. It allows a whole new modular and procedural approach to composing result sets.

Labels: ,

Previous Posts