Fotia Ltd

Fotia Ltd

Sunday, 22 January 2006

The NTILE Ranking Function

posted by Stefan Delmarco

NTILE, RANK, DENSE_RANK and ROW_NUMBER form part of a new set of functions, called 'Ranking Functions' (or 'Ranking Window Functions') that use the new OVER operator. We'll just cover NTILE in this instalment and work our way through the rest in subsequent articles.

These functions have been added to TSQL to make features like paging and categorising data much easier. Anyone that's tried to do Google-style paging in TSQL knows that this can be a bit challenging when data sets get large. There are a couple of neat tricks in SQL Server 2000 to do this efficiently but with SQL Server 2005 this is now a walk-in-the-park.

The ranking functions all return integers whose values and ordering are determined by the context supplied to the OVER operator. The OVER clause determines the ordering the function needs to apply to the result set when it 'ranks' (i.e. evaluates) each row. Typically the OVER clause contains an ORDER BY statement but can include a PARTITION as well (which we'll ignore for now but cover in a later instalment).

Use of NTILE and OVER is best illustrated with an example. Suppose I wanted to return a result set and have the the result set split into equally-sized logical groups. That is, I want a number associated with each row that tells me which group a specific row belongs in (1..n where n is the number of group). I don't want to specify a page size for these group. I just want the rows divided equally.

declare @numberOfGroups int;
set @numberOfGroups = 5;

select ntile(@numberOfGroups) over(order by FirstName) as [GroupNumber],
    FirstName, LastName
from Person.Contact
where LastName like 'D%';

When you run this query you'll see that SQL Server has returned a number in the first column that represents the group number of that row. SQL Server has taken the total number of rows returned from the query and divided it into the specified number of groups (which could have been hardcoded) and then numbered the groups from 1 to n, where n is the @numberOfGroups. If the number of rows cannot be divided equally amongst the number of groups the lower numbered groups will have more rows in them (at most 1 more).

The OVER clause determines how SQL Server orders the rows when it divides them into groups. You can perform the grouping in one order and return the result set in another one. For example:

declare @numberOfGroups int;
set @numberOfGroups = 5;

select ntile(@numberOfGroups) over(order by FirstName) as [GroupNumber],
    FirstName, LastName
from Person.Contact
where LastName like 'D%'
order by LastName desc;

In this case the GroupNumber column will appear to be jumbled as SQL Server has assigned the rows into groups according to FirstName but then sorted the rows on LastName before returning the result set. If you just want to return a specific group number you need to use a derived table (or Common Table Expression / CTE - stay tuned). Windowed functions are not allowed to appear in the WHERE clause as this would be ambiguous (does SQL do the grouping before / after / during all the joins and filters...?)

declare @numberOfGroups int;
set @numberOfGroups = 5;
declare @groupToReturn int;
set @groupToReturn = 3;

select FirstName, LastName
from (
    select ntile(@numberOfGroups) over(order by FirstName) as [GroupNumber],
        FirstName, LastName
    from Person.Contact
    where LastName like 'D%') as Grouped
where GroupNumber = @groupToReturn;

This query just returns the third group - very simple paging / chunking. Next we'll have a look at ROW_NUMBER(), the single most requested feature after SNAPSHOT_ISOLATION!

Labels: ,

Previous Posts