Fotia Ltd

Fotia Ltd

Saturday, 18 February 2006

The ROW_NUMBER Ranking Function

posted by Stefan Delmarco

Given the simplicity of this function it is surprising that we've never had it before! All that this function does is label each row with a monotonically increasing row number. Think of it as a dynamic IDENTITY column.

This function falls within the new ranking functions we started covering in the previous article. Just like NTILE, the ROW_NUMBER function needs an OVER clause that contains an ORDER BY. Each row is given a number according to its absolute position in the order specified in the OVER clause.

For example, the following query returns all First / LastNames where the LastName begins with the letter 'D'. The first row will have a RowNumber value of 1, the next 2, and so forth:

select row_number() over(order by LastName) as [RowNumber],
    FirstName, LastName
from Person.Contact
where LastName like 'D%';

Whilst this may seem very straight forward, the real intention of this function is to make it easy to implement paging in the database. That is, the ability to return a portion of the result set as specified by the number of the first row to return, and a page size (a'la Google-style). Just like the NTILE function, the ROW_NUMBER function cannot be used in the WHERE clause. To filter on ROW_NUMBER a derived table needs to be used:

declare @startRow int;
declare @pageSize int;
set @startRow = 91;
set @pageSize = 10;

select FirstName, LastName
from (
    select row_number() over(order by LastName) as [RowNumber],
        FirstName, LastName
    from Person.Contact
    where LastName like 'D%') as Derived
where RowNumber between @startRow
and @startRow + @pageSize - 1;

As a test I wanted to see how clever the query optimizer would be at restricting the number of rows returned. Typically you page data over an enormous results set (think of the contents of a telephone directory). I wanted to make sure that the query optimizer didn't assign a ROW_NUMBER to every single row in the underlying result set that is being paged. The first time I ran the query I found that the execution plan first performed a clustered index scan of the entire Person.Conact table to find all LastNames that begin with D (there are 556 LastNames that begin with 'D'):

Execute Plan Before

This is very inefficient as the optimizer has to chew through 556 rows just to return 10! However, once I added a non-clustered index on the LastName column, the change in the query plan was astounding!

create nonclustered index IX_Contact_LastName
on Person.Contact (LastName asc)

Instead of blindly following the sub-select derived table filter that the first execution plan produced, the revised execution plan consisted of two distinct parts. The first part performed an index seek (using IX_Contact_LastName) to find the first 100 rows in the Person.Contact table that began with the letter 'D'. 100...? Well, we specified a @startRow of 91 and @pageSize of 10. So SQL Server is searching for the last row in the result set where the LastName begins with 'D', ordered by LastName (as we specified in the OVER clause).

Execution Plan After

The second part of the query is where SQL Server joins the rows that fall within our criteria (from part 1) back to the table (part 2 which uses the clustered PK_Contact_ContactID index) to retrieve the FirstNames for the LastNames we are returning. This is brilliant! SQL Server has understood what the query is trying to perform holistically. It has restricted the number of rows to evaluate to the last row the query is interested in instead of the entire result set! Under the skin the query optimiser has implemented a very clever TOP 100 clause.

So, the ROW_NUMBER function is extremely easy to use. However, care still needs to be taken when deciding how to order and filter the rows to be returned, as a simple query does not necessarily equate to an efficient execution plan. Just because you can page easily don't forget to make sure you have the underlying database structures to perform that paging efficiently. The hard-learned rules of query optimisation through indexing are still very relevant.

Labels: ,

Previous Posts