Fotia Ltd

Fotia Ltd

Thursday, 9 March 2006

The RANK and DENSE_RANK Ranking Functions

posted by Stefan Delmarco

RANK and DENSE_RANK are the last of the new ranking (window) functions. The best analogy of how these functions work is to think of how you'd solve the following problem: You've been asked to write a report that will generate a list of the best selling products. You're required to produce a result set that contains the product name, total quantity sold and assign each product a ranking (i.e. 1st, 2nd, 3rd, etc.). So you might write something like this:

select P.[Name],
    sum(TH.Quantity) as [TotalQuantity],
    row_number() over(order by sum(quantity) desc) as [Ranking]
from Production.TransactionHistory TH
    inner join Production.Product P
        on TH.ProductID = P.ProductID
group by P.[Name]
order by [TotalQuantity] desc;

Name TotalQuantity Ranking
----------------- ------------- --------------------
BB Ball Bearing 374090 1
Seat Stays 187376 2
Blade 93688 3
Chain Stays 93688 4
Fork End 93688 5
HL Crankarm 64900 6
HL Hub 56264 7
Decal 1 56250 8
Decal 2 56250 9
LL Mountain Pedal 50922 10
...

Whilst this looks to have fulfilled the requirements, it has a very subtle deficiency. If you look closely you'll see that the top 10 products actually include a couple of ties. Blade, Chain Stays, and Fork End all come in at an equal 93,688 sold. By using ROW_NUMBER to assign the ranking we have failed to take account the possibility of ties and assigned lower rankings to products that have sold equal amounts. Imagine if this query was for assigning prize money in a golf tournament! I'm sure the competitors would not be very happy about this 'feature'!

This is where the RANK function comes into play. In the same manner as all the previous windowed functions we looked at, RANK requires an OVER clause. RANK uses the ORDER BY in the OVER clause to assign a monotonically increasing value to each row, starting from 1. However, RANK looks for duplicates in the range. Where duplicate rows are found, the same RANK value is assigned to each row. In addition, when parsing duplicates, RANK counts the number of duplicates read and assigns the next row, that has a differing value, a RANK equal to the RANK of the duplicates plus the number of duplicates read. Using RANK our query then becomes:

select P.[Name],
    sum(TH.Quantity) as [TotalQuantity],
    rank() over(order by sum(quantity) desc) as [Ranking]
from Production.TransactionHistory TH
    inner join Production.Product P
        on TH.ProductID = P.ProductID
group by P.[Name]
order by [TotalQuantity] desc;

Name TotalQuantity Ranking
----------------- ------------- --------------------
BB Ball Bearing 374090 1
Seat Stays 187376 2
Blade 93688 3
Chain Stays 93688 3
Fork End 93688 3
HL Crankarm 64900 6
HL Hub 56264 7
Decal 1 56250 8
Decal 2 56250 8
LL Mountain Pedal 50922 10
...

As you can see, RANK has correctly identified the 'ties' and assigned Blade, Chain Stays and Fork End an identical RANK. In addition, it has correctly assigned HL Crankarm a RANK of sixth as we have 3 products that came in third.

DENSE_RANK behaves identical to RANK except that it does not increment the rank each time a duplicate is read. Using RANK we have no fourth or fifth placed products as the 3 third placed products have hidden the fourth and fifth placed ranks (i.e. we have non-consecutive ranks). DENSE_RANK does not skip ranks in this manner. Changing our query to use DENSE_RANK becomes:

select P.[Name],
    sum(TH.Quantity) as [TotalQuantity],
    dense_rank() over(order by sum(quantity) desc) as [Ranking]
from Production.TransactionHistory TH
    inner join Production.Product P
        on TH.ProductID = P.ProductID
group by P.Name
order by [TotalQuantity] desc;

Name TotalQuantity Ranking
------------------ ------------- --------------------
BB Ball Bearing 374090 1
Seat Stays 187376 2
Blade 93688 3
Chain Stays 93688 3
Fork End 93688 3
HL Crankarm 64900 4
HL Hub 56264 5
Decal 1 56250 6
Decal 2 56250 6
LL Mountain Pedal 50922 7
...

Next we'll explore the OVER clause in a bit more detail to see how partitions work.

Labels: ,

Previous Posts