Fotia Ltd

Fotia Ltd

Monday, 20 March 2006

The OVER Clause Part I

posted by Stefan Delmarco

The OVER clause is new to SQL Server 2005. It is used for aggregate (SUM, AVG, COUNT, etc.) and ranking (NTILE, ROW_NUMBER, RANK, etc.) functions. Previously we've seen it used to specify the sorting SQL Server needs to apply to a result set before evaluating its associated window function for each row. For example, the following query results a list of all vendors, their post codes and cities. Each row is assigned a ROW_NUMBER according to its absolute position when the rows are ordered according to their PostalCodes.

select row_number() over(order by PostalCode) as [RowNumber],
    [Name], PostalCode, City
from Person.Address A
    inner join Purchasing.VendorAddress VA
        on A.AddressId = VA.AddressId
    inner join Purchasing.Vendor V
        on VA.VendorId = V.VendorId
order by PostalCode;

RowNumber Name PostalCode City
--------- ---------------------------------- ---------- -------------
1 Premier Sport, Inc. 02113 Boston
2 A. Datum Corporation 10007 New York
3 Morgan Bike Accessories 12210 Albany
4 Wood Fitness 19107 Philadelphia
5 Competition Bike Training Systems 30308 Atlanta
6 Bike Satellite Inc. 37501 Downey
7 Midwest Sport, Inc. 48226 Detroit
...

Now, suppose I didn't want the row number to be consecutive over the entire result set. For example, I may want to have the rows numbered from 1 to n for each city. That is, I want the row numbers to be consecutive (1,2,3...n) per city. This is what the PARTITION BY clause allows us to do. By including a PARTITION BY clause in the window function's OVER clause we can essentially restart the function when the value of the column specified in the PARTITION BY changes. For example, to number the rows consecutively per city the query changes to:

select row_number() over(partition by City order by PostalCode) as [RowNumber],
    [Name], PostalCode, City
from Person.Address A
    inner join Purchasing.VendorAddress VA
        on A.AddressId = VA.AddressId
    inner join Purchasing.Vendor V
        on VA.VendorId = V.VendorId
order by City;
RowNumber Name PostalCode City
--------- --------------------------------- ----------- -----------
1 Morgan Bike Accessories 12210 Albany
2 Leaf River Terrain 97321 Albany
1 Allenson Cycles 91001 Altadena
2 Gardner Touring Cycles 91001 Altadena
1 Speed Corporation 98221 Anacortes
2 Electronic Bike Co. 98221 Anacortes
3 Northern Bike Travel 98221 Anacortes
1 Competition Bike Training Systems 30308 Atlanta
...

As you can see, the ROW_NUMBER function has restarted its numbering as soon as the City changed. The PARTITION BY clause can contain any number of columns available in the tables in the FROM clause. As PARTITION BY defines its arguments as an expression, we can start getting creative in how we perform the partitioning. For example, the following query partitions the row number according to the second letter of the address:

select row_number() over(partition by substring(City, 2, 1) order by PostalCode) as [RowNumber],
    [Name], PostalCode, City
from Person.Address A
    inner join Purchasing.VendorAddress VA
        on A.AddressId = VA.AddressId
    inner join Purchasing.Vendor V
        on VA.VendorId = V.VendorId
order by substring(City, 2, 1);

RowNumber Name PostalCode City
--------- --------------------------------- ----------- -----------
...
15 Bicycle Specialists 97034 Lake Oswego
16 Vista Road Bikes 97301 Salem
17 Knopfler Cycles 97301 Salem
18 Recreation Place 97301 Salem
19 Reliance Fitness, Inc. 98107 Ballard
20 Electronic Bike Repair & Supplies 98403 Tacoma
21 Cycling Master 99362 Walla Walla
1 A. Datum Corporation 10007 New York
2 Midwest Sport, Inc. 48226 Detroit
3 Green Lake Bike Company 80203 Denver
4 Beaumont Bikes 83301 West Covina
5 Greenwood Athletic Company 85252 Lemon Grove
...

ROW_NUMBER restarted its numbering as soon as the second letter of the City name changed from 'a' to 'e'. All of the above applies equally to RANK / DENSE_RANK and NTILE. The OVER clause can also be used with aggregate functions (like MIN, MAX, AVG, SUM, etc) which makes mixing aggregations and non-aggregations in a single query a lot easier than before. That's what we'll be looking at next.

Labels: ,

Previous Posts