Monday, 20 March 2006
The OVER Clause Part I
posted by Stefan Delmarco
The OVER clause is new to SQL Server 2005. It is used for aggregate (SUM, AVG, COUNT, etc.) and ranking (NTILE, ROW_NUMBER, RANK, etc.) functions. Previously we've seen it used to specify the sorting SQL Server needs to apply to a result set before evaluating its associated window function for each row. For example, the following query results a list of all vendors, their post codes and cities. Each row is assigned a ROW_NUMBER according to its absolute position when the rows are ordered according to their PostalCodes.
[Name], PostalCode, City
from Person.Address A
inner join Purchasing.VendorAddress VA
on A.AddressId = VA.AddressId
inner join Purchasing.Vendor V
on VA.VendorId = V.VendorId
order by PostalCode;
| RowNumber | Name | PostalCode | City |
| --------- | ---------------------------------- | ---------- | ------------- |
| 1 | Premier Sport, Inc. | 02113 | Boston |
| 2 | A. Datum Corporation | 10007 | New York |
| 3 | Morgan Bike Accessories | 12210 | Albany |
| 4 | Wood Fitness | 19107 | Philadelphia |
| 5 | Competition Bike Training Systems | 30308 | Atlanta |
| 6 | Bike Satellite Inc. | 37501 | Downey |
| 7 | Midwest Sport, Inc. | 48226 | Detroit |
| ... |
Now, suppose I didn't want the row number to be consecutive over the entire result set. For example, I may want to have the rows numbered from 1 to n for each city. That is, I want the row numbers to be consecutive (1,2,3...n) per city. This is what the PARTITION BY clause allows us to do. By including a PARTITION BY clause in the window function's OVER clause we can essentially restart the function when the value of the column specified in the PARTITION BY changes. For example, to number the rows consecutively per city the query changes to:
[Name], PostalCode, City
from Person.Address A
inner join Purchasing.VendorAddress VA
on A.AddressId = VA.AddressId
inner join Purchasing.Vendor V
on VA.VendorId = V.VendorId
order by City;
| RowNumber | Name | PostalCode | City |
| --------- | --------------------------------- | ----------- | ----------- |
| 1 | Morgan Bike Accessories | 12210 | Albany |
| 2 | Leaf River Terrain | 97321 | Albany |
| 1 | Allenson Cycles | 91001 | Altadena |
| 2 | Gardner Touring Cycles | 91001 | Altadena |
| 1 | Speed Corporation | 98221 | Anacortes |
| 2 | Electronic Bike Co. | 98221 | Anacortes |
| 3 | Northern Bike Travel | 98221 | Anacortes |
| 1 | Competition Bike Training Systems | 30308 | Atlanta |
| ... |
As you can see, the ROW_NUMBER function has restarted its numbering as soon as the City changed. The PARTITION BY clause can contain any number of columns available in the tables in the FROM clause. As PARTITION BY defines its arguments as an expression, we can start getting creative in how we perform the partitioning. For example, the following query partitions the row number according to the second letter of the address:
[Name], PostalCode, City
from Person.Address A
inner join Purchasing.VendorAddress VA
on A.AddressId = VA.AddressId
inner join Purchasing.Vendor V
on VA.VendorId = V.VendorId
order by substring(City, 2, 1);
| RowNumber | Name | PostalCode | City |
| --------- | --------------------------------- | ----------- | ----------- |
| ... | |||
| 15 | Bicycle Specialists | 97034 | Lake Oswego |
| 16 | Vista Road Bikes | 97301 | Salem |
| 17 | Knopfler Cycles | 97301 | Salem |
| 18 | Recreation Place | 97301 | Salem |
| 19 | Reliance Fitness, Inc. | 98107 | Ballard |
| 20 | Electronic Bike Repair & Supplies | 98403 | Tacoma |
| 21 | Cycling Master | 99362 | Walla Walla |
| 1 | A. Datum Corporation | 10007 | New York |
| 2 | Midwest Sport, Inc. | 48226 | Detroit |
| 3 | Green Lake Bike Company | 80203 | Denver |
| 4 | Beaumont Bikes | 83301 | West Covina |
| 5 | Greenwood Athletic Company | 85252 | Lemon Grove |
| ... |
ROW_NUMBER restarted its numbering as soon as the second letter of the City name changed from 'a' to 'e'. All of the above applies equally to RANK / DENSE_RANK and NTILE. The OVER clause can also be used with aggregate functions (like MIN, MAX, AVG, SUM, etc) which makes mixing aggregations and non-aggregations in a single query a lot easier than before. That's what we'll be looking at next.
