Friday, 13 October 2006
The UNPIVOT Operator
posted by Stefan Delmarco
Completing the discussion on the PIVOT operator is its antonym, the UNPIVOT. However, I would argue that the UNPIVOT operator has been misnamed by Microsoft as it creates the mistaken impression that PIVOT -> UNPIVOT -> PIVOT gets you back to where you started. From BOL:
UNPIVOT performs the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.
It fails to highlight that fact that PIVOT has aggregated the raw data in order to rotate the underlying rows into columns - you have to read to the very end of the Using PIVOT and UNPIVOT sample for the fine print in BOL. In contrast, the UNPIVOT operator does not perform any aggregation whatsoever. It just turns columns into rows. For each column in the table being UNPIVOT'd you get a row in the result set where the row values are ColumnName and ColumnValue. It therefore helps to think of transpose instead of UNPIVOT as this is more accurate description of the UNPIVOT operator's behaviour.
Nevertheless, let's have a look at the syntax to try and get a better feel for what UNPIVOT brings to the table. As a worked example, consider the following requirement: Write a query to produce a result set that contains a list of products (Production.Product) and their SellStartDate and SellEndDate. The SellStartDate and SellEndDate need to be listed in a single column with a second column specifying whether it is the SellStartDate or SellEndDate value being specified. Lastly, the result set needs to be sorted by this consolidated date column to produce a linear timeline of 'product events'.
In SQL Server 2000, you'd probably end up with a query that looks very much like the following:
from Production.Product
union all
select ProductId, [Name], 'SellEndDate' as DateType, SellEndDate as DateValue
from Production.Product
where SellEndDate is not null
order by DateValue;
| ProductId | Name | DateType | DateValue |
| ----------- | ----------- | ----------- | ----------- |
| ... | |||
| 320 | Chainring Bolts | SellStartDate | 1998-06-01 00:00:00.000 |
| 321 | Chainring Nut | SellStartDate | 1998-06-01 00:00:00.000 |
| 322 | Chainring | SellStartDate | 1998-06-01 00:00:00.000 |
| 323 | Crown Race | SellStartDate | 1998-06-01 00:00:00.000 |
| 324 | Chain Stays | SellStartDate | 1998-06-01 00:00:00.000 |
| 325 | Decal 1 | SellStartDate | 1998-06-01 00:00:00.000 |
| 326 | Decal 2 | SellStartDate | 1998-06-01 00:00:00.000 |
| 327 | Down Tube | SellStartDate | 1998-06-01 00:00:00.000 |
| 328 | Mountain End Caps | SellStartDate | 1998-06-01 00:00:00.000 |
| 329 | Road End Caps | SellStartDate | 1998-06-01 00:00:00.000 |
| 330 | Touring End Caps | SellStartDate | 1998-06-01 00:00:00.000 |
| ... |
Using the UNPIVOT operator, in SQL Server 2005, the equivalent is as follows:
from
(select ProductID, [Name], SellStartDate, SellEndDate
from Production.Product) as rawData
unpivot
(DateValue for DateType in
(SellStartDate, SellEndDate)) as transposed
order by DateValue;
The key to understanding the UNPIVOT syntax is to see how the DateValue and DateType columns are specified. When transposing the columns into rows, they effectively become name / value pairs. The name is the name of the column and value is the value of the column for the relative row. BOL calls it value_column and pivot_column but just think of them as name / value pairs.
The first portion of the query selects the raw data which we will be transposing. The UNPIVOT portion of the query requires the name of the columns to be used for the name and value pair columns (Value FOR Name) followed by the columns that will be transposed. Lastly, the outermost select allows you specify which transposed columns you want to return, so you can choose not to return the Name column if you don't want it.
As the SELECT...CASE statements were able to duplicate all the functionality of the PIVOT operator, SELECT...UNION ALL can perform all the same tricks as UNPIVOT. However, the advantage again is a more compact and expressive syntax.
One behaviour of UNPIVOT to watch out for is that it will not transpose column values that are NULL. NULL values will not appear as name / value pairs in the results. In the sample above I've compensated for this behaviour by including where SellEndDate is not null in the SQL Server 2000 equivalent query. Note that there is way to change this behaviour.
In terms of performance, the query optimiser highlights a significant difference between the SELECT...UNION ALL and UNPIVOT queries. The former requires a pass over the underlying table for each select (2 passes in our sample) whereas the UNPIVOT requires only a single pass. SQL Server is able to transpose the columns into rows for each column in a single pass over the underlying data. This makes the UNPIVOT query less expensive than the equivalent UNION ALL. The size of this penalty is proportional to the number of rows being processed. This fact along makes UNPIVOT a very useful tool to keep note of in your TSQL arsenal.
