Fotia Ltd

Fotia Ltd

Friday, 13 October 2006

The UNPIVOT Operator

posted by Stefan Delmarco

Completing the discussion on the PIVOT operator is its antonym, the UNPIVOT. However, I would argue that the UNPIVOT operator has been misnamed by Microsoft as it creates the mistaken impression that PIVOT -> UNPIVOT -> PIVOT gets you back to where you started. From BOL:

UNPIVOT performs the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.

It fails to highlight that fact that PIVOT has aggregated the raw data in order to rotate the underlying rows into columns - you have to read to the very end of the Using PIVOT and UNPIVOT sample for the fine print in BOL. In contrast, the UNPIVOT operator does not perform any aggregation whatsoever. It just turns columns into rows. For each column in the table being UNPIVOT'd you get a row in the result set where the row values are ColumnName and ColumnValue. It therefore helps to think of transpose instead of UNPIVOT as this is more accurate description of the UNPIVOT operator's behaviour.

Nevertheless, let's have a look at the syntax to try and get a better feel for what UNPIVOT brings to the table. As a worked example, consider the following requirement: Write a query to produce a result set that contains a list of products (Production.Product) and their SellStartDate and SellEndDate. The SellStartDate and SellEndDate need to be listed in a single column with a second column specifying whether it is the SellStartDate or SellEndDate value being specified. Lastly, the result set needs to be sorted by this consolidated date column to produce a linear timeline of 'product events'.

In SQL Server 2000, you'd probably end up with a query that looks very much like the following:

select ProductId, [Name], 'SellStartDate' as DateType, SellStartDate as DateValue
from Production.Product
union all
select ProductId, [Name], 'SellEndDate' as DateType, SellEndDate as DateValue
from Production.Product
where SellEndDate is not null
order by DateValue;

ProductId Name DateType DateValue
----------- ----------- ----------- -----------
320 Chainring Bolts SellStartDate 1998-06-01 00:00:00.000
321 Chainring Nut SellStartDate 1998-06-01 00:00:00.000
322 Chainring SellStartDate 1998-06-01 00:00:00.000
323 Crown Race SellStartDate 1998-06-01 00:00:00.000
324 Chain Stays SellStartDate 1998-06-01 00:00:00.000
325 Decal 1 SellStartDate 1998-06-01 00:00:00.000
326 Decal 2 SellStartDate 1998-06-01 00:00:00.000
327 Down Tube SellStartDate 1998-06-01 00:00:00.000
328 Mountain End Caps SellStartDate 1998-06-01 00:00:00.000
329 Road End Caps SellStartDate 1998-06-01 00:00:00.000
330 Touring End Caps SellStartDate 1998-06-01 00:00:00.000

Using the UNPIVOT operator, in SQL Server 2005, the equivalent is as follows:

select ProductId, [Name], DateType, DateValue
    (select ProductID, [Name], SellStartDate, SellEndDate
    from Production.Product) as rawData
    (DateValue for DateType in
       (SellStartDate, SellEndDate)) as transposed
order by DateValue;

The key to understanding the UNPIVOT syntax is to see how the DateValue and DateType columns are specified. When transposing the columns into rows, they effectively become name / value pairs. The name is the name of the column and value is the value of the column for the relative row. BOL calls it value_column and pivot_column but just think of them as name / value pairs.

The first portion of the query selects the raw data which we will be transposing. The UNPIVOT portion of the query requires the name of the columns to be used for the name and value pair columns (Value FOR Name) followed by the columns that will be transposed. Lastly, the outermost select allows you specify which transposed columns you want to return, so you can choose not to return the Name column if you don't want it.

As the SELECT...CASE statements were able to duplicate all the functionality of the PIVOT operator, SELECT...UNION ALL can perform all the same tricks as UNPIVOT. However, the advantage again is a more compact and expressive syntax.

One behaviour of UNPIVOT to watch out for is that it will not transpose column values that are NULL. NULL values will not appear as name / value pairs in the results. In the sample above I've compensated for this behaviour by including where SellEndDate is not null in the SQL Server 2000 equivalent query. Note that there is way to change this behaviour.

In terms of performance, the query optimiser highlights a significant difference between the SELECT...UNION ALL and UNPIVOT queries. The former requires a pass over the underlying table for each select (2 passes in our sample) whereas the UNPIVOT requires only a single pass. SQL Server is able to transpose the columns into rows for each column in a single pass over the underlying data. This makes the UNPIVOT query less expensive than the equivalent UNION ALL. The size of this penalty is proportional to the number of rows being processed. This fact along makes UNPIVOT a very useful tool to keep note of in your TSQL arsenal.

Labels: ,

Previous Posts