Fotia Ltd

Fotia Ltd

Working with Partitioned Views

Now that we have our partitioned view created, let's add some data:

-- Create some data for the view
-- Note that here are we inserting into the *view* and
-- not the underlying tables.
declare @i int;
set @i = -100;
while @i < 100 begin
    insert dbo.PartView
        (partition, someData, filler)
    values
        (@i, rand(), replicate('*', 100));

    set @i = @i + 1;
end
go

Here we have a loop that will insert 200 rows into the partitioned view. Note that nowhere in the script are we specifying any information about the base tables. We are just treating the view as a table. Let's have a look at the net effect:

select count(*) as NumberOfRows, min(partition) as MinPartition, max(partition) as MaxPartition
from dbo.TablePos;
go
NumberOfRows MinPartition MaxPartition
------------ ------------ ------------
100 0 99

select count(*) as NumberOfRows, min(Partition) as MinPartition, max(Partition) as MaxPartition
from dbo.TableNeg;
go
NumberOfRows MinPartition MaxPartition
------------ ------------ ------------
100 -100 -1

As you can see, all of the rows with a Partition value of 0 to 99 are in TablePos, -100 to -1 are in TableNeg. To see how the magic is happening have a look at the execution plan of a single insert:

Partitioned View Insert

The highlighted Compute Scalar step is evaluating the check constraint on the base tables. SQL Server has read and understood the partitioning of the underlying tables. The Ptn1018 expression will be 0 if the insert is destined for TablePos, 1 if the target is TableNeg and NULL if it neither (the following Assert step will raise an error if the value for Partition is outside the allowable constraints).

This evaluation is then used to control which steps of the remaining execution plan are executed. It is easy to be mislead into thinking that because there are two Clustered Index Insert statements, SQL is touching both base tables. However, have a closer look at the Number of executes for each insert. It is 1 for the insert into TablePos and 0 for TableNeg. Essentially SQL Server has ‘short circuited' the execution plan at runtime. It has only executed the insert into the base table whose check constraints will accept the insert.

This has a number of advantages. Firstly it means that SQL can re-use the same execution plan irrespective of the data being inserted. It will dynamically adjust the plan at runtime to the actual data being acted on. This is especially good for distributed partitioned views as it means SQL will only reach out to the linked server hosting the required data. It will not waste costly network roundtrips to the SQL Servers that it knows do not fall within the check constraints defined in their hosted base tables. Therefore the same execution plan can be cached and re-used, knowing that the optimizer will only execute the portions relevant to each execution context.

Let's see what a select from the partitioned view looks like:

-- Select from partitioned view.
declare @partition int
set @partition = 11;

select partition, someData
from dbo.PartView
where partition = @partition;
go

This SELECT statement produces the following execution plan:

Partitioned View Select

I used a parameter instead of a hard-coded value for the WHERE clause as I wanted to demonstrate what happens in stored procedures. Using a hard-coded value will cause SQL to create a plan that only access the base table, totally short-circuiting the partitioned view. However, we rarely use hard-coding in real-world applications. We create stored procedures that use parameters. Using a parameter will cause SQL to generate a plan that can be re-used irrespective of the value used for the Partition value in the WHERE clause.

As you can see, SQL has used the same strategy we saw in the insert. At first glance it looks like SQL is accessing both base tables. However, looking at the Number of executes shows that SQL has ‘short-circuited' the execution plan and only accessed TablePos. Even though the Clustered Index Seek on TableNeg is present in the execution plan, it is not actually touched. Should @Partition ever be a negative value, TableNeg would be accessed and TablePos would be skipped. In this manner SQL can have a single execution plan that will be efficient and fast for any value of @Partition, even more so if we were using linked servers (i.e. no unnecessary network roundtrips).

The 'short-circuit' step is known as a Startup Filter. This is the clever bit that determines which portions of the plan are executed and which are skipped. It can be a misleading to see this step execute after the Clustered Index Seek when, in fact, it is controlling which Clustered Index Seek executes and which is skipped. The Startup Filter's expression determines whether or not the preceding clustered index seeks executes or not. For the TablePos branch the startup expression is @Partition >= 0, and for the TableNeg branch the startup expression is @Partition < 0.

The SELECT statements get even more interesting when using aggregations:

-- Select from partitioned view with aggregation.
select avg(SomeData) as AvgSomeData
from dbo.PartView;
go

This SELECT statement produces the following execution plan:

Partitioned View SELECT Aggregate

This looks fairly complex but once you decompose it you'll see how clever it is. Firstly, SQL is reading all the rows in each underlying base table. The Stream Aggregate step after the Clustered Index Scan is calculating the sum (note it is not the average we asked for...) per partitioned table. Notice that SQL is not reading all of the rows from all of the tables and then calculating the aggregation. It is doing an aggregation per partitioned table. Next, note the Argument for the Stream Aggregate steps. You'll see that SQL is determining:

  • the number of rows in each base table, COUNT(*), and
  • the sum, SUM([...].[SomeData]), of the column whose average we are calculating.

In the Concatenation step we therefore only have 2 rows. One row per base table, each containing the number of rows and the sum of SomeData. It is only then that SQL actually calculates the final average by summing these 2 rows and calculating the average. So why is this clever? Firstly, if this was a distributed partitioned view we wouldn't be saturating the network by having each linked server return all of the rows in its base table. Instead each linked server would only be returning the number of rows it contains and the sum of these rows. This is a huge performance advantage. Secondly, SQL is offloading the calculating of a portion of the aggregation to each of the partitions. SQL then only needs to perform a rollup of the portioned aggregations to calculate the final result!

Lastly, let's see what happens when we update the partitioning column:

-- Update partitioning column.
-- Insert a well-known value for Filler.

insert dbo.PartView
    (partition, someData, filler)
values
    (101, rand(), 'WellKnownFiller');
go

-- Check which partitioned table received the row:
select *
from dbo.TablePos
where filler = 'WellKnownFiller';
go
Partition SomeData Filler
----------- ---------------------- ----------------
101 4.7535715989021003E-2 WellKnownFiller

select *
from dbo.TableNeg
where filler = 'WellKnownFiller';
go
Partition SomeData Filler
----------- ---------------------- ----------------

Nothing new here. The row we inserted exists in the TablePos table as the value for Partition we used was >= 0. So now let's try update the partitioning column:

-- Row is in TablePos. Updating the partitioning column that will violate
-- the check contraint in TablePos. This will throw a CHECK CONSTRAINT
-- error as we are accessing the base table directly.
update dbo.TablePos
set partition = -101
where filler = 'WellKnownFiller';
go
Server: Msg 547, Level 16, State 1, Line 1
UPDATE statement conflicted with COLUMN CHECK constraint 'CK__TablePos__Partit__117F9D94'. The conflict occurred in database 'tempdb', table 'TablePos', column 'Partition'.
The statement has been terminated.

This fails simply because we have violated the check constraint on the Partition column on TablePos, as expected. Now let's go through the partitioned view.

-- However, if we go through the partitioned view, this works!
update dbo.PartView
set partition = -101
where filler = 'WellKnownFiller';
go

This works! Have a look at the execution plan. It is fairly lengthy so I've extracted out the interesting bits. The SWITCH operation at the beginning of the query plan converts the UPDATE to a DELETE / INSERT pair. That is what's responsible for the movement of the updated row between underlying tables. The INSERT at the end of execution plan has also been highlighted.

Partitioned View Update

<<< Previous   Next >>>