Working with Partitioned Views
Now that we have our partitioned view created, let's add some data:
-- Create some data for the view
-- Note that here are we inserting into the *view* and
-- not the underlying tables.
declare @i
int;
set @i = -100;
while @i < 100
begin
insert dbo.PartView
(partition, someData, filler)
values
(@i, rand(),
replicate('*',
100));
set @i = @i + 1;
end
go
Here we have a loop that will insert 200 rows into the partitioned view. Note that
nowhere in the script are we specifying any information about the base tables. We
are just treating the view as a table. Let's have a look at the net effect:
select count(*)
as NumberOfRows,
min(partition)
as MinPartition,
max(partition)
as
MaxPartition
from dbo.TablePos
;
go
|
NumberOfRows |
MinPartition |
MaxPartition |
|
------------ |
------------ |
------------ |
|
100 |
0 |
99 |
select count(*)
as NumberOfRows,
min(Partition)
as MinPartition,
max(Partition)
as
MaxPartition
from dbo.TableNeg
;
go
|
NumberOfRows |
MinPartition |
MaxPartition |
|
------------ |
------------ |
------------ |
|
100 |
-100 |
-1 |
As you can see, all of the rows with a Partition value of 0 to 99 are in TablePos,
-100 to -1 are in TableNeg. To see how the magic is happening have a look at the
execution plan of a single insert:
The highlighted Compute Scalar step is evaluating the check constraint on
the base tables. SQL Server has read and understood the partitioning of the underlying
tables. The Ptn1018 expression will be 0 if the insert is destined for TablePos,
1 if the target is TableNeg and NULL if it neither (the following Assert
step will raise an error if the value for Partition is outside the allowable constraints).
This evaluation is then used to control which steps of the remaining execution plan
are executed. It is easy to be mislead into thinking that because there are two
Clustered Index Insert statements, SQL is touching both base tables. However,
have a closer look at the Number of executes for each insert. It is 1
for the insert into TablePos and 0 for TableNeg. Essentially
SQL Server has ‘short circuited' the execution plan at runtime. It has only
executed the insert into the base table whose check constraints will accept the
insert.
This has a number of advantages. Firstly it means that SQL can re-use the same execution
plan irrespective of the data being inserted. It will dynamically adjust the plan
at runtime to the actual data being acted on. This is especially good for distributed
partitioned views as it means SQL will only reach out to the linked server hosting
the required data. It will not waste costly network roundtrips to the SQL Servers
that it knows do not fall within the check constraints defined in their hosted base
tables. Therefore the same execution plan can be cached and re-used, knowing that
the optimizer will only execute the portions relevant to each execution context.
Let's see what a select from the partitioned view looks like:
-- Select from partitioned view.
declare @partition
int
set @partition = 11;
select partition, someData
from dbo.PartView
where partition = @partition;
go
This SELECT statement produces the following execution plan:
I used a parameter instead of a hard-coded value for the WHERE clause as I wanted
to demonstrate what happens in stored procedures. Using a hard-coded value will
cause SQL to create a plan that only access the base table, totally short-circuiting
the partitioned view. However, we rarely use hard-coding in real-world applications.
We create stored procedures that use parameters. Using a parameter will cause SQL
to generate a plan that can be re-used irrespective of the value used for the Partition
value in the WHERE clause.
As you can see, SQL has used the same strategy we saw in the insert. At first glance
it looks like SQL is accessing both base tables. However, looking at the Number
of executes shows that SQL has ‘short-circuited' the execution plan
and only accessed TablePos. Even though the Clustered Index Seek on TableNeg
is present in the execution plan, it is not actually touched. Should @Partition
ever be a negative value, TableNeg would be accessed and TablePos
would be skipped. In this manner SQL can have a single execution plan that will
be efficient and fast for any value of @Partition, even more so if we were
using linked servers (i.e. no unnecessary network roundtrips).
The 'short-circuit' step is known as a Startup Filter. This is the clever
bit that determines which portions of the plan are executed and which are skipped.
It can be a misleading to see this step execute after the Clustered Index Seek when,
in fact, it is controlling which Clustered Index Seek executes and which is skipped.
The Startup Filter's expression determines whether or not the preceding clustered
index seeks executes or not. For the TablePos branch the startup expression
is @Partition >= 0, and for the TableNeg branch the startup expression is @Partition
< 0.
The SELECT statements get even more interesting when using aggregations:
-- Select from partitioned view with aggregation.
select avg(SomeData)
as AvgSomeData
from dbo.PartView;
go
This SELECT statement produces the following execution plan:
This looks fairly complex but once you decompose it you'll see how clever it is.
Firstly, SQL is reading all the rows in each underlying base table. The Stream Aggregate
step after the Clustered Index Scan is calculating the sum (note it is
not the average we asked for...) per partitioned table. Notice that
SQL is not reading all of the rows from all of the tables and then calculating the
aggregation. It is doing an aggregation per partitioned table. Next, note the Argument
for the Stream Aggregate steps. You'll see that SQL is determining:
- the number of rows in each base table, COUNT(*), and
- the sum,
SUM([...].[SomeData]), of the column whose average we are calculating.
In the Concatenation step we therefore only have 2 rows. One row per base
table, each containing the number of rows and the sum of SomeData. It is only then
that SQL actually calculates the final average by summing these 2 rows and calculating
the average. So why is this clever? Firstly, if this was a distributed partitioned
view we wouldn't be saturating the network by having each linked server return all
of the rows in its base table. Instead each linked server would only be returning
the number of rows it contains and the sum of these rows. This is a huge performance
advantage. Secondly, SQL is offloading the calculating of a portion of the aggregation
to each of the partitions. SQL then only needs to perform a rollup of the portioned
aggregations to calculate the final result!
Lastly, let's see what happens when we update the partitioning column:
-- Update partitioning column.
-- Insert a well-known value for Filler.
insert dbo.PartView
(partition, someData, filler)
values
(101,
rand(),
'WellKnownFiller')
;
go
-- Check which partitioned table received the row:
select *
from dbo.TablePos
where filler =
'WellKnownFiller';
go
|
Partition |
SomeData |
Filler |
|
----------- |
---------------------- |
---------------- |
|
101 |
4.7535715989021003E-2 |
WellKnownFiller |
select *
from dbo.TableNeg
where filler =
'WellKnownFiller';
go
|
Partition |
SomeData |
Filler |
|
----------- |
---------------------- |
---------------- |
Nothing new here. The row we inserted exists in the TablePos table as the
value for Partition we used was >= 0. So now let's try update the partitioning
column:
-- Row is in TablePos. Updating the partitioning column
that will violate
-- the check contraint in TablePos. This will throw a CHECK CONSTRAINT
-- error as we are accessing the base table directly.
update dbo.TablePos
set partition = -101
where filler =
'WellKnownFiller';
go
Server: Msg 547, Level 16, State 1, Line 1
UPDATE statement conflicted with COLUMN CHECK constraint 'CK__TablePos__Partit__117F9D94'.
The conflict occurred in database 'tempdb', table 'TablePos', column 'Partition'.
The statement has been terminated.
|
This fails simply because we have violated the check constraint on the Partition
column on TablePos, as expected. Now let's go through the partitioned view.
-- However, if we go through the partitioned view, this
works!
update dbo.PartView
set partition = -101
where filler = 'WellKnownFiller';
go
This works! Have a look at the execution plan. It is fairly lengthy so I've extracted
out the interesting bits. The SWITCH operation at the beginning of the query plan
converts the UPDATE to a DELETE / INSERT pair. That is what's responsible for the
movement of the updated row between underlying tables. The INSERT at the end of
execution plan has also been highlighted.
<<< Previous
Next >>>