Fotia Ltd

Fotia Ltd

Friday, 1 February 2008

Partitioned Views on a Budget

posted by Stefan Delmarco

Partitioned views are a great feature of SQL Server. By injecting an additional layer of abstraction between your logical view and physical implementation they give you fine grained control over data partitioning. However, the cover charge required for entry to this exclusive club is pretty high. The checklist required to implement a partitioned view can often interfere with your design to such a degree that they become more of a hindrance than a benefit. However, by decomposing the features SQL Server uses to implement the complete partitioned views specification we can get most of the benefits without meeting all of the criteria.

One of the benefits of partitioned views is the query optimizer's ability to determine which subset of the member tables need to be queried for each access to the partitioned view. The SQL Server query optimizer has clever features built into it that are able to understand variables and constants in TSQL and relate those constants to search predicates. The optimizer is able to use that insight to short-circuit query plan execution and not bother with branches that will never execute. For example:

declare @constant int;
set @constant = 0;

select *
from Person.Contact
where 1 = @constant;

The query plan for this bit of TSQL looks like this:

The SQL Server query optimizer has introduced a filter into the query plan that has a special property called a Startup Expression Predicate. This property is the short-circuit logic that determines whether or not the clustered index scan on the Person.Contact table executes or not. If the startup expression evaluates to FALSE then the optimizer doesn't bother executing any portion of that query. In this example, SQL Server will do nothing when @constant = 1 evaluates to FALSE. It is the exact same technology that partitioned views use to determine which underlying member tables to access based on the member tables' check constraints / partitioning columns.

For example, consider a situation where I would like to query a table that exists in a number of databases. I would like to have a partitioned view that aggregates these member tables in a single view. However, I don't want to pay the penalty of having to query every member table for every query. If I specify the database name in the SELECT query, I want the optimizer to use its short-circuit feature to go directly to the one table that will satisfy the search arguments. The solution is easier than you think!

create database Database01;
create database Database02;
create database Database10;
create database Database11;
go

use Database01
create table MemberTable (
   Col1 varchar(10));
go

use Database02
create table MemberTable (
   Col1 varchar(10));
go

use Database10
create table MemberTable (
   Col1 varchar(10));
go


use Database11
create table MemberTable (
   Col1 varchar(10));
go

insert Database01.dbo.MemberTable values ('Data01');
insert Database02.dbo.MemberTable values ('Data02');
insert Database10.dbo.MemberTable values ('Data10');
insert Database11.dbo.MemberTable values ('Data11');
go

use tempdb;
go
create view BudgetView as
select 'Database01' as SourceDatabase, Col1
from Database01.dbo.MemberTable
union all
select 'Database02', Col1
from Database02.dbo.MemberTable
union all
select 'Database10', Col1
from Database10.dbo.MemberTable
union all
select
'Database11', Col1
from Database11.dbo.MemberTable
go

select *
from BudgetView;

SourceDatabase Col1
-------------- ----------
Database01 Data01
Database02 Data02
Database10 Data10
Database11 Data11

(4 rows(s) affected)

The query plan for a SELECT of all of the rows shows SQL accessing each member table in turn:

This is the clever bit. We've put the partitioning criteria that separates each member table into the view's definition. We've hardcoded the database name of each member table into the first column of the view. The really clever bit is that SQL Server understands this! Try the following:

select *
from BudgetView
where SourceDatabase = 'Database01';

SourceDatabase Col1
-------------- ----------
Database01 Data01

(1 row(s) affected)

SQL Server has looked at the search argument we specified for SourceDatabase and short-circuited the query plan. It only queried Database01's MemberTable and didn't bother accessing any of the others!

It gets even better, SQL Server understands all manner of comparisons, including LIKEs, in this example it only access databases where the name starts with Database0.

select *
from BudgetView
where SourceDatabase like 'Database0%';

SourceDatabase Col1
-------------- ----------
Database01 Data01
Database02 Data02

(2 row(s) affected)

The story with parameters is also good! The startup filters we saw earlier make an appearance:

declare @sourceDatabase varchar(20);
set @sourceDatabase = 'Database10';

select *
from dbo.BudgetView
where SourceDatabase = @SourceDatabase;
SourceDatabase Col1
-------------- ----------
Database10 Data10

(1 row(s) affected)

At execution time SQL Server will dynamically determine which branches of the query plan to short-circuit. This is great as it allows a single query plan to be re-used for any value of @sourceDatabase and still deliver the short-circuits benefits.

We've managed to create a view that has the same short-circuit benefits as a pukka partitioned view without having to adhere to all the partitioned view requirements. Our base tables don't have primary keys defined and there are no check constraints. On the downside our view is not updateable, it cannot be inserted into, updated or deleted. If you need all of these features you can either implement INSTEAD OF triggers or go with fully-fledged partitioned views. In this example we used the partitioning column to identify the source database name. Constant partitioning columns are also useful for soft-delete / archive tables, coarse categories, or chunky time periods.

Labels: ,

Previous Posts