Fotia Ltd

Fotia Ltd

Friday, 29 May 2009

StringFormat* and VARCHAR(MAX)

posted by Stefan Delmarco

The StringFormat* functions in the Fotia SQLCLR Library demonstrate some very interesting SQLCLR capabilities. In order to make the StringFormat* functions useful, they need to work with all data types being passed as arguments to the format. The natural C# solution is to use the object data type for the argument parameters:

   21 [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
   22 public static SqlString FormatString2(SqlString format, object arg0, object arg1, object arg2)
   23 {

The .NET object maps to sql_variant, making the TSQL function prototype:

    1 create function dbo.FormatString2(@format nvarchar(max), @arg0 sql_variant, @arg1 sql_variant
    2     ,@arg2 sql_variant)
    3 returns nvarchar(max)

However, sql_variant comes with its own limitations. It isn’t compatible will all data types, specifically those that can exceed 8,o16 bytes. The StringFormat* functions therefore do not work with the varchar(max)-family of data types. Given that StingFormat*’s primary purpose is to make string manipulation easier I’ve added the StringFormatMax* family of functions that take all their arguments as SqlString and use nvarchar(max) as their prototype:

   21 [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
   22 public static SqlString FormatStringMax2(SqlString format, SqlString arg0, SqlString arg1,
   23     SqlString arg2)
   24 {

TSQL prototype:

    5 create function dbo.FormatStringMax2(@format nvarchar(max), @arg0 nvarchar(max), @arg1 nvarchar(max)
    6     ,@arg2 nvarchar(max))
    7 returns nvarchar(max)

StringFormatMax* can handle all the varchar(max)-like data types. However, it loses the ability to perform any data type specific formatting (like float precision, padding, etc.) as the arguments are converted to strings by SQL before being passed to the StringFormatMax* functions.

Labels:

Sunday, 24 May 2009

Indispensable SQL CLR functions

posted by Stefan Delmarco

Since the first CTP of SQL Server 2005 I have been building up a library of SQL CLR utilities. Some were just experiments to find the limits of the technology. Others have had tremendous longevity and are part of my daily TSQL repertoire.

The first SQLCLR function I’d like to share deals with a frustration all TSQL programmers have to face on a regular basis. In fact we’re so used to having to work around this obvious TSQL limitation we don’t even notice it anymore. I’m talking about string manipulation.

Everything from simple formatting for reporting to building dynamic SQL (evil but sometimes necessary) highlights how clunky TSQL is when it comes to string manipulation. For example, given a float how difficult is it to output it with a thousands separator to 2 decimal places:

    2 declare @some float;
    3 select @some = rand() * 1E9;
    4 select convert(varchar(20), convert(money, @some), 1);

Building up a trivial dynamic SQL statement can be pretty difficult to debug with all the VB-style concatenation:

    7 declare @database nvarchar(30), @columnFilter nvarchar(30);
    8 select @database = 'AdventureWorks', @columnFilter = 'FirstName';
    9 
   10 declare @sql nvarchar(max);
   11 select @sql = 'select * from ' + @database + '.Person.Contact where ' + @columnFilter + ' like @Value';
   12 exec sp_executesql @sql, N'@Value nvarchar(50)', N'Stef%';

There is a better way. With a very simple CLR function we can bring all the power of C#-style string manipulation inline with TSQL. For example:

   15 select @sql = dbo.FormatString1('select * from {0}.Person.Contact where {1} like @Value', @database, @columnFilter);
   16 exec sp_executesql @sql, N'@Value nvarchar(50)', N'Stef%';

A bunch more examples where TSQL string handling is much improved:

   19 -- Date formatting is easy...
   20 select dbo.FormatString0('{0:dd MMM yyyy}', getdate());
   21 
   22 -- Can create templates and keep reusing them. Format and args
   23 -- can be separated (not possible with TSQL).
   24 declare @template varchar(500);
   25 set @template = 'Order {0} shipped at {1:dd MMM yyyy} for Account {2}, Total Due {3:N2}';
   26 
   27 select dbo.FormatString3(@template, SalesOrderID, ShipDate, AccountNumber, TotalDue)
   28 from Sales.SalesOrderHeader
   29 where SalesOrderId = 44314;
   30 
   31 -- Custom formatting is simple, try do this as succinctly in TSQL:
   32 select dbo.FormatString0('{0:0.00%;(0.00)%}',-0.1567);
   33 
   34 -- Padding is all in the format string....
   35 select dbo.FormatString0('{0,20}', LastName)
   36 from Person.Contact
   37 where FirstName like 'ste%';
   38 
   39 -- The thousands separator example from earlier...
   40 declare @some float;
   41 select @some = rand() * 1E9;
   42 select dbo.FormatString0('{0:N2}', @some);

The source code is pretty trivial:

   15 [SqlFunction(DataAccess = DataAccessKind.None)]
   16 public static SqlString FormatString1(SqlString format, object arg0, object arg1)
   17 {
   18     return format.IsNull ? SqlString.Null : string.Format(format.Value, SqlTypeToNetType(arg0, arg1));
   19 }

Some points to note about the source code:

  • The SqlTypes don’t have any overloads in their ToString implementations which makes custom formatting impossible. To work around this limitation I convert the SqlType to a .NET underlying type.  This is the purpose of the SqlTypeToNetType function. It extracts the .Value property from each of the SqlTypes.
  • Unlike stored procedures, UDF parameters are not optional. Instead of having a single StringFormat function that takes the most number of args I’ll ever need (which will be NULL in most places) I have 4 UDFs: StringFormat0/1/2/3.
  • I’ve created a project on CodePlex called Fotia SQLCLR Library. The full source code is available as well as a SQL script that will deploy the entire library (binaries not required). In the coming weeks I’ll be adding many more very useful SQL CLR functions and user-defined aggregates.

Labels:

Friday, 1 February 2008

Partitioned Views on a Budget

posted by Stefan Delmarco

Partitioned views are a great feature of SQL Server. By injecting an additional layer of abstraction between your logical view and physical implementation they give you fine grained control over data partitioning. However, the cover charge required for entry to this exclusive club is pretty high. The checklist required to implement a partitioned view can often interfere with your design to such a degree that they become more of a hindrance than a benefit. However, by decomposing the features SQL Server uses to implement the complete partitioned views specification we can get most of the benefits without meeting all of the criteria.

One of the benefits of partitioned views is the query optimizer's ability to determine which subset of the member tables need to be queried for each access to the partitioned view. The SQL Server query optimizer has clever features built into it that are able to understand variables and constants in TSQL and relate those constants to search predicates. The optimizer is able to use that insight to short-circuit query plan execution and not bother with branches that will never execute. For example:

declare @constant int;
set @constant = 0;

select *
from Person.Contact
where 1 = @constant;

The query plan for this bit of TSQL looks like this:

The SQL Server query optimizer has introduced a filter into the query plan that has a special property called a Startup Expression Predicate. This property is the short-circuit logic that determines whether or not the clustered index scan on the Person.Contact table executes or not. If the startup expression evaluates to FALSE then the optimizer doesn't bother executing any portion of that query. In this example, SQL Server will do nothing when @constant = 1 evaluates to FALSE. It is the exact same technology that partitioned views use to determine which underlying member tables to access based on the member tables' check constraints / partitioning columns.

For example, consider a situation where I would like to query a table that exists in a number of databases. I would like to have a partitioned view that aggregates these member tables in a single view. However, I don't want to pay the penalty of having to query every member table for every query. If I specify the database name in the SELECT query, I want the optimizer to use its short-circuit feature to go directly to the one table that will satisfy the search arguments. The solution is easier than you think!

create database Database01;
create database Database02;
create database Database10;
create database Database11;
go

use Database01
create table MemberTable (
   Col1 varchar(10));
go

use Database02
create table MemberTable (
   Col1 varchar(10));
go

use Database10
create table MemberTable (
   Col1 varchar(10));
go


use Database11
create table MemberTable (
   Col1 varchar(10));
go

insert Database01.dbo.MemberTable values ('Data01');
insert Database02.dbo.MemberTable values ('Data02');
insert Database10.dbo.MemberTable values ('Data10');
insert Database11.dbo.MemberTable values ('Data11');
go

use tempdb;
go
create view BudgetView as
select 'Database01' as SourceDatabase, Col1
from Database01.dbo.MemberTable
union all
select 'Database02', Col1
from Database02.dbo.MemberTable
union all
select 'Database10', Col1
from Database10.dbo.MemberTable
union all
select
'Database11', Col1
from Database11.dbo.MemberTable
go

select *
from BudgetView;

SourceDatabase Col1
-------------- ----------
Database01 Data01
Database02 Data02
Database10 Data10
Database11 Data11

(4 rows(s) affected)

The query plan for a SELECT of all of the rows shows SQL accessing each member table in turn:

This is the clever bit. We've put the partitioning criteria that separates each member table into the view's definition. We've hardcoded the database name of each member table into the first column of the view. The really clever bit is that SQL Server understands this! Try the following:

select *
from BudgetView
where SourceDatabase = 'Database01';

SourceDatabase Col1
-------------- ----------
Database01 Data01

(1 row(s) affected)

SQL Server has looked at the search argument we specified for SourceDatabase and short-circuited the query plan. It only queried Database01's MemberTable and didn't bother accessing any of the others!

It gets even better, SQL Server understands all manner of comparisons, including LIKEs, in this example it only access databases where the name starts with Database0.

select *
from BudgetView
where SourceDatabase like 'Database0%';

SourceDatabase Col1
-------------- ----------
Database01 Data01
Database02 Data02

(2 row(s) affected)

The story with parameters is also good! The startup filters we saw earlier make an appearance:

declare @sourceDatabase varchar(20);
set @sourceDatabase = 'Database10';

select *
from dbo.BudgetView
where SourceDatabase = @SourceDatabase;
SourceDatabase Col1
-------------- ----------
Database10 Data10

(1 row(s) affected)

At execution time SQL Server will dynamically determine which branches of the query plan to short-circuit. This is great as it allows a single query plan to be re-used for any value of @sourceDatabase and still deliver the short-circuits benefits.

We've managed to create a view that has the same short-circuit benefits as a pukka partitioned view without having to adhere to all the partitioned view requirements. Our base tables don't have primary keys defined and there are no check constraints. On the downside our view is not updateable, it cannot be inserted into, updated or deleted. If you need all of these features you can either implement INSTEAD OF triggers or go with fully-fledged partitioned views. In this example we used the partitioning column to identify the source database name. Constant partitioning columns are also useful for soft-delete / archive tables, coarse categories, or chunky time periods.

Labels: ,


Previous Posts