Fotia Ltd

Fotia Ltd

Thursday, 6 July 2006

The OUTPUT Clause

posted by Stefan Delmarco

If any new feature of TSQL will be a big hit it will be the OUTPUT clause. This is one new addition to TSQL that you'll use everywhere! How many times have you written this code?

-- A typical table...
create table someTable (
    Id int identity(1,1) primary key clustered,
    SomeData varchar(100) not null);

declare @id int;
-- Insert the data.
insert someTable (someData)
values ('DataOne');

-- Figure out what identity the row was given.
set @id = scope_identity();
-- Return it...
select @id as Id;

Isn't it tedious that you have to read the data that you just inserted as a second operation following the insert? The whole reason scope_identity exists is to protect you from any context that might be introduced between these two operations that you may not be aware of, like triggers (aarrgh!). After all, why can't you just write the following...

declare @id int;

-- Insert the data, better...
insert someTable (someData)
output inserted.Id
values ('DataTwo');

This is exactly how the OUTPUT clause works. It allows data that is inserted / updated / deleted to be returned. Anyone that's done any work with triggers would have immediately recognised the 'virtual' inserted table. In fact, drawing parallels between triggers and the OUTPUT clause is the best way to understand what's going on here. The OUTPUT clause allows you to access the virtual trigger tables inline with your INSERT / UPDATE / DELETE statements. You essentially get 'on-the-fly' trigger access. For INSERT you have the inserted table, for update you have both the inserted and deleted tables, and for DELETE just the deleted table. Simple, isn't it? In the above example we use the OUTPUT clause to get to the identity column we inserted. It could have been any column in the table. Think how useful this would be for retrieving timestamp values or computed columns.

Have a look at these examples to see what some of the practical applications are:

(1) What did I just delete?

delete someTable
output deleted.*
where id = 1;

This query returns all of the data that was just deleted. Great for audit records! It also prevents having to take out locks on the rows you're going to delete so that the values read are the ones deleted. No encompassing transaction required here.

(2) Before and after values

update someTable
set someData = 'DataThree'
output deleted.someData as Before,
    inserted.someData as After,
    difference(deleted.someData, inserted.someData) as Delta
where id = 2;
Before After Delta
--------- ---------- ------
DataTwo DataThree 3

This query will return the values for someData as it was before the update and it is after the update in a single row. In addition, note that the OUTPUT clause is defined as a meaning that any expressions (i.e. functions / calculations, etc.) you could do in a SELECT you can do in an OUTPUT.

(3) Deleting multiple rows

-- Declare a table variable.
declare @someTableTemp table (
    Id int not null,
    someData varchar(100) not null);

-- Capture the output into a table variable.
delete someTable
output, deleted.someData
into @someTableTemp
    (Id, someData);

-- What was deleted...
select *
from @someTableTemp;

The output can also be captured into a table variable. It doesn't have to be a table variable. It could be any permanent table. In this case think of the OUTPUT ... INTO as an INSERT ... SELECT. Just for the hell of it you can write crazy TSQL like this:

-- The delete that never was!
delete someTable
output deleted.someData
into someTable (someData);

Lastly, a couple of points to remember:

  • The values in the output virtual tables will be populated after any triggers for the table have fired, so remember that you are at the back of the queue.
  • Inserting / updating / deleting data and then re-reading the rows affected is always dangerous because it makes you susceptible to deadlocks. Using the OUTPUT clause will avoid this as the action and read occur as a single unit. In SQL Server 2000 there was a trick with the UPDATE statement that allowed you to avoid re-reading updates if you're interested in the value of a specific column that was updated (this only works if the UPDATE affects a single row). Consider the following table:

create table ReserveRange (
    IdName varchar(100) not null primary key clustered,
    Id int not null);

insert ReserveRange (IdName, Id)
values ('Id#1', 0);

You want to use this table to reserve ranges of Id's. You need to:

  • read the current Id,
  • increment it by a certain amount,
  • return the range that has been reserved in a thread-safe manner.

Normally developers wrap the TSQL in a transaction and use a UPDLOCK locking hint to serialize access to the table like this (a couple of variations of this are possible but all require the transaction and a long-running update lock):

begin tran;

declare @range int;
set @range = 100;
declare @id int;

-- Get the current value.
select @id = Id;
from ReserveRange with(updlock)
where IdName = 'Id#1';

-- Reserve a range.
update ReserveRange
set id = @id + @range
where IdName = 'Id#1';

-- Return the range
select @id + 1 as LowRange,
    @id + @range as HighRange;

commit tran;

The trick is to notice that the UPDATE statement supported triple assignments. Have a look at the following:

declare @range int;
set @range = 100;

declare @id int;

-- Update the current range and assign it to
-- a local variable
update ReserveRange
set @id = id = id + @range
where IdName = 'Id#1';

-- Return the range
select @id - @range + 1 as LowRange,
    @id as HighRange;

Note, no encompassing transaction and no long-running locks. Next we'll look at VARCHAR(MAX).

Labels: ,

Previous Posts