Fotia Ltd

Fotia Ltd

Tuesday, 9 May 2006

Snapshot Isolation Part I

posted by Stefan Delmarco

Our investigation into snapshot isolation will be split into a number of parts. We'll firstly introduce the basic concept of what snapshot isolation means and how it can be used in its most basic form. Then we'll have a look at the advanced use of snapshot isolation as a full-blown transaction isolation level.

At its core, the SQL Server engine shares many similarities with a source code version control system. In fact, many of the terms used in snapshot isolation level have direct counterparts with applications like Rational's ClearCase version control system. In ClearCase, every file in the repository has an associated version number. When developers wants to make changes to a file, they first need to 'check it out'. This prevents any other developer from being able to make any changes to that file while it is being modified. In reserved mode no other developer can check the file out till the file has been checked back in. This is the same behaviour as SQL Server 2000 and SQL Server 2005 without snapshot isolation. When a row is being inserted / updated / deleted, the row is essentially 'checked out' till the transaction that performed the 'check out' either commits or rolls back. No other connection can even look at the row till it is 'checked back in'. To change this behaviour TSQL allows you specify locking hints such as NOLOCK (look at the row anyway, even though the data may only be partly changed - be very wary of using this) or READPAST (pretend that the row isn't even there).

Back to the ClearCase analogy. Even though a file has been checked out, developers can still look at the previous version of a file. For example, if HelloWorld.cs was checked out at version 3, developers can still look at versions 1 through 2. Version 3 only becomes visible when the developer has checked HelloWorld.cs back in. Well, this is precisely what snapshot isolation does. SQL Server maintains a version history of all rows that are in the process of being created / changed / removed. Does this mean that you can have a look at all previous versions of a row since it was created? No. SQL Server only maintains the last version of a row while it is being modified. Where are these previous versions stored? Is my database size going to multiply? No, previous version of a row are transient and are stored in TempDb.

All that snapshot isolation is trying to do is to prevent readers from blocking writers. That is, if a reader is trying to SELECT a row that is currently being altered by another writer, SQL Server will return the last version of the row that was committed instead of forcing the reader to wait till the writer commits. No more blocking...most of the time anyway(read on!).

As mentioned earlier, there are two modes of operation with snapshot isolation. We'll be looking the most basic mode of operation, known as READ_COMMITTED_SNAPSHOT. All that we need to know at this stage is that this behaviour applies when the transaction is running under the READ COMMITTED transaction isolation level (as opposed to READ UNCOMMITTED, REPEATABLE READ or SERIALIZABLE). This mode of behaviour is best observed with an example.

create database DiscoverYukon;
go
use DiscoverYukon;
go

-- Enable snapshot isolation for read committed.
alter database DiscoverYukon
set read_committed_snapshot on;
go

-- Create a test table.
create table someTable (
    Id int not null primary key clustered,
    SomeData varchar(255));
go

READ_COMMITTED_SNAPSHOT needs to be enabled at the database level. The default in the Model database (the template used whenever creating a new database) has READ_COMMITTED_SNAPSHOT set to OFF.

We'll dig into snapshot isolation using two connections. On the first connection, execute the following TSQL to insert a new record into the test table. We wrap this in a transaction so that we can interact with it from another connection:

-- Connection #1...
begin transaction
insert someTable (Id, Somedata)
values (1, 'FirstVersion');

On the other connection we simulate a user trying to read the same row:

-- Connection #2...
select * from someTable;

This query immediately comes back with no records! In SQL Server 2000 the SELECT would have blocked till connection #1 committed (try it out!). But, what if connection #2 tries to insert a record with the same Id. What if we added some 'insert Id 1 if it does not exist'-type logic?

-- Connection #2...
if not exists (
    select top 1 1
    from someTable
    where id = 1) begin

    insert someTable (Id, Somedata)
    values (1, 'FirstVersion');
end

When running this query you'll notice that it has reverted back to the SQL Server 2000 behaviour. It blocks till connection #1 commits. However, when connection #1 commits something that may be unexpected happens. Connection #2's query fails with the following error:

Msg 2627, Level 14, State 1, Line 6
Violation of PRIMARY KEY constraint 'PK__someTable__7C8480AE'. Cannot insert duplicate key in object 'dbo.someTable'.
The statement has been terminated.

Connection #2 was not blocking on the SELECT statement! That completed successfully. It was the INSERT that was blocking. SQL Server was not allowing a previous version of the row to be changed (in this case a row with the same Id to be created) even though snapshot isolation was enabled. What you read is not always what you change when snapshot isolation is in use. Therefore, care must be taken when trying to write this type of query. In this instance, the SELECT needs to use the new locking hint available in SQL Server 2005, namely READCOMMITTEDLOCK. This is not very intuitive as the existing lock hint is READCOMMITTED. The difference between these two is that READCOMMITTED will still honour the READ_COMMITTED_SNAPSHOT setting while READCOMMITTEDLOCK will not. READCOMMITTEDLOCK will not read previous versions of the row and will wait till it obtains a read lock on the actual row (it is oblivious to previous versions). Therefore the TSQL needs to be rewritten as:

-- Connection #2...
if not exists (
    select top 1 1
    from someTable with(readcommittedlock)
    where id = 1) begin

    insert someTable (Id, Somedata)
    values (1, 'FirstVersion');
end

In this case connection #2 will block on the SELECT till connection #1 commits. When it does, connection #2 completes normally and does not attempt to insert a duplicate row (the IF returns FALSE). Next we'll look at how UPDATEs are affected by snapshot isolation.

Labels: ,

Previous Posts