Fotia Ltd

Fotia Ltd

Monday, 24 July 2006

VARCHAR(MAX)

posted by Stefan Delmarco

VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) are the SQL Server 2005 replacements for TEXT, NTEXT and IMAGE large. Not only are the replacements far better named than their SQL Server 2000 counterparts, they are also far easier to use. However, before we start rejoicing in their virtues, a word of caution. Just because you can now assign a 2GB string to a local variable doesn't mean you should! The ease with which large-value data types can be accessed and manipulated means that we have a greater need to treat the data responsibly and be aware that we could be handling a large amount of data in a local variable. It is now more important to be aware that you could be moving mountains of data in some very innocent looking TSQL. We must make sure we don't use these powerful capabilities inappropriately. "With great power comes great responsibility" - Spiderman!

In BOL you'll notice that a distinction is drawn between large value types and large object data types. The new MAX types are part of the large value types and the TEXT / NTEXT / IMAGE family are part of the large object types. The only distinction I can find between the two that warrants this separation is the use of the infamous text pointer that was used with large object types. So when you come across large value versus large object types in BOL, this is what they mean.

TEXT, NTEXT and IMAGE are notoriously difficult to use within the database (i.e. within TSQL or stored procedures). It is not possible to use TEXT, NTEXT and IMAGE data types as local variables. SQL Server allows us to use a level of indirection through a 'text pointer'. Essentially this is a handle to a TEXT / NTEXT / IMAGE column of a specific row. In order to access the data in the column we first have to:

  • read the text pointer of the column of the row we were interested in (using the TEXTPTR function), then
  • use the text pointer as a parameter to the READTEXT, WRITETEXT or UPDATETEXT functions.

In addition, we are responsible for controlling the locking / isolation of the text pointer from changes while we are reading it through the use of HOLDLOCK locking hints and within transactions. This prevents anyone from changing the data while we are 'dereferencing' the text pointer. Alternatively you can just SELECT the data into VARCHAR / NVARCHAR / VARBINARY local variables and risk truncating the data. You can also SELECT the column INTO another table that has TEXT / NTEXT / IMAGE columns.

Whilst this sounds like many hoops to jump through, it is actually beneficial as developers resort to handling TEXT / NTEXT / IMAGE data outside the database. Managing these data types from ADO.NET / OLE DB is far easier than trying to manipulate the data within the database. The data can be extracted in plain SELECT statements (instead of READTEXT) and be streamed back to the client in chunks. In addition, this is a far more scalable alternative as it doesn't consume precious memory resources on the SQL Server.

New to SQL Server are the 'MAX' versions of VARCHAR / NVARCHAR / VARBINARY. There are the replacements for TEXT / NTEXT / IMAGE that have now been marked as deprecated and will be removed in a future version of SQL Server. These 'MAX' versions are meant to address the in-database manipulation difficulties of their predecessors. They are also designed to be able to handle small to very large data sizes efficiently without having to choose between VARCHAR / NVARCHAR / VARBINARY and TEXT / NTEXT / IMAGE as before.

All of the following capabilities apply equally to VARCHAR / NVARCHAR / VARBINARY. For brevity I'll just refer to VARCHAR.

You can now declare local variables of type VARCHAR(MAX) so that you can construct strings greater than 8,000 bytes (8,000 VARCHAR characters or 4,000 NVARCHAR / Unicode characters) in size and select VARCHAR(MAX) column values directly into your local variables. This now makes cursors using large value types usable in the database (again, just because you can doesn't mean you should…).

The concept of a text pointer no longer exists. Goodbye TEXTPTR, TEXTVALID, READTEXT, UPDATETEXT and WRITETEXT!

All of the string functions have been 'upgraded' so that they can be used with VARCHAR(MAX). Previous only PATINDEX, DATALENGTH and SUBSTRING could be used with a TEXT column. Now all of the family favourites like LEN, LEFT, RIGHT, etc. can now be used on large-value data types. However, note that there are implicit overloads to some of these string functions. As an example, execute the following TSQL and note the output:

declare @smallVarChar varchar(1)
set @smallVarChar = '*'

select datalength(replicate(@smallVarChar, 8001)) as LengthOfSmallVarChar
go

LengthOfSmallVarChar
--------------------
8000

declare @maxVarChar varchar(max)
set @maxVarChar = '*'

select datalength(replicate(@maxVarChar, 8001)) as LengthOfMaxVarChar
go

LengthOfMaxVarChar
--------------------
8001

The first TSQL batch calls the REPLICATE function with a small (non-MAX) VARCHAR. In return, REPLICATE returns a non-MAX VARCHAR that is truncated at 8,000 characters. However, if we pass in a VARCHAR(MAX) to REPLICATE we get a VARCHAR(MAX) back out and no truncation. So watch out for this. The transition from small to MAX VARCHARs is not implicit. If your requirement is to accommodate large VARCHARs then this needs to be baked into your code from the outset.

VARCHAR(MAX) and its counterpart TEXT can interact with each other. For example, the following TSQL is perfectly legal:

-- Create a table with a TEXT column...
create table bigTable (
    col1 text not null)
go

-- Add some data...
declare @col1 varchar(max)
set @col1 = replicate(cast('*' as varchar(max)), 10000)

insert bigTable (col1)
values (@col1)
go

-- Read out the TEXT into a VARCHAR(MAX)
declare @col1 varchar(max)
select @col1 = col1
from bigTable

select len(@col1)
go

This is very useful as it means you don't have to redefine all of your TEXT columns in your database when you migrate from SQL Server 2000 to 2005. If you need to manipulate TEXT data in the database just SELECT it into a VARCHAR(MAX) variable.

Lastly, a couple of points to keep in mind with VARCHAR(MAX):

  • The size range for VARCHAR is now 1 to 8,000 or MAX. Note there is nothing between 8,000 and MAX.
  • VARCHAR(MAX) is more than a replacement from TEXT. It works just as well with short strings as well as long ones. Therefore, if there is any possibility that you'll exceed 8,000 bytes then use VARCHAR(MAX).
  • The maximum size for a VARCHAR(MAX) is 2^31 - 1 bytes / characters.
  • The [text in row] option for a table remains specific to TEXT / NTEXT / IMAGE data types. This option lets you tell SQL Server at which size to move TEXT / NTEXT / IMAGE data out of the table's data pages and into its own page allocation (assuming that the [text in row] option is ON). The new [large value types out of row] setting replaces the TEXT option for VARCHAR(MAX). However, this option is now binary. It is either ON or OFF. The size at which MAX types are moved out of a table's data pages and into its own data pages is now fixed at 8,000 bytes if [large value types out of row] is set to OFF.
  • The UPDATE statement has been enhanced to provide a replacement for UPDATETEXT. To update a portion of a MAX data type use the .WRITE suffix. For example:
-- Create a table with a VARCHAR(MAX) column...
create table bigTable (
    col1 varchar(max) not null)
go

-- Add some data...
declare @col1 varchar(max)
set @col1 = replicate(cast('1' as varchar(max)), 10000)

insert bigTable (col1)
values (@col1)
go

-- Update a portion of the VARCHAR(MAX)...
-- Update the 4th, 5th and 6th characters, truncate the rest
-- of col1.
update bigTable
set col1.write('222', 3, null)

select *
from bigTable
go

col1
-------
111222

Next up are CTEs (Common Table Expressions).

Labels: ,

Previous Posts