Fotia Ltd

Fotia Ltd

Tuesday, 7 February 2006

The INTERSECT and EXCEPT Operators

posted by Stefan Delmarco

The INTERSECT and EXCEPT operators are extensions to the TSQL language that implement left semi-join and left anti semi-joins respectively. This is not a new capability in the sense that these semi-joins were not possible before. They are merely an alternatively to expressing a set-based operation using an alternative syntax.

The best way to visualise the new INTERSECT and EXCEPT operators is to see them as extensions to the UNION (ALL) operator. The UNION operator allows you to vertically join the results of multiple SELECT statements into a single result set. For example (using the new AdventureWorks database), the following query returns all FirstNames where the LastName starts with 'A' concatenated with all FirstNames where the LastName starts with 'D':

select FirstName
from Person.Contact
where LastName like 'A%'
    union all
select FirstName
from Person.Contact
where LastName like 'D%';

This is a bit of a trivial example as you could have used an OR in a single WHERE clause but it illustrates the concept. Whereas UNION simply concatenates the two result sets (the ALL keyword prevents duplicates from being eliminated), INTERSECT and EXCEPT will only return distinct rows that either exist in both result sets or in only in the first (also known as the left) one. For example, the following query returns all FirstNames that are used by LastNames that begin with 'A' and by LastNames that begin with 'D'. In the example, Aaron is returned in the result set as there is an Aaron Alexander, Aaron Adams, Aaron Allen and Aaron Diaz.

select FirstName
from Person.Contact
where LastName like 'A%'
    intersect
select FirstName
from Person.Contact
where LastName like 'D%';

The equivalent in SQL Server 2000 would have looked like this:

select distinct PC1.FirstName
from Person.Contact PC1
    inner join Person.Contact PC2
        on PC1.FirstName = PC2.FirstName
where PC1.LastName like 'A%'
 and PC2.LastName like 'D%';

Also, note that the INTERSECT operator is symmetric in the sense that the same result set is produced if the queries are interchanged. The INTERSECT query below produces the same result set as the INTERSECT query above:

select distinct FirstName
from Person.Contact
where LastName like 'D%'
    intersect
select FirstName
from Person.Contact
where LastName like 'A%';

I'm sure you're already guessing what the EXCEPT operator does. However, before you jump to the conclusion that EXCEPT does the opposite of INTERSECT, read the fine print! The EXCEPT operator is a left anti-semi join. This means that it will return rows that exist in the first result set but not in the second. Note that will not include rows that appear in the second result set but not in the first.

The following query returns the FirstNames that belong to LastNames that begin with 'A' but do not belong to LastNames that begin with 'D'. In the example 'Heidi' is returned as there is only a Heidi Arun. There is no Heidi that has a LastName that begins with the letter 'D'.

select FirstName
from Person.Contact
where LastName like 'A%'
    except
select FirstName
from Person.Contact
where LastName like 'D%';

The equivalent in SQL Server 2000 is as follows:

select distinct PC1.FirstName
from Person.Contact PC1
    left outer join Person.Contact PC2
        on PC1.FirstName = PC2.FirstName
        and PC2.LastName like 'D%'
where PC1.LastName like 'A%'
 and PC2.FirstName is null;

As you can see, the equivalent for the EXCEPT operator is the classic 'find the missing rows' query (LEFT OUTER JOIN with an IS NULL on the JOIN'd column).

Whilst the equivalent query using the EXCEPT and INTERSECT operators is a bit more compact, the jury is still out in terms of the usefulness of these new INTERSECT / EXCEPT operators. Understanding the old-style syntax has taught some very valuable lessons as it requires a developer to comprehend the difference between INNER and OUTER joins, the difference between specifying filters in JOINs or WHERE clauses, and the real meaning of NULL. All of these are lessons that serious, industrial-strength developers need in order to deliver performant and scalable database implementations. I would only advocate the use of EXCEPT / INTERSECT if it produces a superior execution plan. So, if you find yourself writing a query that is attempting to correlate two result sets, give EXCEPT / INTERSECT a try!

Labels: ,

Previous Posts