Tables – one look only, please – Or using CTEs

This post is in response to a query which I had to investigate earlier in the week which dealt with a number of very large tables.  I had never looked at this particular routine before and was slightly appalled when I did so.  In addition to the discussion of finding max value records, which I will address more on below, it had a table referenced 45 times in derived tables – each reference pulling out a sum for a particular set of conditions.  (Also, each reference with such great aliasing as T1, T2, T3, T4.) In reality, this was the problem as it was causing issues with parallelism that required a MAXDOP hint on the query. 

In any case, this is a discussion of how to find a record meeting ranking criteria in a single pass of a table.  I have seen all to many queries like the following:

Select
CompositeKey1
, CompositeKey2
, ColumnA
, …
From
myTable
Where
Exists
    (
        Select
         1
        From
         myTable as existsMyTable
        Where
         existsMyTable.CompositeKey1 = myTable.CompositeKey1
        Group BY
         existsMyTable.CompositeKey1
        Having
         Max(existsMyTable.CompositeKey2) = myTable.CompositeKey2
    )
;

This will leave the query having taken two trips to the table to pull back the relevant information.  Now, think about incorporating this into a query where this table is joined with other, equally large tables using equally bad queries which hit those equally large tables multiple times.  Got that image?  Ok, now imagine that there isn’t a two part composite key, but a multipart composite key.  Suddenly, inexperienced T-SQL programmers get a hold of the slightly ugly query above and put it into a state where it has no SARGs.

Select
CompositeKey1
, CompositeKey2
, CompositeKey3
, ColumnA
, …
From
myTable
Where
(CompositeKey1 + CompositeKey2 + CompositeKey3) IN
    (
        Select
         Max(CompositeKey1 + CompositeKey2 + CompositeKey3)
        From
         myTable
        Group By
         CompositeKey1
    )
;

Yuck.   

This is an area where one can use CTEs to help tremendously in reading data from a table for only those records that are equal to a max value of a column (or columns).  Using the example above, you would only want the records with the max sequence per key combination.  Here, any of the windowing functions will do wonders (RANK(), DENSE_RANK(), ROW_NUMBER()) depending on the exact results which you want.  For example:

With
myTable_MaxVersions as
(
    Select
     RANK() OVER (PARTITION BY CompositeKey1 ORDER BY CompositeKey 2 Desc, CompositeKey 3 Desc) as sequenceKey
     , CompositeKey1
     , CompositeKey2
     , CompositeKey3
     , ColumnA
     , …
    From
     myTable
)

Select
, CompositeKey1
, CompositeKey2
, CompositeKey3
, ColumnA
, …
From
myTable_MaxVersions
Where
sequenceKey = 1
;

Never underestimate the cost of hitting large tables more times than necessary.  And don’t forget the impact of using functions on columns in the where clause and what that does to index usage (READ: SARGS).

Advertisements
This entry was posted in SQL and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s