70-451–MCITP–Designing Queries for Performance–Optimize and Tune Queries

Requirements

Optimize and tune queries.
This objective may include but is not limited to: optimizer hints, common table expressions (CTEs), search conditions, temporary storage, GROUP BY [GROUPING SETS|CUBE|ROLLUP]

My Take

Optimizing and tuning queries is something every database professional needs to know how to do, and know how to do well. Since these are the keywords of the article, I'm going to assume that the topics in the requirements should be looked at with performance in mind. So, first I will describe what each of these key areas is, and then I'll attempt to describe how each might affect performance.

Various Links on Performance Tuning

Although this section will talk about optimizer hints, CTEs, search conditions, temporary storage and GROUP BY statements, I figured it would be good to throw in some references to various performance tuning posts, as that is the topic of the overall section. These are in no way, shape or form the best, or only, posts on the subject, but they should be a good starting point on a topic around which several books, courses and careers have been built.

Optimizer Hints

Optimizer hints, similar to the hints we looked at earlier for locking and table-based isolation levels, guide the optimizer toward a particular action or object when constructing the execution plan. However, as stated previously, the first rule when thinking about using hints in SQL Server is to think twice. The majority of the time, the optimizer is going to choose an efficient plan to gather the data. Remember, the optimizer might choose the wrong plan now, but it can correct itself the next time the query is recompiled. Also remember that a plan that is right today could be wrong tomorrow; if a hint is hard-coded and the plan it forces becomes a poor choice, we will keep using that wrong plan until the query is changed.

Join Hints

Join Hints can be used to guide the optimizer into a more efficient plan when joining tables. These hints are applicable to the sets which are being joined. I will explain the differences between the types of joins in the post for the next section, Analyze Execution Plans.

Loop

Will guide the optimizer to use a Nested Loops join. This hint cannot be specified together with RIGHT or FULL.

Hash

Will guide the optimizer to use a Hash Join.

Merge

Will guide the optimizer to use a Merge Join.

Remote

REMOTE will guide the join operation to be performed on the site of the right table specified. One would generally specify this when the left table has fewer rows than the right table and the right table is not local. The REMOTE hint cannot be used when the join predicate casts using the COLLATE clause, and it can only be used with INNER JOIN operations.
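As a sketch of the syntax (the table, server and column names here are hypothetical), a join hint is written between the join type and the JOIN keyword:

```sql
-- Force a hash join between the two sets; the optimizer must comply.
SELECT c.CustomerName, o.OrderTotal
FROM dbo.Customers AS c
INNER HASH JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId;

-- REMOTE: perform the join on the server hosting the (remote) right table.
SELECT l.ProductId, r.UnitsSold
FROM dbo.LocalProducts AS l
INNER REMOTE JOIN LinkedSrv.SalesDb.dbo.RemoteSales AS r
    ON r.ProductId = l.ProductId;
```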

Query Hints

Query Hints guide the optimizer in all operations involved in the query and are specified with the OPTION clause.

{ HASH | ORDER } GROUP

The HASH GROUP and ORDER GROUP hints command the optimizer to use only the specified operation (hashing or sorting) when computing GROUP BY, DISTINCT and COMPUTE operations.

{ MERGE | HASH | CONCAT } UNION

The MERGE, HASH and CONCAT UNION hints command the optimizer to use only that algorithm for the UNION operator. If more than one of the hints is specified, the optimizer will attempt to choose the least expensive of those provided.

{ LOOP| MERGE| HASH } JOIN

The LOOP, MERGE and HASH JOIN hints command the optimizer to use only that type (or those types) of algorithm when performing all of the joins in the query. If more than one of the hints is specified, the optimizer will attempt to choose the least expensive of those provided. If a hint is also written on an individual join, that join-level hint takes precedence over the OPTION hint.
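Query hints are written in an OPTION clause at the end of the statement and apply to the whole query. A minimal sketch, using hypothetical tables:

```sql
-- Restrict every join in the query to either a merge join or a hash join;
-- the optimizer picks the cheaper of the two algorithms per join.
SELECT o.OrderId, d.LineTotal
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d
    ON d.OrderId = o.OrderId
OPTION (MERGE JOIN, HASH JOIN);
```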

FAST number_rows

The FAST hint guides the optimizer to retrieve the stated number of rows as quickly as possible. After these rows are returned, the query continues to produce the full result set as normal (although it continues with the plan that was chosen to return the first batch of rows quickly).

FORCE ORDER

The FORCE ORDER hint is used to ensure that the order in which tables are joined in the query is the order used in the execution plan.

MAXDOP

MAXDOP, or maximum degree of parallelism, sets the number of threads under which the query will execute. This hint overrides the server setting of the same name and can specify a higher number than the server-level value. If the MAXDOP value is set to 0, the server will decide.

OPTIMIZE FOR ( @variable_name { UNKNOWN | = literal_constant } )

The Optimize For hint instructs the optimizer to use the given value for determining the execution plan, rather than the first input value. The value given is NOT used for the actual execution of the query.

This option can be very helpful if you have large data skew and are attempting to make the execution plans of your stored procedures stable, rather than depending on the values input on the first call of the stored procedure as it is being loaded into the cache.
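A minimal sketch of the idea (the procedure, table and "typical" value are hypothetical): compile the plan for a representative parameter value rather than whatever happens to be passed on the first call.

```sql
CREATE PROCEDURE dbo.GetOrdersByRegion
    @RegionId INT
AS
BEGIN
    SELECT OrderId, OrderTotal
    FROM dbo.Orders
    WHERE RegionId = @RegionId
    -- Plan is built as if @RegionId = 42, regardless of the runtime value.
    OPTION (OPTIMIZE FOR (@RegionId = 42));
END;
```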

OPTIMIZE FOR UNKNOWN

The OPTIMIZE FOR UNKNOWN hint instructs the optimizer to use the underlying statistical data to determine the execution plan, rather than the first input value. If OPTIMIZE FOR UNKNOWN is used in addition to an @variable_name = literal pair, the query optimizer will use the given literal for that variable and UNKNOWN for the rest of the variables.

PARAMETERIZATION { SIMPLE | FORCED }

The Parameterization hint can only be used in a plan guide. This hint is used to override the current setting of the parameterization of the database.

RECOMPILE

The Recompile hint is used to force the query optimizer to recompile the statement every time it is run. This can be useful for ensuring that a single statement within a stored procedure is recompiled or that a plan guide is being used efficiently.

ROBUST PLAN

The Robust Plan hint is used to instruct the optimizer to prepare a plan as if the records had the maximum potential row size.

KEEP PLAN

The KEEP PLAN hint instructs the optimizer not to recompile the query when the estimated recompile threshold is met. This is useful for limiting the number of recompiles when a base table is updated frequently, or for causing a temporary table to use the standard recompilation thresholds rather than recompiling after six modifications.

KEEPFIXED PLAN

The Keepfixed Plan hint is used to ensure that a change in statistics will not force a recompile. (Only schema changes will force the recompile)

EXPAND VIEWS

The Expand Views hint ensures that the query optimizer will use the underlying tables behind an indexed view, rather than the materialized values of the view itself.

MAXRECURSION number

The MAXRECURSION hint is used in recursive CTEs to limit the levels of recursion, beyond which the CTE will error out. The default limit is 100. Valid values range from 0 (no limit) to 32,767.
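A small sketch of the hint in action, using a hypothetical numbers CTE that recurses past the default limit of 100:

```sql
WITH Numbers (n) AS
(
    SELECT 1
    UNION ALL
    SELECT n + 1
    FROM Numbers
    WHERE n < 500   -- 499 levels of recursion; would fail at the default 100
)
SELECT n
FROM Numbers
OPTION (MAXRECURSION 500);
```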

USE PLAN N'xml_plan'

The Use Plan hint is used to force the query optimizer to use a given plan. This hint cannot be used with data modification statements.

TABLE HINT ( exposed_object_name [ <table_hint> ] )

The TABLE HINT query hint applies the given table hint to the named object. This is helpful for providing a plan guide when you cannot modify the original query. The INDEX and FORCESEEK table hints are not allowed.

Common Table Expressions (CTEs)

We looked at CTEs a bit in the 70-451 – MCITP – Designing Programming Objects – Design Views post. As a quick recap, CTEs can be referenced several times within a query, enable us to write recursive queries, and are very similar to views (without needing to be stored as a database object).

One area where I see the ability to optimize queries with CTEs is when multiple passes are made at a table to perform aggregates. For instance, the same table might be passed over once for the sum of sales orders by customer, again for the total number of distinct customers, and a final time for the maximum value of a sales order. I wouldn't be surprised to find these as three separate sub-queries joined on the foreign key. However, you could write this as a single CTE which exposes all three of these values and join it back once on the foreign key. (Windowing functions are your friend here.)
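A sketch of the windowing-function approach, using a hypothetical dbo.SalesOrders table (here one pass produces the per-customer total and the overall maximum together, instead of separate sub-queries):

```sql
WITH OrderStats AS
(
    SELECT OrderId,
           CustomerId,
           OrderTotal,
           -- Per-customer running aggregate, computed in the same pass
           SUM(OrderTotal) OVER (PARTITION BY CustomerId) AS CustomerTotal,
           -- Aggregate over the whole table, also in the same pass
           MAX(OrderTotal) OVER ()                        AS MaxOrderTotal
    FROM dbo.SalesOrders
)
SELECT OrderId, CustomerId, OrderTotal, CustomerTotal, MaxOrderTotal
FROM OrderStats;
```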

However, CTEs can also be written to perform extremely poorly. If, for instance, you are doing lots of computation to gather the result set of a CTE that will be referenced multiple times, this computation will occur every time it is referenced. It might be more cost effective to select this data into a temporary table or table variable and read from that instead (see the temporary storage section below).

Search Conditions

Search conditions (and join conditions for that matter) can be a large part of why a query is or is not performing well. It is extremely important to know the operations available for the search argument and how they will interact with each other. It is also extremely important to realize when these operators might lead to slow performance. I’ll throw out a few examples of items which might cause poor performance, although this list is just the tip of the iceberg:

  • Using a sub-query with an IN clause, instead of an EXISTS clause (or NOT IN and NOT EXISTS)
    • The IN form will generate every value for the list and compare against them all, instead of simply returning the fact that a value is present. Also, with NOT IN you can run into a logic bug when the sub-query returns a NULL (since NULL is not equivalent to NULL, no rows will match).
  • Using inequality operations
    • Less than and greater than can perform extremely well. However, as soon as one of these operations appears, any further filters on columns after the inequality column will require a scan of that portion of the index or table.
  • OR
    • The OR operation is necessary for many logical operations. However, it is often the cause of slow query performance, as both conditions must be checked for every potential record. This will often force the query into a sub-optimal plan that MIGHT be improved by rewriting the query with a UNION ALL statement.
  • <>, != or NOT
    • The <>, != or NOT operators will cause scans, as the engine has to check every value to ensure that it is not the value given.
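The IN-versus-EXISTS point is easiest to see in the negated form. A sketch with hypothetical tables:

```sql
-- NOT IN materializes the full list of ids, and silently returns no rows
-- at all if that list contains a NULL.
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE c.CustomerId NOT IN (SELECT o.CustomerId FROM dbo.Orders);

-- NOT EXISTS checks per row whether a match is present, and is NULL-safe.
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Orders AS o
                  WHERE o.CustomerId = c.CustomerId);
```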

A topic that comes up frequently is the SARGability (or, Search Argument, SARGable) of a where clause. SARGable conditions are those which will not block the query optimizer from using an appropriate index. This can happen for multiple reasons, from using a function on the column being filtered, to implicitly casting a column, etc. Often, a query can be rewritten to become SARGable and improve performance. Here are a few examples:

  • SUBSTRING(ColA, 1, 10) = 'XKCD Rocks'
    • ColA LIKE 'XKCD Rocks%'
  • YEAR(myDateCol) = 2008
    • myDateCol >= CAST('2008-01-01' as Date) AND myDateCol < CAST('2009-01-01' as Date)
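As a fuller sketch of the date rewrite (assuming a hypothetical dbo.Orders table with an index on OrderDate):

```sql
-- Non-SARGable: the function wrapped around the column hides it from an
-- index seek, forcing a scan of every row to evaluate YEAR().
SELECT OrderId
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2008;

-- SARGable rewrite: a closed-open range on the bare column lets the
-- optimizer seek on the OrderDate index.
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate >= '20080101'
  AND OrderDate <  '20090101';
```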


Temporary Storage

Temporary storage, either temp tables or table variables, can be both a blessing and a curse. When used effectively, such as when limiting the number of times a complex query involving several tables and millions of records is run, they can greatly enhance performance. However, when used on a whim they can cause bottlenecks due to the cost of instantiating the objects, the lack of (or poor) statistics used to generate the query plan, or the cost of recompilation. It is important to test thoroughly when choosing the type of temporary storage you are going to use, if any.

It should be noted that derived tables, views or CTEs can, at times, be much more efficient than using temporary storage at all.

Table Variables

Table variables are optimized as if they will return a single record. They do not keep statistics, no DDL can be run against them after they are instantiated, they cannot easily have a non-clustered index (you have to use a unique constraint alongside the primary key, of which you can have multiple), and they are really best used for result sets of a few records up to perhaps 1,000 records if the variable will be joined to another table. Table variables are essentially "non-logged" and so will not revert values in the event of a roll-back. (EDIT: They are still treated as an atomic statement, and they do log information, just less than a temp table, which logs less than a permanent table.) This can be a good or bad thing, depending on circumstances…

See the Performance Considerations when using a Table Variable.

EDIT: See A Trio of Table Variables (Gail Shaw)

Temporary Tables

Temporary tables are objects which are created and destroyed for the particular connection. The temporary table and all of its indexes should be defined at the start of the procedure, preferably prior to starting a transaction, rather than created in the middle of the procedure. A temporary table will be available to anything that is "in scope"; stored procedures called from the creating procedure CAN see the temp table.
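A sketch of that pattern (the table and column names are hypothetical): create the temp table and its indexes up front, before any transaction begins, then populate it.

```sql
CREATE TABLE #RecentOrders
(
    OrderId    INT      NOT NULL PRIMARY KEY,
    CustomerId INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);

-- Unlike a table variable, a temp table can take ordinary DDL after creation.
CREATE NONCLUSTERED INDEX IX_RecentOrders_CustomerId
    ON #RecentOrders (CustomerId);

INSERT INTO #RecentOrders (OrderId, CustomerId, OrderDate)
SELECT OrderId, CustomerId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());
```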

See the Troubleshooting stored procedure recompilation article for some very important considerations when using temporary tables within stored procedures.

Temporary Tables vs Table Variables

There are several differences between table variables and temporary tables, as discussed in the following articles:

Group By [Grouping Sets | Cube | Rollup]

GROUP BY returns an aggregate for each unique combination of the columns listed. This is a common way to gather the SUM, AVG or COUNT for a particular combination of values. Oftentimes, when reporting, an aggregate of the aggregate values is also desired. This is where the GROUPING SETS, CUBE and ROLLUP keywords come into play, as they provide this aggregate of aggregates, each with its own flavor.

For efficiency's sake, GROUP BY statements should have indexes defined on the columns in the GROUP BY clause, with the columns being aggregated placed in the INCLUDE clause.

See the Using GROUP BY with ROLLUP, CUBE and GROUPING SETS article for more information.

Grouping Sets

GROUPING SETS is a new construct in SQL Server 2008 which uses GROUP BY to return only the result sets in which you are interested. This can be very helpful when you want multi-level aggregates but aren't interested in all of them. In addition, an individual grouping set can itself be defined using the CUBE or ROLLUP statements.
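A sketch against a hypothetical dbo.Sales table: totals by (Region, Product) and by Region alone, plus a grand total, while skipping the by-Product-alone level you don't care about.

```sql
SELECT Region, Product, SUM(SalesAmount) AS Total
FROM dbo.Sales
GROUP BY GROUPING SETS
(
    (Region, Product),  -- detail level
    (Region),           -- subtotal per region
    ()                  -- grand total
);
```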

See the article on Grouping Sets Equivalents.

Cube

The CUBE keyword of GROUP BY will return aggregates for all combinations of the grouped columns, as well as the aggregates of these combinations, up to a grand total. This is similar to the idea behind an OLAP dimension.

Rollup

The ROLLUP keyword of GROUP BY will return the aggregate for the grouped columns plus hierarchical super-aggregates, dropping columns from the right of the column list one level at a time up to a grand total.
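The difference between the two is easiest to see side by side (again with a hypothetical dbo.Sales table):

```sql
-- ROLLUP(Region, Product) produces the grouping sets:
--   (Region, Product), (Region), ()
SELECT Region, Product, SUM(SalesAmount) AS Total
FROM dbo.Sales
GROUP BY ROLLUP (Region, Product);

-- CUBE(Region, Product) adds the remaining combination, (Product):
--   (Region, Product), (Region), (Product), ()
SELECT Region, Product, SUM(SalesAmount) AS Total
FROM dbo.Sales
GROUP BY CUBE (Region, Product);
```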


4 Responses to 70-451–MCITP–Designing Queries for Performance–Optimize and Tune Queries

  1. Eric Wisdahl says:

    Just reading over a post on table variables by Gail Shaw (SQL In the Wild) and saw that some of my information on table variables is less than correct. I don't think that the gist of it is incorrect, but if it isn't right, it isn't right. With table variables, you can define several unique indexes that act as a non-clustered index. Table variables are logged, just less than temp tables, which are logged less than regular tables. I'll edit in this link above.

    http://sqlinthewild.co.za/index.php/2010/10/12/a-trio-of-table-variables/

  2. Gail Shaw says:

    Also, the one-row estimation is not because there are no statistics. It's because at the time the batch is compiled the table variable has not been created and hence has no rows in it.

    Try OPTION RECOMPILE on a query using a table variable and see. The estimation is based on the storage engine’s knowledge of how many rows are in the table variable.

    • Eric Wisdahl says:

      Thanks for the info Gail! I’ll take a look at some of our sprocs that use the table variables tomorrow to see if the cost of recompiling is worth the improved plan (statistics).

      I believe I may have mentioned it before, but I’ll reiterate anyway… I love the SQL Community!

  3. Pingback: MCITP 70-451 Links Page | Destination: Change
