70-451–MCITP–Design Queries for Performance–Analyze Execution Plans

Requirements

Analyze execution plans.
This objective may include but is not limited to: execution order, logical and physical operators, join operators, minimize resource costs, compare query costs

My Take

Execution Plans are one of the best tools we have for investigating query performance. With the execution plan we can see the exact type of operation performed to join tables, sort data, etc. More than that, we can look at the differences in what the optimizer thought the query would cost (in terms of data size, number of records, I/O, cpu, etc) vs what it actually cost to perform the query. With the query plan, we have a roadmap for determining what type of improvements we should be striving for in order to make the particular statement more optimal. I would highly suggest SQL Server Execution Plans by Grant Fritchey, which is available as a free e-book (I thought enough of it to buy the physical copy).

For those of you interested in free third party tools, I have been using the Plan Explorer from SQL Sentry for about a month now and have found it to be an extremely helpful tool to look at execution plans.

Execution Order

The way in which you read a graphical execution plan is Right to Left, Top to Bottom. For the XML plan, they are nested elements. In the text plan, these are identified by the amount of indentation. When examining the execution plan, we should be thinking of the SQL Statement Processing and the particular path chosen when presented with the given inputs (index vs. table, scan vs. seek) and the order in which these inputs are joined. We should also think about how the order of these operations can be influenced through the various hints, such as FORCE ORDER, or through new indexes, updated statistics, etc. The actual items which are present in the execution plan can be greatly influenced by the amount of data being returned, the columns which are being filtered or joined, the sort order of the data, how current the statistics are, if the data being compared is of an appropriate type, etc. If a change is made to any of these items, the actual operators present, and the order in which they are executed in the query can be completely changed. The representation of these items is a hint that the changes you are making are having an impact (either positive or negative).  When investigating the sort order of cached items (such as stored procedures) you can look for items that don’t seem to make sense as a hint that the optimizer might have a bad plan stored.

Logical and Physical Operators (Iterators)

The logical and physical operators, or iterators, are represented in the graphical execution plan with a blue icon. These are going to be the bulk of the objects which you will see in your query plans and will usually represent the items you are looking for in determining if the query is executing as one would expect. This is the area in which you will determine if the correct indexes are being used, if the table or index is being scanned instead of using a seek, if the query is using parallel execution, etc.

Examples include: Bookmark Lookup (Key Lookup), Clustered Index Scan, Clustered Index Seek, Hash Match, Merge Join, Nested Loops Join, Nonclustered Index Scan, Nonclustered Index Seek, RID Lookup, Sort, Stream Aggregate, Table Scan, and Table Spool. For a list of all of the icons, and links to their individual meanings, see the article: Graphical Execution Plan Icons.

Join Operators

When examining an execution plan it is often relevant to look at the type of join operation a query is using. Each of the join operators have a time and place where they are efficient and are the right choice for the optimizer to build upon. However, each join operator also has corresponding circumstances where it will lead to inefficient query execution.

Merge Join

Merge Joins require both inputs be sorted on the join columns, which must be defined as an equality (=). A row for each input is gathered and compared, with the next row grabbed from the lower-valued input. If multiple rows have the same join order value(s), they will be rewound and compared to the next input.

The Merge Join itself is probably the fastest of the Join operators. However, the cost of the sort operation will often make this an inefficient choice; even with this extra cost, it is often cheaper to sort one input and use the Merge Join.

Nested Loops Join

Nested Loops Joins is an algorithm which joins the inner table (bottom) against the outer table (top). The inner input will be searched for each row of the outer input. This join type is cost effective if the outer input is reasonably small and the inner input is indexed and large. However, in my experience the nested loops join is the join type most likely to be chosen incorrectly based on out of date statistics (and thus, incorrect estimated number of rows) which can cause the number of logical reads to skyrocket.

Hash Join

Hash Joins are the most complex of the join types. A hash join can be classified as an In-Memory Hash Join, a Grace Hash Join, or a Recursive Hash Join. With each of the hash joins, an in memory hash table is built (using the smaller of the two inputs, or the build input) and the results compared to determine matches (using the larger of the two inputs, or probe input). The roles of the inputs can be changed dynamically, and the type of hashing used can be changed (promoted?) on the fly.

In Memory Hash Join

An In-Memory Hash Join is a hash join whose entire build input can be fit into available memory. This leads to a single build phase of the hash table in memory followed by the probe phase.

Grace Hash Join

A Grace Hash Join is similar to an In Memory Hash Join, except that the contents of the build input can not be contained in available memory. As such, several build phases are required to first group the build and probe inputs into several corresponding files. After this pairing is complete, the files are consumed in such a way as to build the input hash table and then compare the probe input. This effectively reduces the cost of joining the two large tables by splitting it into several, cheaper comparison tasks.

Recursive Hash Join

A Recursive Hash Join is used when multiple levels of partitioning will be needed to properly split the inputs.

Links

For a much more in depth explanation of the join operators and the concepts behind them, see the following posts by Craig Freedman:

Minimize Resource Costs

When examining an execution plan, either as the xml showplan, the text plan or the graphical plan, there is a lot of important information which is available, such as the estimated CPU cost, the estimated I/O Cost, the estimated and actual number of rows, estimated and actual number of executions, the estimated and actual cost of the operator, etc. Each of these values represents a cost that will be associated with the query when it is run. In order to optimize the query, one or more of these values will need to become more optimal.

Remember that, as always, there is no clear answer as to what type of operator, join or index will best suite a situation. With that said, we can usually look at an execution plan to determine if there are scans where we expect seeks, if there are key lookups which can be eliminated with a proper covering index, if there are join operators that are being used improperly (i.e. Nested Loops when Hash is more suitable, etc), if there are sort operators where none are expected, etc.

Compare Query Costs

When running multiple queries in a batch, you can compare the costs of the queries relative to the other queries run in that batch. The cost of each statement run is represented as a ratio (percentage) of the cost of the entire batch. Note that these numbers are just approximations.

In order to see if a particular index will be helpful, you can use the same query multiple times with a table hint to suggest which index to use for each. Similarly you could use other hints such as Optimize For or with a particular join hint to determine if forcing these actions would be optimal for the query. A query can be run and the execution plan saved, followed by updating the statistics and rerunning the query, then comparing the attributes of the execution plan.

Advertisements
This entry was posted in SQL and tagged , , , . Bookmark the permalink.

2 Responses to 70-451–MCITP–Design Queries for Performance–Analyze Execution Plans

  1. ALZDBA says:

    Very nice synthesis with excellent refs.
    This should get everyone curious enough to read Grants wonderful (e)book and to test Plan explorer.
    The more we get to know about the engine, the more we realize there is still more to learn.

  2. Pingback: MCITP 70-451 Links Page | Destination: Change

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s