Optimize indexing strategies.
This objective may include but is not limited to: table-valued function, views, filtered indexes, indexed views, clustered and non-clustered, unique
I feel like this section is a strange one. I’m not sure that I know how I would spell out ways of concentrating on how to optimize indexes, but I don’t know that the way that this section is organized would be how I would do it. As a result, I am reorganizing the bullet points here to describe clustered, non-clustered, unique and filtered indexes first. I will then talk about table valued functions, view, and indexed views. I believe that it will do more good to describe indexes first, and then describe how it is that they are going to be applied to some of the database objects and the effects that they will have on them. As there is so much good information on indexes available, I am going to attempt to make this somewhat brief. I won’t be able to cover the subject as well as it has been covered in the past, so instead I will attempt to link to items which I have found useful in the past.
Clustered and Non-Clustered Indexes
It is difficult to judge what level of understanding is needed for this exam. Certainly a cursory look at the article for Table and Index Organization will be applicable. What I’m not certain about would be how helpful it would be to read several of the chapters from the SQL Internals book relating to the way files and allocations are set up, how data is stored on row and off, and how index pages are different from data pages. An alternate method to get to some of this information would be to listen to several of the (FREE!) mcm videos, such as the presentation on data files, data structures and index internals would be a more complete introduction. All of this is more than likely too much information for this exam and would simply complicate this post. As such, I will try my best to keep things relatively simple below.
A clustered index defined on the table will force the table into a balanced tree instead of a heap. This means that the index IS the table. Therefore, only one clustered index can be defined per table. As this is the way that data will be organized in your table, it is one of the most important decisions that you can make when designing a table. If designed in an OLTP system, developers will often place surrogate keys as the primary key AND the clustering index key so that there is a unique key, that there are not expensive page splits and so that the values of the unique key will be ever increasing. Furthermore, with an int (or bigint) the clustering key takes up a relatively small amount of space, which is important when you consider that this key will need to be included with every non-clustering index record to point back to the base structure, and that this value will also more than likely be the value with which you are joining your queries.
As I said, this is overly simplistic. Check out the clustering index debate mcm video or the blog posts by Kimberly Tripp.
A Non-Clustered Index defined on a table will contain the indexing keys, any included columns and a reference to the base table. This reference is either the clustered index key for a table with a clustered index, or the row identifier for a heap. If the reference is not a unique index, the index will be made unique with a “uniqueifier”, or integer that is used to make this a unique reference.
General Indexing Tips and Tricks
The following is a horribly simplified list of some things that might come up when choosing an optimal index, or set of indexes and should in no way shape or form be taken as a complete study on the topic. Instead, refer to the list of videos and books above for more information on the topic as well as the General Index Design Guidelines article.
- Consider the density of the columns you are choosing as part of the indexing key
- Remember to consider the “tipping point” for non-covering, non-clustered indexes
- Choose the order in which the columns appear in the indexing key wisely and design for inequality
- Will matter for sorts and for left based subsets that can satisfy more queries
- Only equality operators are eligible for seek, inequality will force a scan of the subsequent columns
- Consider which columns will be better suited to the INCLUDE section
- Sometimes the data is necessary for reading, but not for searching
- Can be useful for aggregates
- Consider whether or not to include variable length, non-unique or nullable columns
- Each of these have different costs associated
- Consider how often an indexing keys values will change, as well as the fillfactor
- Page-splitting is a very expensive operation.
- Consider how often the index will be read, as opposed to how often it will be written to or updated.
- Simple equation will help determine if the index is really “worth it”
- Consider how many indexes are already on the table and the type of system or table that you are indexing
- Some systems will rely on more indexes than others. For OLTP systems, there are several gurus out there who will stick to a general rule of thumb of no more than four or five indexes on a table (your mileage may vary).
- Remember that the more indexes you have, the more expensive a data modification will become.
- Check back often (use those DMOs)!
- Data access changes over time. Keep up to date with the indexes which are being used as well as the indexes which might be helpful.
See Query Tuning Fundamentals: Density, Predicates, Selectivity and Cardinality for an explanation of these terms.
A Unique Index is an index which guarantees that the data values contained in the index will be unique. This is helpful for verifying data integrity associated with the column values as well as for giving the optimizer more information about the cardinality of the data. A unique index is automatically created if you apply the UNIQUE constraint, although one can just as easily be created independently. Unique indexes can be created on both clustered and non-clustered indexes and can also be combined with filtered indexes.
Filtered Indexes are a new feature for SQL Server 2008. With this type of index, you can define a where clause that limits the records that will be stored in the index. This will potentially improve the performance of the query as it will have a much smaller space to scan as well more discrete buckets in the statistics. Furthermore, this should help to reduce the cost of maintaining the index as it will only be updated by those records whose data falls within the conditions. Finally, the index should take up less space, as it will have a subset of the total records.
Multiple filtered indexes can be defined on a single column, or set of columns, as a way of making the indexes more usable (i.e. less dense). Furthermore, filtered indexes are very helpful when used in combination with sparse columns, or columns with a large data skew. For instance, if you have a table with 4M records, 3.95M of which are a single value (say an integer corresponding to a “completed” state), and you are only interested in the records that are not equal to that value.
Inline Table Valued Function
An inline table valued function, as described under the Design T-SQL Table-Valued Function post, is essentially a parameterized view. This means that you should have a good idea of at least one of the columns that will be indexed (the parameter). Unlike a stored procedure, the Inline TVF is not susceptible in the same manner of generating a bad plan and sticking with it. This is because the Inline TVF will still expand into the outer query which is using it and optimize the underlying query. However, don’t be blinded by this point. You will still need to ensure that the underlying query is optimized for the filters, joins and returned columns as described.
Multi-Statement Table-Valued Function
Unfortunately, the Multi-Statement Table-Valued Functions are not quite as easy to optimize. Similar to a stored procedure, all of the statements contained in the TVF will need to be optimized independently. However, the table returned will be returned as a Table Variable, with all of the underlying problems (and the bonuses) that this confers.
A view is really just a stored query. As such, it is going to be expanded by the optimizer (providing it is not an indexed view, which we will discuss momentarily) and combined with any other portion of the query which is calling it. As such, there will be times when portions of the view are dropped out of the execution plan or rearranged. However, if you think enough of the query to store it as a view, we will probably want to optimize it just as we would any other query. Look at the join conditions, the filters, the columns returned, the characteristics of the columns used, the frequency of the updates, etc.
Unlike a regular view, an Indexed View (or virtual table) is stored (“materialized”) much like an index on a table would be. This will allow for very quick return on queries with complex joins, aggregates, or frequently performed queries. However, as this data is now being updated anytime that the underlying data is modified, there is a considerable cost if that data is volatile. Furthermore, the indexed view has a large number of requirements, as discussed in the Creating Indexed Views article, which need to be considered when gong through the design process. One such requirement is that the index is unique, another that it is deterministic. There are many more.