Design tables. This objective may include but is not limited to: table width, sizing data types, IN_ROW_DATA (BLOBs), overflow data, sparse columns, computed columns, persisted computed columns
These requirements seem to be aimed at bringing the various aspects of a table's storage to light. In particular, it is vitally important to understand where the data will be stored: in row versus LOB or row-overflow storage. I would highly recommend the Microsoft SQL Server 2008 Internals book for this section, particularly the Large Object Storage section of the Special Storage chapter. (This book is still at the top of my to-read list… I keep looking various things up when I get interested; I just need to get motivated to read it cover to cover.)
Generally, a record can contain up to 8,060 bytes, so that it fits on a single 8 KB data page. However, variable-length data types such as varchar, nvarchar, varbinary, sql_variant and large CLR user-defined types can each hold up to 8,000 bytes per column. When these columns combined would push the record past the in-row limit, one or more of them are moved off the page, and the record instead contains a pointer to the page(s) holding that column's data. Having data in this overflow area is more expensive, as additional I/O operations are needed to read the relevant columns from other pages.
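To make this concrete, here is a minimal sketch (table and data are hypothetical) of a table whose declared variable-length columns can exceed the in-row limit. The table creates without complaint; a row only spills to overflow storage when the actual data is too large:

```sql
-- Two varchar(5000) columns declare a potential 10,000 bytes per row,
-- well over the ~8,060-byte in-row limit.
CREATE TABLE dbo.OverflowDemo
(
    Id   int IDENTITY(1,1) PRIMARY KEY,
    ColA varchar(5000) NULL,
    ColB varchar(5000) NULL
);

-- This row fits entirely in row:
INSERT INTO dbo.OverflowDemo (ColA, ColB)
VALUES (REPLICATE('a', 100), REPLICATE('b', 100));

-- This row cannot fit in row; at least one column is pushed to a
-- row-overflow page, leaving a pointer behind in the record:
INSERT INTO dbo.OverflowDemo (ColA, ColB)
VALUES (REPLICATE('a', 5000), REPLICATE('b', 5000));
```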
Sizing Data Types
I assume that this section is meant to drive home the fact that every data type choice comes with a cost to store it and, more importantly, costs to retrieve it and keep it up to date. As such, the database developer will need to think very hard before deciding on the data type, let alone the size, scale and precision, of every column with variable size. The choice of data types can greatly affect the performance of the database, as it influences the number of records stored on a single data page, and consequently the number of records held in memory and the cost of I/O.
See Paul Randal’s posts:
- How Can Data-Type Choice Affect Performance?
- Just How Long Should You Make Your Character Fields? What’s the Longest Word?
The Binary Large Object data types (text, ntext, image) are by default stored on separate data pages from the row. However, when you set the "text in row" table option (sp_tableoption), these columns are stored in row, as long as the amount of data is below the configured threshold (default 256 bytes). This of course offers the bonus of not having to go to a separate page when reading the record, but it means fewer records will fit on a single page, which hurts when you aren't selecting those columns anyway.
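A short sketch of setting this option (the table name is hypothetical; sp_tableoption and the "text in row" option are documented SQL Server features):

```sql
-- Legacy table using the text data type:
CREATE TABLE dbo.LegacyNotes
(
    Id   int PRIMARY KEY,
    Note text NULL
);

-- Store text values of up to 500 bytes in row (default threshold is 256):
EXEC sp_tableoption 'dbo.LegacyNotes', 'text in row', '500';

-- Turn the option back off, so values are stored on separate LOB pages:
EXEC sp_tableoption 'dbo.LegacyNotes', 'text in row', 'OFF';
```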
The large value data types (varbinary(max), varchar(max), nvarchar(max), xml and large user-defined types) are either stored as regular variable-length columns (when they fit on the same page) or treated as LOB data. The default for these data types is therefore to be stored on the same data page as the rest of the record whenever they fit. However, when you set the "large value types out of row" table option, the columns are always stored out of row, with a 16-byte pointer to the LOB root left in the record.
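The corresponding sketch for this option (again with a hypothetical table):

```sql
CREATE TABLE dbo.Documents
(
    Id   int PRIMARY KEY,
    Body nvarchar(max) NULL
);

-- Always store the max-type columns out of row, regardless of size:
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', '1';

-- Revert to the default in-row-when-it-fits behavior:
EXEC sp_tableoption 'dbo.Documents', 'large value types out of row', '0';
```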
Row-overflow data is variable-length data that is stored on a separate data page when the record has grown too large to fit on a single page. It should be noted that each column is either completely on the original row or completely off it. Furthermore, one record can have multiple pages' worth of overflow data if there are multiple columns pushing the record over the 8,060-byte limit. When a column is moved to an overflow page, a 24-byte pointer to the new page remains in the record.
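You can check whether a table actually has row-overflow pages with sys.dm_db_index_physical_stats; a sketch, assuming a hypothetical table dbo.OverflowDemo:

```sql
-- A ROW_OVERFLOW_DATA row with a non-zero page_count confirms that some
-- column values have been pushed off the in-row pages.
SELECT alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.OverflowDemo'), NULL, NULL, 'DETAILED');
```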
Sparse columns are columns optimized for null storage. When the value is null, it requires no storage in the record. When the value is not null, however, it requires 4 bytes more than a regular column, except for variable-length columns, which require 2 extra bytes. Sparse columns cannot have the ROWGUIDCOL or IDENTITY property, can't have a default value, can't be bound to a rule, and cannot be of type text, ntext, image, timestamp, user-defined type (UDT), geometry, or geography. Furthermore, sparse columns cannot have the FILESTREAM attribute. See Using Sparse Columns.
Sparse columns can be used with column sets to present the values of all the sparse columns as a single returned XML column. A column set also allows the user to insert, update or otherwise modify the data in multiple sparse columns through that single "column" in the SET statement.
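A sketch of both features together (the product table and its attributes are hypothetical; the SPARSE and COLUMN_SET syntax is documented SQL Server 2008 syntax):

```sql
-- Most products have only a few of these attributes, so they are SPARSE;
-- the column set exposes all of them as one xml column.
CREATE TABLE dbo.Products
(
    ProductId         int PRIMARY KEY,
    Name              varchar(100) NOT NULL,
    Wattage           int SPARSE NULL,
    Color             varchar(20) SPARSE NULL,
    SpecialAttributes xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Modify multiple sparse columns through the single column set "column":
INSERT INTO dbo.Products (ProductId, Name, SpecialAttributes)
VALUES (1, 'Desk lamp', '<Wattage>60</Wattage><Color>black</Color>');

-- Sparse columns can still be queried individually by name:
SELECT ProductId, Name, Wattage, Color
FROM dbo.Products;
```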
Computed columns are virtual columns based on an expression over other columns in the same table. These columns are not typically stored in the table, which means they are recalculated every time they are referenced. However, they can be stored in the database if marked as "persisted" (see below).
A computed column can be used anywhere a regular expression can be used. However, it cannot be used in a default or foreign key constraint definition, or with a NOT NULL constraint. You can Create an Index on Computed Columns as a key column of an index, but you will need to meet the requirements (same owner as the table, deterministic, precise, can't evaluate to the text, ntext or image data types, ANSI_NULLS ON, NUMERIC_ROUNDABORT OFF, etc.). Likewise, you can use a computed column as part of a primary key or unique constraint if it is deterministic. A computed column cannot be the target of an INSERT or UPDATE statement.
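A sketch of an indexable computed column (table and names are hypothetical):

```sql
-- LineTotal is deterministic and precise (built only from columns in the
-- row, with no imprecise float arithmetic), so it can be indexed.
CREATE TABLE dbo.OrderLines
(
    OrderId   int NOT NULL,
    LineId    int NOT NULL,
    Quantity  int NOT NULL,
    UnitPrice money NOT NULL,
    LineTotal AS (Quantity * UnitPrice),  -- virtual: computed when referenced
    CONSTRAINT PK_OrderLines PRIMARY KEY (OrderId, LineId)
);

-- Allowed because the expression meets the indexing requirements:
CREATE INDEX IX_OrderLines_LineTotal ON dbo.OrderLines (LineTotal);
```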
Persisted Computed Columns
This special type of computed column is stored as part of the table. It is still subject to the restrictions listed above; however, it is not recomputed every time it is referenced. Also, a computed column MUST be persisted if you are going to use it as the partitioning column of a partitioned table.
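Marking the column PERSISTED is a single keyword; a sketch with a hypothetical table:

```sql
-- ShippedYear is computed once on insert/update and stored in the row,
-- trading storage and write cost for cheaper reads. YEAR() is
-- deterministic, which PERSISTED requires.
CREATE TABLE dbo.Shipments
(
    ShipmentId  int PRIMARY KEY,
    ShippedOn   datetime NOT NULL,
    ShippedYear AS (YEAR(ShippedOn)) PERSISTED
);
```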