70-451–MCITP–Designing a Database for Optimal Performance–Design a Table and Index Partitioning Strategy

Requirements

Design a table and index partitioning strategy.
This objective may include but is not limited to: switching partitions, merging, splitting, staging, creating, schemes and functions

My Take

I had to do a lot of research into partitioning for one of my former employers as we implemented it to attempt to gain some performance improvements by moving portions of the data to separate I/O subsystems. As such, I know how useful partitioning can be for allowing data to be moved into separate filegroups with differing maintenance plans on each portion of the index and / or filegroup. I have reorganized the bullet points into what I though was a bit of more logical representation of how partitioning can be conceptualized. Of course, we should already know all about partitioning from the 70-432 MCTS exam as it was one of the measured skills.

The Designing Partitions to Manage Subsets of Data article contains a quick visual representation of splitting, switching, and merging. You can also take a look at the Partitioned Tables and Indexes in SQL Server 2005 whitepaper, the SQL Server 2008 internals book chapter 7, or the MCM video on Partitioning.

Schemes and functions

Partition function

A partition function defines the algorithm that will be used to assign a given row of a table or index into a partition based on a given input. If multiple inputs are given, the second and subsequent inputs are used ONLY as TIEBREAKERS. This function will split the inputs into buckets based on boundary values. The boundary values can be defined as LEFT or RIGHT based ranges, which define which range a given input will fall if it matches the boundary value.

/*
=============================================================
We will essentially create an empty partition for both the
beginning and end of the partition by ranging over values
which are not possible with the standard datetime
(i.e. < 01/01/1753 and >= 12/31/9999).
Other than that, we are defining right range value boundaries
for a few months of time.
Partition 1: Empty
Partition 2: Between 1/1/1753 and 11/30/2009
Partition 3: Between 12/01/2009 and 12/31/2009
Partition 4: Between 01/01/2010 and 12/30/9999
Partition 5: 12/31/9999 and greater
=============================================================
*/
CREATE PARTITION FUNCTION [MonthlyDateRange_PF] (DATETIME)
AS RANGE RIGHT FOR VALUES (
'17530101'
, '20091201'
, '20100101'
, '99991231'
)
;

Partition Scheme

A partition scheme defines the filegroups associated with a given partition function. This association is based on the boundaries associated with the function. The partition scheme must have as many entries as the partition function, but it can have more. If more are given, they define the filegroup which will be used next.

/*
=============================================================
This scheme will define the file groups to which each partition
defined in the partition function above belongs.
Partition 1: Primary
Partition 2: Primary
Partition 3: My_Filegroup_1
Partition 4: My_Filegroup_2
Partition 5: Primary
The next defined filegroup to be used: My_Filegroup_3
=============================================================
*/
CREATE PARTITION SCHEME MonthlyDateRange_1_PS
AS PARTITION MonthlyDateRange_PF TO
(
[PRIMARY]
, [PRIMARY]
, MY_FILEGROUP_1
, MY_FILEGROUP_2
, [PRIMARY]
, MY_FILEGROUP_3 /* next used */
)
;

Creating

Partitioning is an enterprise only feature that allows for a table and / or indexes to be distributed onto differing file groups via a partition scheme as defined above. When defining the object, we must specify the partition scheme as part of the ON clause. It should be noted that a table and it’s indexes do not need to be based on the same scheme.

/*
=============================================================
Create the table with the proper partition scheme
=============================================================
*/
CREATE TABLE
[MyDates_WithFunction]
(
MyDate DATETIME
, MyOtherColumn VARCHAR(100)
, MyMaxVarcharColumn VARCHAR(MAX)
)
ON
MonthlyDateRange_1_PS(MyDate)
;
/*
=============================================================
Show creating the index while using the
partition scheme... VERY IMPORTANT to use
the scheme otherwise it is ignored and everything
is put with the clustered index (i.e. on the same
file group)
=============================================================
*/
CREATE CLUSTERED INDEX
[MyDates_WithFunction_MyDate_CIDX]
ON
[dbo].[MyDates_WithFunction]
(
[MyDate] ASC
)
WITH
(
PAD_INDEX  = OFF
, STATISTICS_NORECOMPUTE  = OFF
, SORT_IN_TEMPDB = OFF
, IGNORE_DUP_KEY = OFF
, DROP_EXISTING = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS  = ON
, ALLOW_PAGE_LOCKS  = ON
)
ON
[MonthlyDateRange_1_PS]([MyDate])
;
/*
=============================================================
Insert some values into the new table
=============================================================
*/
INSERT INTO
[dbo].[MyDates_WithFunction]
(
MyDate
, MyOtherColumn
)
(
SELECT
CAST('20080101' AS DATETIME)
, '20080101'
UNION ALL
SELECT
CAST('20090101' AS DATETIME)
, '20090101'
UNION ALL
SELECT
CAST('20091221' AS DATETIME)
, '20091221'
UNION ALL
SELECT
CAST('20100101' AS DATETIME)
, '20100101'
UNION ALL
SELECT
CAST('20100121' AS DATETIME)
, '20100121'
)
;
/*
=============================================================
Example to show where the data is stored.
=============================================================
*/
SELECT
$partition.MonthlyDateRange_PF(MyDate) AS PartitionNumber
, *
FROM
[MyDates_WithFunction]
;

Merging

When merging partitions, you effectively take out one of the boundaries of the partition and store the data from the partition being merged in the partition which originally did not hold the given boundary value (RIGHT range partitions will be folded into the LEFT partition and vice versa). You achieve this merge by using the ALTER PARTITION FUNCTION syntax with the MERGE RANGE argument. If the filegroup defined in the partition scheme is no longer used, and is not marked as the next to be used, it will be dropped from the scheme automatically.

/*
=============================================================
Merge the partition range which is defined by the 20100101
boundary.  This will leave the data that had been stored in
partition 4 residing in partition 3.  (All subsequent partitions
are now renumbered)
=============================================================
*/
ALTER PARTITION FUNCTION MonthlyDateRange_PF()
MERGE RANGE ('20100101')
;
/*
=============================================================
Example to show where the data is stored.
=============================================================
*/
SELECT
$partition.MonthlyDateRange_PF(MyDate) AS PartitionNumber
, *
FROM
[MyDates_WithFunction]
;

Splitting

Splitting a partition will result in a new partition being created with the given boundary condition. The partition where the new boundary condition exists is considered the new partition and is determined by the next filegroup used clause of the partition scheme. If no next used partition is given in the partition scheme, the alter to the partition function will fail.

/*
=============================================================
Next Add Another Partition to split the range
for the month of january
=============================================================
*/
ALTER Partition FUNCTION MonthlyDateRange_PF()
SPLIT RANGE ('20100115')
;
/*
=============================================================
Add a next used to address which file group is
going to be added next time that the scheme is
split
=============================================================
*/
ALTER Partition Scheme MonthlyDateRange_1_PS
   Next Used [PRIMARY] /* MY_FILEGROUP_1 */
;

Switching Partitions

In order to enable Transferring Data Efficiently by Using Partition Switching both tables will need to exist before the ALTER TABLE … SWITCH … statement, the partition (or non-partitioned table) you are switching to must be empty, the partitions must be defined on the same column, the source and target tables must share the same file group and they must contain the same column structure and order as well as having the same indexes, foreign key constraints, computed columns, rowguid properties, xml schemas (if applicable) and in-row settings for LOB data defined. Constraints must either be defined the same or the source must be a subset of the constraint of the target.

Note: You also can’t use the switch statement on a table which is replicated.

Staging

Due to some of the restrictions mentioned above, mainly the fact that the partition which you are switching to “must be empty” fact, it is often convenient to use a staging table to process the data prior to switching. This is performed rather than inserting into a populated partition, or merging or splitting a non-empty partitions since these operations can be rather operationally intense whereas the alter switch is a meta-data only alteration and is done very quickly.

One common scenario for using a staging table is with a method called Sliding Window Table Partitioning as explained in the white paper above, or in a slightly different variation in a few blog posts by Dan Guzman ( Sliding Window Table Partitioning, Automating Sliding Window Maintenance)

clip_image001

Advertisements
This entry was posted in SQL and tagged , , . Bookmark the permalink.

One Response to 70-451–MCITP–Designing a Database for Optimal Performance–Design a Table and Index Partitioning Strategy

  1. Pingback: MCITP 70-451 Links Page | Destination: Change

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s