This post is meant to be a tutorial on a few different operations in SSIS and comes about as a response to a question on the SSIS forums. We will be looking at a few simple operations using expressions in a derived column. We will further look at the workings of the pivot transformation to reshape our data into the desired output…
Take in information of a specified format that contains an ID as well as a single column which contains a name and value pair. From this we would like to produce a single record for the associated name and value pairs for that ID.
To start with we throw up a very simple control flow which, for our purposes, only contains the data flow task. Now, usually I would use a template and provide auditing, scoring, etc but as this is just a sample we will keep it simple.
Next we will add the OLE DB Source and connection manager. For the connection manager, just choose to connect to localhost.master (or whatever db you have around) as we will be producing the data via command anyway…
Splitting an Input Column using expressions:
For this to work correctly we will need to have a known delimiter who’s first appearance designates the break between column name and value. In this case we are using the equals (=) sign. When splitting the column value pair we will be using expressions in the derived column transformation.
To get the name we use the expression –
SUBSTRING([InputColumn], 1, FINDSTRING([InputColumn], "=", 1) – 1)
Which essentially just takes the column value pair, and returns the sub-string before the delimiter by returning the characters starting at position 1 (with a base of 1, not 0) and then using the findstring to return the index of the first occurrence. You must use the -1 so that you do not return the delimiter.
To get the value we use the following expression –
SUBSTRING([InputColumn], FINDSTRING([InputColumn], "=", 1) + 1, LEN([InputColumn]) – FINDSTRING([InputColumn], "=", 1))
Similar to that above we use the sub-string function to return the characters starting at the position after the delimiter and running for the rest of the string (Length of the entire string minus the position of the delimiter to return the correct string).
Sort the ID and Column Value Pairs for pivoting:
The next operation is quite simple, we just use a sort transformation to ensure that the data going into the next step is in the correct order (Otherwise we might get multiple partial records).
Pivoting a Record:
Overview of a Pivot
Finally, the bulk of the operation… Pivoting a record. There are a few things that you really need to understand about this operation. First off, there are three general types of columns involved in a pivot operation. The SetKey which is used to identify the output record to which all of the individual records belong. The PivotKey which is used to determine which columns the values end up in. And, the PivotedValue which is the information that you would like in the various columns when the processing is finished. For our purposes, the ID is our SetKey, the InputColumnName is our PivotKey and the InputColumnValue is our PivotedValue.
Input Columns Tab
Now that we know a little more about the pivot transformation lets try to put it to use. Here is the Input Columns tab where we decide which columns we would like to have available for our transformation.
Pivot Default Input – Input Columns
Next we flip to the input and output properties tab and expand the input columns folder. From here we can select the columns that we will need to work with. When working with the column you would like to have act as the setkey (In this case ID), you will want to record the lineage id and set the pivot usage property to 1. The lineage id will be used later in the output columns section.
Next we will set up the PivotKey by selecting the InputColumnName. Here we set the pivot usage to 2. (There is no need to note the lineage id for the pivotkey… it is the only column that that is the case).
Finally we set up the pivoted values. Here we set the pivot usage to 3 and note the lineage id (This will also be used in the output step).
Pivot Default Output
Next we click on the pivot default output menu item and expand it. From here we will need to add a column for our SetKey as well as every known value that will appear in our PivotKey. In our case there will be four values, #COL1 – #COL4.
At this point we will click on the first item (Which may appear with a different name if you did not name it as it was being created, that is fine we will rename it in this step). Here we will take the Lineage ID that we recorded for the set key above and set the source column property to this value. We will also rename the column here using the name property.
Next we will edit the output columns representing the columns we expect to have returned through the pivot transformation. Click on the first item and edit the Name property to properly name the output column. Set the PivotKeyValue property to the value you expect to see returned (NOTE: no quotes on strings unless if the value you expect actually has quotes). Finally you set the source column property to the value you recorded from the pivoted values column above.
Output to a destination and connect data viewer:
Finally we will connect a row count to ensure that we are returning the correct number of rows. Really, I just use this to provide a connecting link on which I can place my data viewer…
And, here is the final screenshot
Really, nothing to it. You just have to understand a little bit about expressions as well as how things are set up with the pivot transformation. I have attached a copy of the sample dtsx on my skydrive (http://cid-23ac9944c8fa112a.skydrive.live.com/self.aspx/SSIS%20Test%20Packages/SplitAndPivot.dtsx)