After having a great conversation tonight at the JSSUG meeting, I am once again reminded of the time I spent learning SSIS. There are lots of subtleties and nuances that, if you’re not careful, will lead to poor performance. As such, I’m going to attempt to put out some very quick tips on the subject. I’m not sure of the regularity of these posts, or of their usefulness, but hopefully they’ll be helpful to someone. Enjoy!
When given the task of importing data which is of a different data type in the destination than it is in the source the new developer might jump to use the data conversion task to perform this option. Is this wrong? Well, no, not really. It is, after all, a tool that you can use out of the box and it will get the job done. However, without some of the knowledge of how data buffers are built and maintained behind the scenes you might not realize that this can lead to less than optimal performance.
Example 1: Data Conversion
This data flow uses a very simple data source, a data conversion transformation and a row count (as a terminator). When run, it generates 1409 records per buffer (with a 1 MB max buffer size) as seen below.
Example 2: Cast in Query
This is an even simpler data flow that uses cast within the data source to return the data with the desired data types. When run, it generates 1659 records per buffer (with a 1 MB max buffer size) as seen below.
Does this performance improvement outweigh the virtual self-documentation of a well named and formatted data flow with the proper components? Without the component someone who hasn’t looked at the package in detail might not realize that conversions were done to get the data to match up. However, this type of information can be conveyed in the description of the source and / or in an annotation next to the source for all to see. With the information conveyed in this fashion, it should be clear that using the cast will give you a performance benefit over using the transformation. So, we should probably shelve the transformation unless if we are in a situation where we can’t return the data with the type we would like for the destination (ex. flat files or stored procedures).