I imagine that other people have had some of the same issues with enterprise data models that I have. It seems that no one wants to be the person who might limit someone’s artistic ability to write a good name, give a good description, or otherwise capture some important piece of information. And besides, the column is variable length, so it really isn’t going to matter how big the field is, because it will only take up the space necessary to store the value, right?
Well, that might be true (to a point) for databases in general. However, SSIS is strongly tied to metadata, and since it is possible that a column might be 4000 characters long, it has to plan for that possibility when sizing its data buffers.
So, for a somewhat contrived example, we will look at the email address table from AdventureWorks both as-is and as if the column had been created as nvarchar(4000). This may seem silly, but I have certainly seen many examples where a name or description column was set to the maximum length (4000 / 8000), or simply to max, when there was no real need for it. Profiling such a column usually shows a maximum actual length of 40 or 50 characters, sometimes a few hundred, and rarely over a thousand, let alone 4000 / 8000 or enough to necessitate a max-length field (i.e., LOB data).
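To set up the contrived version yourself, the width change is purely a metadata change. A sketch against the AdventureWorks Person.EmailAddress table (the column list assumes the AdventureWorks2012-style schema; working on a copy keeps the real table untouched):

```sql
-- Copy the table so the real AdventureWorks data is untouched.
SELECT BusinessEntityID, EmailAddressID, EmailAddress, rowguid, ModifiedDate
INTO dbo.EmailAddressWide
FROM Person.EmailAddress;

-- Blow the column out to nvarchar(4000). The data itself is unchanged;
-- only the metadata differs, but any SSIS source reading this table now
-- has to size its buffers for up to 4000 characters (8000 bytes) per value.
ALTER TABLE dbo.EmailAddressWide
    ALTER COLUMN EmailAddress nvarchar(4000) NULL;
```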
Here, as described above, we have the email address table with the EmailAddress column blown out to nvarchar(4000). We haven’t changed the data at all; we’ve only modified the metadata. So we know that the maximum size of any value is fifty characters. But the SSIS engine can’t know this: we’ve told it that the data can be up to 4000 characters long, and since that is the maximum possible size, that is how wide it has to make each row in its data buffers.
When we look at the output, using the default buffer size and maximum buffer rows, we get 1,300 records per buffer.
For the properly sized data, the column is nvarchar(50). When we run this data flow, we get 9,725 records per buffer under the same default buffer settings.
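The gap follows from the arithmetic SSIS does: rows per buffer is roughly DefaultBufferSize (10,485,760 bytes by default) divided by the estimated row width, capped at DefaultBufferMaxRows (10,000 by default). A rough back-of-envelope, treating the email column as the dominant part of the row width — the real row also carries the table’s other columns and per-row overhead, which is why SSIS’s actual figures differ somewhat:

```sql
-- Back-of-envelope rows-per-buffer estimate. nvarchar stores 2 bytes
-- per character; 10485760 is the default DefaultBufferSize.
SELECT
    10485760 / (4000 * 2) AS ApproxRowsPerBuffer_Wide,   -- ~1310, close to the observed 1,300
    10485760 / (50 * 2)   AS ApproxRowsPerBuffer_Narrow; -- far above 10,000, so the
                                                         -- DefaultBufferMaxRows cap and the
                                                         -- other columns' widths dominate
```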
Now, this is still a bit contrived, because we already know the proper size for the data. In practice, though, you can profile your source data, particularly when you see wide columns in an extract.
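You don’t need the full Data Profiling task for a quick width check; a MAX(LEN()) query against the source does the job. A sketch, again using the AdventureWorks email table:

```sql
-- Compare the declared width of the column to the width the data
-- actually uses; a large gap suggests the column is oversized.
SELECT
    MAX(LEN(EmailAddress))       AS MaxActualLength,
    AVG(LEN(EmailAddress) * 1.0) AS AvgActualLength
FROM Person.EmailAddress;
```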
If you notice a large disparity between the declared size of the column and the size the data actually uses, try to determine whether that width is really needed. If not, you can at a minimum cast / convert the column to an appropriate size in the transfer, and size it appropriately on the destination. If you have the access or influence to change the source system itself, all the better.
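When the source column can’t be changed, the cast can live in the extract query itself, since SSIS takes its column metadata from the statement. A sketch, where the source column is imagined to be declared nvarchar(4000) and 50 stands in for whatever width your profiling justified:

```sql
-- Casting in the source query narrows the metadata SSIS sees, so its
-- buffers are sized for 50 characters rather than the declared 4000.
SELECT
    BusinessEntityID,
    EmailAddressID,
    CAST(EmailAddress AS nvarchar(50)) AS EmailAddress
FROM Person.EmailAddress;
```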