Over the past few weeks I have been fighting with what I at first thought would be a fairly straightforward task. I was to add a few breakouts for some financial information, providing the data at a grain one level lower than was currently present in an aggregate table that tracked transactions at a daily level. I wasn’t changing the data source or identifying anything new. I was simply adding a decomposed measurement: where in the past we had a total value, we now wanted the total value and the two portions that made up that total. Sounds pretty straightforward, right? Well, I had never worked on this particular package before, and I ran into a single brick wall that has held up for many days an enhancement I figured would take a few hours:
Reconciliation – the process of making consistent or compatible.
Oh that ugly word!
Of course I’m joking. In the ETL world, reconciliation is a process that should be a major portion of the testing prior to a project’s release. Counts of records should be checked, sums should be compared, data profiles should be compiled, and so on. Unfortunately, this is very rarely taken to the length that it should be. Business rules are often not consulted; or, if they are, the first pass through the rules does not properly define the measurements as stated. Numbers are compared to tables which themselves have not been fully vetted. Faults in the system are glossed over. In the end, chaos ensues! At least until the time and effort is spent to ensure that the correct data is in fact being stored and, ultimately, presented to the users.
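As a minimal sketch of the kind of check I mean (the table contents and measure name here are hypothetical, and a real implementation would query the actual source system and warehouse rather than in-memory lists), comparing record counts and measure totals between a source extract and the loaded target might look like:

```python
# Minimal reconciliation sketch: compare record counts and measure sums
# between a source extract and the loaded target. The field name
# "txn_amount" and the sample rows are invented for illustration.

def reconcile(source_rows, target_rows, measure):
    """Return reconciliation results for one measure across two row sets."""
    src_count, tgt_count = len(source_rows), len(target_rows)
    src_sum = sum(row[measure] for row in source_rows)
    tgt_sum = sum(row[measure] for row in target_rows)
    return {
        "count_match": src_count == tgt_count,
        # small tolerance to absorb rounding differences between systems
        "sum_match": abs(src_sum - tgt_sum) < 0.005,
        "source_sum": src_sum,
        "target_sum": tgt_sum,
    }

source = [{"txn_amount": 100.00}, {"txn_amount": 250.50}]
target = [{"txn_amount": 100.00}, {"txn_amount": 250.50}]
result = reconcile(source, target, "txn_amount")
```

In practice the same comparisons would be run per business key or per day, and the results logged so that a failed check blocks the release rather than being noticed weeks later.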
For instance, as just one of the issues I ran into in the last week: one of the faults of the system I am investigating was that the transactions were not recording the final amount subtracted before the reserves were set to zero. Instead, the reserves were simply zeroed out. To the application, I am sure this looks fine; on the GUI, the users see that the current amounts are zero. Enough said. But to someone trying to track down why a closed item still shows a positive reserve, it can cause more headaches than you can imagine.
I have brought up the topic of reconciliation through data quality metrics at work and in the community while discussing some of the objects I try to track in my Meta database for the SSIS framework. In my view, it is imperative that metrics be tested and retested, and that there be some form of automation that tracks whether the numbers match today, as well as whether they become faulty after release when the source systems go through their own release cycles.
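One simple form that automation can take, sketched here with an invented tolerance and invented daily totals, is drift detection: store each run's total for a tracked metric and flag any run whose value deviates from the recent baseline by more than an allowed fraction. A sudden jump after a source system's release cycle then surfaces immediately instead of at month-end:

```python
# Sketch of drift detection for one tracked metric. The 10% tolerance
# and the sample totals are assumptions, not values from any real system.

def check_metric(history, today_value, tolerance=0.10):
    """Compare today's total to the trailing mean of prior totals.

    `history` is a list of prior run totals; returns ok=False when the
    relative deviation exceeds `tolerance`.
    """
    if not history:
        return {"ok": True, "deviation": 0.0}  # nothing to compare against yet
    baseline = sum(history) / len(history)
    deviation = abs(today_value - baseline) / baseline
    return {"ok": deviation <= tolerance, "deviation": deviation}

daily_totals = [1000.0, 1020.0, 980.0]       # prior daily reserve totals
result = check_metric(daily_totals, 1500.0)  # sudden jump after a release
```

The point is not the statistics (a trailing mean is crude); it is that the check runs on a schedule and raises its hand without anyone having to remember to look.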
But how do you ensure that you know enough about the data to be able to tell when there are anomalies? If you have a properly maintained data dictionary with the requisite logical data maps, you have a good start. If not, it can take a lot of time to track down the idiosyncrasies of the system. Even with the time and effort spent to examine your data, to build proper dictionaries and maps, and to educate the application owners, the business, and the reporting staff, you still face the major effort of keeping that information up to date as the lexicon of the business shifts with changes in the market, in the technology, or in its own understanding of the way it does business. This objective will more than likely be expensive to meet and maintain; but, in my estimation, it is absolutely worthwhile.
Note, however, that even without making a full venture into the realm of Data Quality, it is important to understand that reconciliation is an absolute must-have if you are moving data and storing it in an altered structure in another location – for the sanity of the next developer, if not your own…