The Data Quality tables of the META database are used to store the screens used to identify records or tables which fail a data quality check. These tables are further used to store each of these failure occurrences and the aggregates of the scores of the screens.
The screen dimension keeps track of the data quality screen that is being performed in the ETL process. It is used to house items such as feasibility ranges, value enumerations, and standard deviation checks. Each screen will have an execution action to determine what the process should do when encountering an error of this type… essentially the actions to be taken will include Passing the record(s), Rejecting the record(s) or Stopping the ETL process.
- ScreenKey – The surrogate key used to uniquely identify a record in the Screens table.
- TableDictionaryKey – The foreign key pointing to the table record in the table dictionary table for which this screen applies.
- ScreenName – The name of the screen being applied
- ScreenType – The type to which the screen applies. Should only contain Column Screen, Structure Screen or Business Rule Screen.
- ScreenCategory – This field is used to describe the types of errors detected by the screen.
- ETLScreen – The stage in the overall ETL process in which the data quality screen is applied
- ProcessingOrderNumber – a primitive scheduling / dependency device informing the overall ETL master process the order in which to run the Screens.
- DefaultSeverityScore – Defines the error-severity score applied to each exception identified by the screen in advance of an overarching processing rule that could increase or decrease the final severity score as measure in the fact table.
- ScreenSQL – The actual snippet of SQL or procedural code used to execute the data quality check. This should return the set of unique identifiers for the rows that violate the data quality screen so that this can be used to insert new records into the error event fact table.
- ExceptionAction – Tells the overall ETL Process whether it should pass and flag the record, reject the record or halt the ETL process.
- Active – Is this screen still in use?
The ErrorEvent fact table will hold the date and time that a record failed a screen, as well as the package execution, table dictionary key and record identifier.
- ErrorEventKey – The surrogate key used to uniquely identify a record in the Error Events table.
- EventDateKey – The "Smart Key" pointing to the Dates dimension which represents the date of the row in YYYYMMDD format.
- ScreenKey – The foreign key pointing to the record in the screens table which initiated this error event.
- AuditKey – The foreign key to the dim audit table which will be used to track what execution inserted this record.
- TimeOfDay – The datetime value for the time when this error event was registered (holds more information than just the FK)
- RecordIdentifier – The unique identifier for the record which was flagged in this error event. In combination with the table dictionary key from the screens table should be able to identify the exact record which failed the screen.
- FinalSeverityScore – This field will be the severity score from the screens table adjusted by any other weighting process which we implement for this screen.
The DataQualityScores table will hold every combination of the Three Screen Categories Scores (Column, Structure and Business Rules) and will be used to track the data quality score of every record for the table being screened.
- DataQualityScoreKey – The surrogate key used to uniquely identify a record in the Data Quality Screens table.
- OverallQualityCategory – The overall category, described in text, for the record.
- OverallQualityScore – The overall scores, described as a numerical measure, for the record.
- ColumnScreenScore – The Structure Screen Score, described as a numerical measure, for the record
- StructureScreenScore – The Column Screen Score, described as a numerical measure, for the record
- BusinessRuleScreenScore – The Business Rule Screen Score, described as a numerical measure, for the record
- ScreensFailedCount – The number of screens that this record has failed.
- MaxSeverityScore – The maximum severity score of a screen which this record has failed.
Population of the Data Quality Tables
The screens table will need to be filled in manually, or through a front end, with the data quality screens that you wish to implement to check the quality of your data. The data quality screens table will be filled in with the appropriate values when you are aggregating the data quality scores via the error events table, which is populated when you check the screens against your source data.
Note: this is the least defined portion of the META database as I am still striving to get a solid data quality program off the ground at my company. It is probably the least important aspect of the framework. As a matter of fact, I have recently removed the variables used to track these data quality scores from the main SSIS Framework as it makes more sense to me to only include these variables for those tabled which you are screening (and this will almost certainly be in a separate process from that which loads the data). However, I still feel very strongly about setting the foundation for a data quality effort to be present from the beginning as I feel that this will remind the user that it is an important aspect of any ETL project – Even though it is rarely preformed.