- What is meant by data quality
- Defining our data through collection of general meta-data
- Profiling our data to see what it is that we really have
- Defining our data quality rules based on the meta-data and data profiles
- Preparing a system to capture the results of the data quality rules check
- Iteratively improving the data quality rules and capture process
Some subjects are covered in more detail than others. To me, this is a very good thing. For instance, there is a lot of source material out there on gathering meta-data, and quite a bit on conceptual versus relational data models. The information presented on these topics is enough to pique your interest without bogging you down if you are unfamiliar with a given field.
With that said, we do get into quite a bit of detail on what I would consider to be the important subjects: what is meant by state-dependent objects, what it means to profile data, and how a data quality assessment project should be put together and whom it should include. In addition, as the title of the book would suggest, the section on data quality rules for the assessments is very good. Here we get chapter-level detail on Attribute and Domain Constraints, Relational Integrity Checks, Rules for Historical Data and State-Dependent Objects, and Attribute Dependency Rules.
Once we have a firmer grasp on what each of these means, we move into the actual implementation details through the chapters on Implementing Data Quality Rules, Fine-Tuning the Rules, Cataloging Errors, Measuring Your Data Quality Scores, Implementing a Data Quality Meta-Data Warehouse and, finally, Recurrent Data Quality Assessment.
It should be noted that, like many other books in the field, this is not a complete reference for data quality. The author states that the book glosses over the subjects of data cleansing, monitoring data integration interfaces, ensuring the quality of data conversions and consolidations, and building the data quality meta-data warehouse. I am not sure that I even agree with the list of subject areas that are skipped, but I find it somewhat odd that the author repeatedly mentions that this book is the first in a series that will capture all aspects of data quality initiatives. When I look around, it is the only title available. I think this volume is relatively strong on its own and could probably have done without the references to the non-existent material, simply stating instead that those subject areas were out of scope for the book (which is essentially all that these references accomplish).