I was able to make it through the Hoberman book, as it came in at a light 140 pages or so. It was written as an introductory book, geared more toward a non-tech-savvy business analyst or someone without a good grasp of modeling concepts (I probably should have guessed this from the title). Even so, I was still able to pick up a few hints, tips and tricks from the quick read-through. I would highly recommend it to its intended audience: those with little to no working experience with data modeling, and/or those who are on the fringes of such activities but don’t really have to do the dirty work themselves.
Now I will try to make it through the Data Quality book. I am currently just past the first part, in which the authors describe what bad data is and how it might come to be. I suppose next I’ll be learning how to correct it 🙂
Anyway, this is all coming about because I am trying to figure out best practices to use at work. More than that, I am trying to find a good way to work with an aging mainframe system that has seen better days. I believe this project will expose quite a few inadequacies, although I am not 100% certain of this (so far I have found only half a dozen or so errors in the way things are or have been handled). My development has come to a near halt as I dig deeper and deeper into the subject and realize how far I really have to go. Today, I spent a good portion of my time writing up a naming standards and best practices document for the project (and I’m the only one on the project!). Over the next week I will have to go back and refactor a lot of the code I had already written so that it abides by these standards. But as far as I can tell, I’ll be better off doing this now than later. Next I will have to determine how best to set up the metadata model and begin to populate it, and start investigating how we will score the system as a whole and, more importantly, the data that I am retrieving from the mainframe. All in all, I’m not sure how quickly I’ll be able to advance through this project, and as a result I’m not sure how willing management will be to do it right, or whether they will instead want to "see results" (meaning get it done now, rather than correctly). Hopefully this will all work itself out…
Here is my review of The Data Warehouse ETL Toolkit (which I also posted to Amazon):
information for the topic that covers the majority of your Data
Warehouse efforts, the ETL process (or ECCD if you prefer, which you
probably will after finishing this volume). I took away some good ideas
on items that I probably would not have considered, mostly due to my
own ignorance, relating to Meta Data, QA and Error Corrections, Data
Lineage and Scoring, etc.
The authors (Kimball and Caserta) do a good job of pointing out
other source books for topics that the reader will probably want to
explore in depth.
There is also a useful section explaining how to manage your
ETL project, the different roles of the people who should be involved, and a
solid project plan / checklist to use as you are getting started.
My only complaint is that I did not read this prior to starting my
own project and am instead having to correct items as I try to
implement these best practices.