Meta Database – Data Dictionary

The Data Dictionary is a set of tables used to describe other tables.  In this case, I have split the data dictionary into three tables: TableDictionary, ColumnDictionary and LogicalDataMap. 

TableDictionary

The table dictionary is used to describe data about the tables within a database.  It is a mixture of items retrieved from the system tables as well as other data which will need to be provided either through user input or through the extended properties of a table.

  • TableDictionaryKey – The surrogate key used to uniquely identify the entries within the table.
  • DatabaseName – The name of the database of which this table resides.
  • SchemaName – The name of the schema of which this table is a member.
  • TableName – The name of the table.
  • TableType -The type of table (Dimension, Summary, Snapshot, Transactional, etc)
  • DisplayName -The name of the table as it should be shown on reports or other screens/
  • BusinessName – The name of the table as it relates to the business.
  • Grain – The level of detail at which each record within the table is stored.
  • TableDescription -A description of the table.
  • IsCurrent – Flag to tell whether or not this is the current record for this table.
  • LastModified – Date when the record was last updated.

ColumnDictionary

The Column Dictionary is used to describe data about the columns.  It is tied to the TableDictionary table above through a foreign key (one to many columns to one table relationship).  Once again, this table is a mixture of information retrieved from the system tables as well as other data either gathered through extended properties or some other user interface.

  • ColumnDictionaryKey – The surrogate Key used to uniquely identify the entries within the table.
  • TableDictionaryKey – The Foreign Key to the TableDictionary Table.  Used to identify the database, schema and table combination to which this column belongs.
  • ColumnName – The name of the column, as stored in the database.
  • DisplayName – The name of the column as it should be presented on reports, GUIs or other interfaces.
  • BusinessName – The name of the column as it relates to the business.
  • ColumnDescription – A description of the column, what it is used for, and any other notes.
  • DataType – The data type of the column within the database.
  • Size – The size of the column (if applicable)
  • Precision – How precise the column is (if applicable)
  • Scale – What the scale of the column is (if applicable)
  • IsKey – Is this a key column?
  • FKToTable – The table to which this foreign key relates (if applicable)
  • AllowNulls – Does this column allow nulls?
  • DefaultValue – If a value is not present at insert, what will this column default to?
  • UnknownValue – If there is not a value present for this record, what should the value of this column be? (The unknown member is a data warehouse dimension concept)
  • ExampleValues – A list of a few different example values
  • SlowlyChangingDimensionType – What slowly changing dimension type, if any, is this column?
  • IsCurrent – Flag to tell whether or not this is the current record for this column.
  • LastModified – Date when the record was last updated.

LogicalDataMap

The logical data map is a table with a zero, one or many relationship to the column dictionary table which describes how data is entered in to the column in question.  It contains information on the source system, any transformations, join logic etc and any relevant comments which describe this process.

  • LogicalDataMapKey – The surrogate Key used to uniquely identify the entries within the table.
  • ColumnDictionaryKey – The Foreign Key to the ColumnDictionary table.  Used to identify the database, schema, table and column combination to which this logical data map applies.
  • SourceSystem – The system (database usually, sometimes a flat file) from which the data originates.
  • SourceSchema – The schema from which the data originates (if applicable)
  • SourceTableName – The table from which the data originates (if applicable)
  • SourceFieldName – The field (column) from which the data originates (If applicable)
  • SourceDataType – The data type of the field (Column) from which the data originates (If applicable)
  • ETLRules – The rules, transformation, joins, etc used to produce the data in the column for which this logical data map applies.  This will usually be some sql, psuedo-code or other snippet to describe the process.
  • LDMDescription – A description or other miscellaneous information for the logical data map.
  • IsCurrent – Flag to tell whether or not this is the current record for this Logical Data Map.
  • LastModified – Date when the record was last updated.

Population of the Dictionary

I have written a series of SSIS packages to interrogate the system tables of an sql 2005 or 2008 server.  These, along with the scripts for the Current Version of the META database have been posted to my skydrive’s public folder.  A separate set of packages could, and should, be written for a 2000 server as well as any other structured database from which this information can be pulled.  Finally, if the record for a particular table is not in the dictionary already, it will be added at run time from a set of audit parameters that are stored in each package.  The other information in these tables can be entered either through a GUI into this table structure or directly in to the source database objects using extended properties.  Furthermore, a separate system can be used to generate scripts for these extended properties using a variety of methods (For example, the dimensional modeling spreadsheet from the kimball group).

NOTE: Although I am not sure when it will be accomplished, it is my intention to write a front end for this system.  When this task is complete I will post the solution out to my skydrive and blog about it…

Advertisements
This entry was posted in SSIS and tagged , . Bookmark the permalink.

2 Responses to Meta Database – Data Dictionary

  1. Dave says:

    Hi i was hoping you might still have a copy of META database scripts that you developed. They do not seem to be on your skydrive

    • Eric Wisdahl says:

      I have added “MetaDBScripts.zip” to the skydrive public folder. However, I’m pretty sure that this isn’t the exact version of the scripts that matches these posts. I don’t believe that I have a version that matches them anymore as it has been some the article. To deploy the scripts, simply use the Deploy.Meta.sql script in sqlcmd mode (query sub-menu) and change the path at the top to point to the directory that the scripts are located. You will need to change the name of the database in each of the scripts from [dba] to [meta] or whatever else you choose.

      The version of scripts that I’m releasing is not the version I currently use, but rather a version that I was working on at home as a slightly modified from a version that I was working on at work several projects back at my last job. This version of the scripts that I have makes me cringe a little (I’ve since changed naming standards and the way that I wrote some of the sprocs, etc).

      Most notably, I would not use dynamic index delete / recreate. I have used a similar structure where I store the scripts to delete / recreate the indexes in a script engine sub-system, but have found that when generating them in this fashion it is too easy to loose indexes if you have a package failure.

      Oh yeah, the standard, use at your own risk, not responsible for any damages, yada yada yada type language applies. I’m happy to discuss what it is I’ve done in the past, what I’m doing now (as long as it doesn’t involve giving away company strategy, etc) and what I plan on doing in the future. To a point. Shout at me on twitter if you want @EricWisdahl.

      For what it is worth, I had planned on this series more being an introduction to the IDEA of a meta database. Something that I’d gladly show people how to create and use (since I didn’t particularly care for those high priced conferences where the speakers would show you their method, but wouldn’t give it to you to play around with).

      If you’re interested in an auditing framework and have not yet started, I’d highly recommend checking out pragmatic works.

      (since I have no clue if the link for the skydrive has changed or not, here is the version I just used to get to it:
      http://cid-23ac9944c8fa112a.office.live.com/browse.aspx/Public?uc=1
      )

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s