Music Tracking Post 03 – A Simple Relational Database Design

In the previous post (Looking at the Input Data), I talked about the structure of the data contained within the XML file for the Fake Business Problem.  Using XML to load a relational database has become common practice.  At this juncture we need to decide on a structure to store the data.  One of the main decisions, in my mind, is whether to land the data wholesale from the XML or to break it into the respective data entities (i.e. artist, album, track, etc).  When relying on the initial data load to break the respective items down into the data entities we will either be loading data to multiple tables in one flow, a practice which I try to avoid, or we will be touching the file on the file system multiple times.  As such, I will design the process such that we will first dump the data into a table in order to benefit from the power of the SQL Server to query and manipulate the data.

So, we have decided to have a “submissions” table which will in effect mimic the xml input file.  For this, we are not going to have any constraints or integrity as we do not want to eliminate any records that might be coming in from the file.  We will, however, plan to perform some data quality checks on the data as it is entering the system.

Next we will look at the tables used to hold the various facts about the metrics which we would like to measure.  At this point, nearly every metric reverts back to events (or plays).  The only difference is the information we would like to track to the event.  The following is a short list of items which we would like to capture as it relates to a RELATIONAL style of the database.  We will revisit the de-normalized star schema version shortly.

  • An playback event is linked to a track (song) and records when the event took place as well as by which user and on what playback device. 
  • Each Track record contains information relating to the length of the track, the location of the track, the genre and the track name.
    • A Track will possibly appear on one or more albums and potentially have one or more artists that change by the album on which it was performed – Thus we will create the Album Tracks Table and the Album Track Artists table.
    • We could also include information such as ratings, descriptions, recording dates, etc.  However, for now we will keep things simple.
  • The Album record contains information relating to the name of the album and the total number of tracks on the album. 
    • An Album will contain one or more tracks and a track will potentially appear on one or more albums – Thus we will create an Album Tracks Table.
    • We could also include information such as ratings, descriptions, recording dates, etc.  However, for now we will keep things simple.
  • The Artist record will contain information relating to the name of the band. 
    • We could also include information such as descriptions, dates of activity, members of the band, etc  However, we will keep things simple for now
  • A Genre record will contain the name of the genre. 
    • We could also include information such as descriptions, etc
  • A user is an entity for whom we are recording information within the system. 
    • An unknown amount of information is available for each user but should include contact information as well as other potential demographic information.  This information will need to come in from a source other than the XML file.
  • A playback device is an entity capable of performing playback of a track.

So, in all we have the following tables:

  1. Submissions – a table used to hold the data coming in from the input xml files relating to play back events. 
    1. Grain: One record per playback.
  2. Tracks – a table used to hold data relating to the track (or song).
    1. Grain: One record per unique track name and track length.
  3. Albums – a table used to hold data relating to the album (or record). 
    1. Grain: One record per unique album name.
  4. Artists – A table used to hold data relating to the artist (or band)
    1. Grain: One record per artist.
  5. Genres – a table used to hold data relating to the genre of music.
    1. Grain: One record per genre.
  6. Users – a table used to hold information relating to the contact information and demographics of the user.
    1. Grain: One record per user.
  7. Playback Devices – a table used to hold the various devices which have enacted playback during a playback event.
    1. Grain: One record per playback device.
  8. AlbumTracks – a table used to hold the relationship of the track to the album(s) it is contained on and what track number it is on the album.
    1. Grain: One record per album track combination.
  9. AlbumTrackArtists – a table used to hold the relationship between the track and the artist(s) who performed it on a particular album.
    1. Grain: One record per track artist and album combination.
  10. PlaybackEvents – a table used to hold the data relating to the tracks and albums being played by the users on the various playback devices.
    1. Grain: One record per playback.

I freely admit that I am making this up as I walk through the process.  So, if there are changes that need to be made at a later point, I will describe them in another post and why it is that the changes were made…

Here is a database diagram of the appropriate tables (minus the submissions table, which I consider to be separate):

LastMusic-Tracks-DatabaseDiagram

NOTE: I know that there will be many different design choices available, and I’m quite certain that there will be better designs than the choices I have presented.  Keep in mind that this is simply a tutorial to show off some of the aspects of the BI stack as well as the SSIS Framework which I am currently using. 

Also note that this will use a slightly updated version of the META database which I have discussed in previous blog posts.  I will further discuss the audit process as it relates to using these tables in the SSIS Framework shortly.

Advertisements
This entry was posted in Data Warehousing. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s