Music Tracking Post 02 – Looking at the input data

Working from the fictional business problem, we first want to see what is actually available within the provided files.  The stated problem asks us to report on the following attributes:

  • Genre
  • Artist
  • Album
  • Track
  • Event Date Time
  • Device
  • User (listener)
    • Demographics

I have listed the demographics under the user entity because they would most likely be entered through a front-end application of some sort, and so should be available as a separate file or in some other accessible database.

The following is a sample of an available file (I have snipped most of the information to show just the relevant formats).

File: ewisdahl_submissions.200910132225.backup

<?xml version='1.0' encoding='utf-8'?>
<submissions version="1.2" product="Audioscrobbler" >
  <item>
    <artist>Pulley</artist>
    <album>Pulley</album>
    <track>Working Class Whore</track>
    <duration>214</duration>
    <timestamp>1255447384</timestamp>
    <playcount>1</playcount>
    <filename>01 Working Class Whore.m4a</filename>
    <uniqueID>C:\Users\Public\Music\Pulley\Pulley1 Working Class Whore.m4a</uniqueID>
    <source>2</source>
    <authorisationKey></authorisationKey>
    <userActionFlags>8</userActionFlags>
    <path>C:/Users/Public/Music/Pulley/Pulley/01 Working Class Whore.m4a</path>
    <fpId></fpId>
    <mbId></mbId>
    <playerId></playerId>
    <mediaDeviceId>1452-4617-ipod-Windows</mediaDeviceId>
  </item>
  <item>
    <artist>Defiance, Ohio</artist>
    <album>The Great Depression</album>
    <track>This Feels Better</track>
    <duration>79</duration>
    <timestamp>1255468867</timestamp>
    <playcount>1</playcount>
    <filename>07 This Feels Better.mp3</filename>
    <uniqueID>C:\Users\Public\Music\Defiance, Ohio\The Great Depression7 This Feels Better.mp3</uniqueID>
    <source>2</source>
    <authorisationKey></authorisationKey>
    <userActionFlags>8</userActionFlags>
    <path>C:/Users/Public/Music/Defiance, Ohio/The Great Depression/07 This Feels Better.mp3</path>
    <fpId></fpId>
    <mbId></mbId>
    <playerId></playerId>
    <mediaDeviceId>1452-4617-ipod-Windows</mediaDeviceId>
  </item>
  <item>
    <artist>Defiance, Ohio</artist>
    <album>The Great Depression</album>
    <track>Trip and Stumble</track>
    <duration>152</duration>
    <timestamp>1255468788</timestamp>
    <playcount>1</playcount>
    <filename>06 Trip and Stumble.mp3</filename>
    <uniqueID>C:\Users\Public\Music\Defiance, Ohio\The Great Depression6 Trip and Stumble.mp3</uniqueID>
    <source>2</source>
    <authorisationKey></authorisationKey>
    <userActionFlags>8</userActionFlags>
    <path>C:/Users/Public/Music/Defiance, Ohio/The Great Depression/06 Trip and Stumble.mp3</path>
    <fpId></fpId>
    <mbId></mbId>
    <playerId></playerId>
    <mediaDeviceId>1452-4617-ipod-Windows</mediaDeviceId>
  </item>
</submissions>

As you can see, this is a typical XML file.  The root element is “submissions”, and it contains a series of “item” elements.  These “item” elements are essentially the records we are looking for, as they contain the child elements for the artist, album, track, duration, timestamp, playcount, and media device.  No genre element appears in this sample.  The file name itself tells us the user (ewisdahl) as well as the date and time that the file was created (200910132225, which appears to be 13 October 2009 at 22:25).
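To make that structure concrete, here is a minimal Python sketch that pulls the user and creation time out of a file name and turns each item element into a record.  It rests on a few assumptions of mine rather than anything documented: that the file name always follows the user_submissions.yyyymmddHHMM.backup pattern seen in the one example, and that the timestamp element holds Unix epoch seconds (the sample values fall on the same date as the file name, which is at least consistent with that).  The function names parse_file_name and parse_items are hypothetical helpers, not part of any Audioscrobbler tooling.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of one item from the sample file; only the fields used below are kept.
SAMPLE = """<?xml version='1.0' encoding='utf-8'?>
<submissions version="1.2" product="Audioscrobbler">
  <item>
    <artist>Defiance, Ohio</artist>
    <album>The Great Depression</album>
    <track>This Feels Better</track>
    <duration>79</duration>
    <timestamp>1255468867</timestamp>
    <playcount>1</playcount>
    <mediaDeviceId>1452-4617-ipod-Windows</mediaDeviceId>
  </item>
</submissions>"""

# Assumed pattern: <user>_submissions.<yyyymmddHHMM>.backup
FILE_NAME_RE = re.compile(r"^(?P<user>.+)_submissions\.(?P<stamp>\d{12})\.backup$")


def parse_file_name(name):
    """Return (user, created) from a backup file name; created has no time zone."""
    match = FILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    created = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M")
    return match.group("user"), created


def parse_items(xml_bytes):
    """Turn each <item> under the root <submissions> element into a dict."""
    root = ET.fromstring(xml_bytes)
    records = []
    for item in root.iter("item"):
        records.append({
            "artist": item.findtext("artist"),
            "album": item.findtext("album"),
            "track": item.findtext("track"),
            "duration": int(item.findtext("duration")),
            # Assumption: the timestamp is Unix epoch seconds, read here as UTC.
            "played_at": datetime.fromtimestamp(
                int(item.findtext("timestamp")), tz=timezone.utc
            ),
            "playcount": int(item.findtext("playcount")),
            "device": item.findtext("mediaDeviceId"),
        })
    return records


if __name__ == "__main__":
    user, created = parse_file_name("ewisdahl_submissions.200910132225.backup")
    print(user, created)
    for record in parse_items(SAMPLE.encode("utf-8")):
        print(record)
```

The records come back as plain dictionaries so that they can be mapped onto whatever destination tables we end up with; nothing here assumes a particular database.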

In the next post we will build the destination database to handle the requisite information for the music to which we are listening. 

This entry was posted in Data Warehousing.
