As I’ve taken the week off work, I thought that, as well as spending time with my family, I’d brush up my Python skills, which have been a bit neglected of late.
I’ve never tried XML parsing with Python, so I thought I’d cover that. Apple’s iTunes can export information about your music as XML, and I’d been meaning to take a look at that for a while. Why not combine the two? So here’s my take on parsing iTunes export information with Python.
I thought I’d work on a small subset of my library: the tracks I’ve actually paid to download from iTunes, as opposed to the ones converted from CD.
The exported XML data is a bit peculiar. I would have assumed it to be values enclosed by sensible tag names, e.g. <artist>Human League</artist>. However, it’s actually a bunch of neighbouring tags and values like this: <key>Artist</key><string>Sheb Wooley</string>
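To make that layout concrete, here’s a minimal sketch that pairs each key with the element sitting next to it. It uses the standard library’s xml.etree.ElementTree, which is not the approach the rest of this post takes, and the sample string and the pairs_from_dict helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same key/value layout as the iTunes export.
SAMPLE = """<dict>
<key>Name</key><string>The Purple People Eater</string>
<key>Artist</key><string>Sheb Wooley</string>
<key>Year</key><integer>2001</integer>
</dict>"""


def pairs_from_dict(dict_element):
    """Pair each <key> with the sibling element that follows it."""
    children = list(dict_element)
    result = {}
    # The children alternate: key, value, key, value, ...
    for key, value in zip(children[0::2], children[1::2]):
        result[key.text] = value.text
    return result


track = pairs_from_dict(ET.fromstring(SAMPLE))
print(track["Artist"])
```

Note that every value comes back as text here, so the year is the string "2001" rather than a number.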
Here’s a snippet from the actual data export I ran…
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Major Version</key><integer>1</integer>
<key>Minor Version</key><integer>1</integer>
<key>Application Version</key><string>6.0.5</string>
<key>Features</key><integer>1</integer>
<key>Music Folder</key><string>file://localhost/D:/Documents%20and%20Settings/Windows%20User/My%20Documents/My%20Music/iTunes/iTunes%20Music/</string>
<key>Library Persistent ID</key><string>C5DD29C89369B278</string>
<key>Tracks</key>
<dict>
<key>312</key>
<dict>
<key>Track ID</key><integer>312</integer>
<key>Name</key><string>The Purple People Eater</string>
<key>Artist</key><string>Sheb Wooley</string>
<key>Album</key><string>20th Century Rocks: 50's Rock 'n Roll - At the Hop</string>
<key>Genre</key><string>Pop</string>
<key>Kind</key><string>Protected AAC audio file</string>
<key>Size</key><integer>2260837</integer>
<key>Total Time</key><integer>135533</integer>
<key>Disc Number</key><integer>1</integer>
<key>Disc Count</key><integer>1</integer>
<key>Track Number</key><integer>5</integer>
<key>Year</key><integer>2001</integer>
<key>Date Modified</key><date>2006-09-28T09:54:23Z</date>
<key>Date Added</key><date>2006-09-28T09:54:10Z</date>
<key>Bit Rate</key><integer>128</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Play Count</key><integer>11</integer>
<key>Play Date</key><integer>-1042489964</integer>
<key>Play Date UTC</key><date>2007-01-24T08:55:32Z</date>
<key>Normalization</key><integer>7764</integer>
<key>Compilation</key><true/>
<key>Artwork Count</key><integer>1</integer>
<key>Persistent ID</key><string>302B45E87F01479F</string>
<key>Track Type</key><string>File</string>
<key>Protected</key><true/>
<key>Location</key><string>file://localhost/D:/Documents%20and%20Settings/Windows%20User/My%20Documents/My%20Music/iTunes/iTunes%20Music/Compilations/20th%20Century%20Rocks_%2050's%20Rock%20'n%20Roll%20-/05%20The%20Purple%20People%20Eater.m4p</string>
<key>File Folder Count</key><integer>4</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>
<key>313</key>
<dict>
<key>Track ID</key><integer>313</integer>
<key>Name</key><string>Daisy Daisy</string>
<key>Artist</key><string>Johnny O'Tolle &#38; His Naughty Band</string>
<key>Album</key><string>Gay 90's</string>
<key>Genre</key><string>Vocal</string>
<key>Kind</key><string>Protected AAC audio file</string>
<key>Size</key><integer>2346412</integer>
<key>Total Time</key><integer>125084</integer>
<key>Disc Number</key><integer>1</integer>
<key>Disc Count</key><integer>1</integer>
<key>Track Number</key><integer>2</integer>
<key>Track Count</key><integer>10</integer>
<key>Year</key><integer>2006</integer>
<key>Date Modified</key><date>2006-09-28T09:59:52Z</date>
<key>Date Added</key><date>2006-09-28T09:59:38Z</date>
<key>Bit Rate</key><integer>128</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Play Count</key><integer>6</integer>
<key>Play Date</key><integer>-1038647848</integer>
<key>Play Date UTC</key><date>2007-03-09T20:10:48Z</date>
<key>Artwork Count</key><integer>1</integer>
<key>Persistent ID</key><string>302B45E87F01490F</string>
<key>Track Type</key><string>File</string>
<key>Protected</key><true/>
<key>Location</key><string>file://localhost/D:/Documents%20and%20Settings/Windows%20User/My%20Documents/My%20Music/iTunes/iTunes%20Music/Johnny%20O'Tolle%20&#38;%20His%20Naughty%20Band/Gay%2090's/02%20Daisy%20Daisy.m4p</string>
<key>File Folder Count</key><integer>4</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>
This makes parsing the data a bit trickier than I had hoped. I was hoping to use a nice simple XPath expression, but data like this looks more like a job for a SAX-based approach.
I took a look in O’Reilly’s excellent Programming Python and found a nice SAX parser example to modify.
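If you haven’t met SAX before: rather than building a tree, the parser walks the document and calls methods on a handler as each tag opens, each run of text arrives and each tag closes. Here’s a minimal sketch of that idea. It is not the book’s example, and the EventLogger class and the one-line sample are made up; all it does is record the callbacks in the order they fire.

```python
import xml.sax
import xml.sax.handler

# Hypothetical one-line sample in the iTunes key/value layout.
SAMPLE = b"<dict><key>Artist</key><string>Sheb Wooley</string></dict>"


class EventLogger(xml.sax.handler.ContentHandler):
    """Illustrative handler that records each SAX callback in order."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attributes):
        self.events.append(("start", name))

    def characters(self, data):
        self.events.append(("text", data))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

The key and its value turn up as separate events, which is why a handler for this data has to remember which key it saw last. The parser is also free to deliver one run of text in several characters calls, so text should be accumulated rather than assigned.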
As it’s just a quick test, I’m making a few assumptions about the XML data that a production system couldn’t get away with. In this case, I’m assuming a tag order of Track ID, Name and Artist. Using this order, each time we see one of those tags come past, we can build up a Track object and store the relevant data. When we see Track ID, we create a new Track object to store the data in. When we see Name, we store the track name in the object. When we see Artist, we save the artist, push the Track object onto our list of tracks and clear the current Track object.
That’s a bit long-winded, so here’s the code.
import xml.sax.handler


class ITunesHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.parsing_tag = False
        self.tag = ''
        self.value = ''
        self.tracks = []
        self.track = None

    def startElement(self, name, attributes):
        if name == 'key':
            # a new key is starting, so reset the tag and value.
            self.parsing_tag = True
            self.tag = ''
            self.value = ''

    def characters(self, data):
        if self.parsing_tag:
            # text can arrive in several chunks, so append data.
            self.tag = self.tag + data
        else:
            # could be multiple lines, so append data.
            self.value = self.value + data

    def endElement(self, name):
        if name == 'key':
            self.parsing_tag = False
        else:
            if self.tag == 'Track ID':
                # start of a new track, so a new object
                # is needed.
                self.track = Track()
            elif self.tag == 'Name' and self.track:
                self.track.track = self.value
            elif self.tag == 'Artist' and self.track:
                self.track.artist = self.value
                # assume this is all the data we need
                # so append the track object to our list
                # and reset our track object to None.
                self.tracks.append(self.track)
                self.track = None


class Track:
    def __init__(self):
        self.track = ''
        self.artist = ''

    def __str__(self):
        return "Track = %s\nArtist = %s" % (self.track, self.artist)
In the real world, the Track class would offer a lot more functionality; in this case, it’s just for holding data and providing a pretty printer.
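As a rough idea of what “more functionality” might look like, here’s a hypothetical richer Track that also formats the running time. This is a sketch, not what the parser in this post uses: the total_time_ms field is an assumed addition, and it assumes that the Total Time value in the export is in milliseconds.

```python
class Track:
    """Hypothetical richer Track; total_time_ms is an assumed extra field."""

    def __init__(self, name='', artist='', total_time_ms=0):
        self.name = name
        self.artist = artist
        # Assumption: iTunes' "Total Time" value is in milliseconds.
        self.total_time_ms = total_time_ms

    def duration(self):
        """Format the running time as minutes:seconds."""
        minutes, seconds = divmod(self.total_time_ms // 1000, 60)
        return "%d:%02d" % (minutes, seconds)

    def __str__(self):
        return "%s by %s (%s)" % (self.name, self.artist, self.duration())


track = Track("The Purple People Eater", "Sheb Wooley", 135533)
print(track)
```

To feed it real data, the handler would also need to watch for the Total Time key and convert its text to an integer.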
Now we need to parse the XML and display the results. Here’s the code…
parser = xml.sax.make_parser()
handler = ITunesHandler()
parser.setContentHandler(handler)
# raw string, so the backslashes in the Windows path are kept as-is.
parser.parse(r'D:\Documents and Settings\Windows User\Desktop\Purchased.xml')
for track in handler.tracks:
    print(track)
Let’s run that code and see what we get…
Track = The Purple People Eater
Artist = Sheb Wooley
Track = Daisy Daisy
Artist = Johnny O'Tolle & His Naughty Band
Track = Don't Dilly Dally
Artist = Kidzone
Track = Jump In My Car
Artist = David Hasselhoff
Track = Puff, the Magic Dragon
Artist = Peter, Paul And Mary
Track = You Give Love a Bad Name
Artist = Bon Jovi
Track = Heart of Glass
Artist = Blondie
Track = Grace Kelly
Artist = Mika
Track = Standing In the Way of Control
Artist = Gossip
Track = Physical
Artist = Olivia Newton-John
Track = Don't You Want Me
Artist = The Human League
Track = Have a Drink On Me
Artist = Lonnie Donegan
Track = My Old Man's a Dustman
Artist = Lonnie Donegan
That’s great! OK, I’m not going to win any awards for my taste in music, but at least I can now think about building music services that use this data.