A delve into CD ripping and metadata
As of late, I've been getting into growing my music collection by buying and ripping CDs. That was all fine and dandy until I discovered that one of my albums, Echo Afternoon by Finish Ticket, was slightly off.
On the left, the artist's Bandcamp page. On the right, my ripped files in VLC.
There are three problems here:
One track has changed name - Raincloud has turned into "Rainclous".
Nothing Coming Soon should be 41 seconds long, but is instead 4 minutes, 26 seconds long.
We're missing The Weight at the end of the album.
To work out just what's going on here, we've got to delve into some techy details about data and CDs.
How a CD rip works
If you ignore all the physics and lasers, an audio CD is a very simple thing. Aside from some audio tracks and a table of contents over those tracks, very little extra information is included on a disk - you've pretty much only got the artist name, album name and track names actually burned into the disk.
When ripping a CD to get the nice, lossless audio out of it, this poses a little bit of a problem - we want as much metadata to be included with our resultant files as possible, but there's not really the space to include much of it on the disk. Plus, since library managers like Navidrome rely on the metadata embedded within audio files to categorise and display information about our music, it kinda needs to be correct.
To get around this, open source metadata databases are used, like MusicBrainz, which you can think of as Wikipedia for music. High quality CD ripping programs (like Whipper, which I use, and EAC) will use the limited information available on a CD to look up a release on MusicBrainz and copy the metadata from there into the files they create.
The key thing that allows this to happen is the table of contents (TOC) that's present at the front of any audio CD. This is used to indicate to the player how many tracks there are and how they're laid out on the disk. The TOC from Echo Afternoon looks like this:
1 11 189706 150 16281 33398 51150 66607 84179 104157 119432 138080 145514 168515
This can be broken down as:
1 : the number of the first track
11 : the number of the last track (so there are 11 tracks on this disk)
189706 : the total number of sectors
followed by 11 track start positions. Here, track 1 starts at sector 150, track 2 starts at sector 16281, etc., etc.
The raw TOC is used as a key to look up the disk on MusicBrainz, which has disk layouts stored alongside its metadata for releases with physical disks (since some releases are digital-only). While not technically a perfectly unique identifier for a disk, the TOC can be abused like this because of the sheer entropy it encodes. Each sector translates to 1/75th of a second of audio, so the resolution here means it's pretty improbable that two different releases will have perfectly identical TOCs.
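As a rough illustration of how much a TOC tells you, here's a minimal Python sketch that turns the TOC above into track lengths. It assumes the field order MusicBrainz documents for its toc parameter (first track, last track, total sectors, then one start offset per track) and rounds each length down to whole seconds. The function name is just something I made up for this example.

```python
SECTORS_PER_SECOND = 75  # each sector holds 1/75th of a second of audio


def track_lengths(toc: str) -> list[str]:
    """Return every track's length as M:SS for a space-separated TOC string."""
    first, last, total_sectors, *offsets = (int(n) for n in toc.split())
    # A track ends where the next one starts; the last track ends
    # at the total sector count.
    ends = offsets[1:] + [total_sectors]
    lengths = []
    for start, end in zip(offsets, ends):
        seconds = (end - start) // SECTORS_PER_SECOND  # rounded down
        lengths.append(f"{seconds // 60}:{seconds % 60:02d}")
    return lengths


toc = "1 11 189706 150 16281 33398 51150 66607 84179 104157 119432 138080 145514 168515"
lengths = track_lengths(toc)
print(lengths[5])  # track 6 -> 4:26
```

Track 6 comes out at 4:26, which is the same suspicious length VLC showed for Nothing Coming Soon.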
Anyway, MusicBrainz has an API to guess the release from the TOC alone.
$ curl --header "Accept: application/json" https://musicbrainz.org/ws/2/discid/-?toc=1+11+189706+trimmed
{"releases":[{"release-events":[{"date":"2024-09-06","area":null}],"packaging-id":null,"country":null,"title":"Echo Afternoon","id":"af4dc096-65d2-4cc5-9e0c-176d64fc4d04","asin":null,"quality":"normal","disambiguation":"","date":"2024-09-06","cover-art-archive":{"count":0,"back":false,"front":false,"darkened":false,"artwork":false},"text-representation":{"script":"Latn","language":"eng"},"status-id":"4e304316-386d-3409-af2e-78857eec5cfe","barcode":"846070070822","packaging":null,"status":"Official","media":[{"position":1,"format":"CD","track-count":11,"discs":[{"sectors":189706,"offsets":[150,16281,33398,51150,66607,84179,104157,119432,138080,145514,168515],"id":"kmIWXglj0JdNk4TAdp.nhkQ6Uhs-","offset-count":11}],"format-id":"9712d52a-4509-3d4b-a1a2-67c88c643e31","title":""}]},{"title":"Heart Monster Fear Machine","country":"SE","packaging-id":"f7101ce3-0384-39ce-9fde-fbbd0044d35f","release-events":[{"area":{"disambiguation":"","sort-name":"Sweden","id":"23d10872-f5ae-3f0c-bf55-332788a16ecb","type":null,"name":"Sweden","type-id":null,"iso-3166-1-codes":["SE"]},"date":"2010-08-27"}],"id":"c24cd88d-9a98-43e8-9c6c-cbacb70b2d7f","asin":null,"quality":"normal","date":"2010-08-27","disambiguation":"","barcode":"7320470137086","status-id":"4e304316-386d-3409-af2e-78857eec5cfe","text-representation":{"script":"Latn","language":"eng"},"cover-art-archive":{"count":1,"front":true,"back":false,"darkened":false,"artwork":true},"media":[{"track-count":11,"position":1,"format":"CD","title":"","discs":[],"format-id":"9712d52a-4509-3d4b-a1a2-67c88c643e31"}],"packaging":"Cardboard/Paper Sleeve","status":"Official"},{"asin":"B000DZJHQK","quality":"normal","title":"Planets Conspire","country":"CA","packaging-id":null,"release-events":[{"area":{"disambiguation":"","sort-name":"Canada","id":"71bbafaa-e825-3e15-8ca9-017dcad1748b","type":null,"name":"Canada","type-id":null,"iso-3166-1-codes":["CA"]},"date":"2005"}],"id":"df9a427d-88df-4b78-87dc-ad0d688041fd","media":[{"format-id":"9712d52a-4509-3d4b-a1a2-67c88c643e31","discs":[{"offsets":[225,16630,34044,51197,67508,86538,105552,120642,139362,151441,170392],"sectors":190968,"id":"g5AtalgG1KlL1R7ICe.iMkk8E6c-","offset-count":11}],"title":"","format":"CD","position":1,"track-count":11}],"status":"Official","packaging":null,"date":"2005","disambiguation":"","barcode":"638812728128","status-id":"4e304316-386d-3409-af2e-78857eec5cfe","text-representation":{"script":"Latn","language":"eng"},"cover-art-archive":{"artwork":true,"darkened":false,"count":1,"back":false,"front":true}}],"release-count":3,"release-offset":0}
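That response is a lot to squint at, so here's a small Python sketch of how you might pick out the releases whose disk layout matches the TOC exactly. It doesn't touch the network: `sample_response` is the response above cut down to the fields the check needs, and both function names are hypothetical helpers, not part of any MusicBrainz client library.

```python
from urllib.parse import quote_plus

API_ROOT = "https://musicbrainz.org/ws/2/discid/-"


def lookup_url(toc: str) -> str:
    """Build the lookup URL; send it with an 'Accept: application/json' header."""
    return f"{API_ROOT}?toc={quote_plus(toc)}"


def exact_matches(response: dict, toc: str) -> list[str]:
    """Titles of releases with a disc whose layout equals the TOC."""
    first, last, total_sectors, *offsets = (int(n) for n in toc.split())
    titles = []
    for release in response.get("releases", []):
        for medium in release.get("media", []):
            for disc in medium.get("discs", []):
                if disc["sectors"] == total_sectors and disc["offsets"] == offsets:
                    titles.append(release["title"])
    return titles


toc = "1 11 189706 150 16281 33398 51150 66607 84179 104157 119432 138080 145514 168515"

# The three releases from the response, trimmed to title and disc layout.
sample_response = {
    "releases": [
        {"title": "Echo Afternoon", "media": [{"discs": [{
            "sectors": 189706,
            "offsets": [150, 16281, 33398, 51150, 66607, 84179,
                        104157, 119432, 138080, 145514, 168515]}]}]},
        {"title": "Heart Monster Fear Machine", "media": [{"discs": []}]},
        {"title": "Planets Conspire", "media": [{"discs": [{
            "sectors": 190968,
            "offsets": [225, 16630, 34044, 51197, 67508, 86538,
                        105552, 120642, 139362, 151441, 170392]}]}]},
    ]
}

print(exact_matches(sample_response, toc))  # ['Echo Afternoon']
```

Only Echo Afternoon has a disc with the same sector count and offsets. The other two are 11-track CDs whose layouts differ (or aren't recorded at all), so they're guesses rather than exact hits.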
Even when this API returns multiple perfect matches for the same TOC, it's likely that all the hits are just different releases of the same music, like in this example of That's the Spirit by Bring Me the Horizon:
Note the "attached to releases" section at the bottom
Here you can see that three releases have the same disk layout - the only difference between them being the markets they were released in and some superficial metadata like the barcode number and release date.
Start to end, the rip process looks roughly like this:
Read the TOC off of a disk
Look up the TOC on MusicBrainz
Select a release from the returned options (easy) and download its metadata
Dump audio from the disk to files
Apply the metadata to each track
Profit!
So - what's wrong with my CD?
Actually, two things. Which are both kind of the same problem.
Someone typed the wrong track name into MusicBrainz
"Raincloud" had been typed in as "Rainclous". That simple.
When this release was originally being added to MusicBrainz, it looks like the editor just mistyped that track name and this got downloaded and applied to our rip.
Two tracks on the CD were combined
This is a little bit weirder.
If you take another look at the screenshot of the artist's Bandcamp page, you might notice that there are 12 songs listed as being on the album, but only 11 included on the rip. One track is longer than it should be... and there's no track on the rip that's 41 seconds long, which is how long "Nothing Coming Soon" is.
In fact, it looks like "Nothing Coming Soon" has gotten 3 minutes and 44 seconds longer. Which is the exact length of the song after it, "Don't Need a Reason".
Wait.
Have those two tracks been merged?
...
Hah. Yep.
Taking a look at the metadata embedded into the disk itself, we can see that track 6 is actually titled "Don't Need a Reason" on there:
FILE "./06. Finish Ticket - Nothing Coming Soon.flac" WAVE
  TRACK 06 AUDIO
    TITLE "Don't Need A Reason"
    ISRC USDPK2300133
    INDEX 01 00:00:00
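If you wanted to spot this kind of disagreement automatically, a sketch like the following could compare the title in each file name against the TITLE line in the cue sheet. It assumes one statement per line and file names shaped like "NN. Artist - Title.ext", as in the entry above. The function is a hypothetical helper of mine, not something Whipper provides.

```python
import re
from pathlib import PurePath


def title_mismatches(cue_text: str) -> list[tuple[str, str]]:
    """Return (title from file name, TITLE from cue sheet) pairs that disagree."""
    mismatches = []
    file_title = None
    for line in cue_text.splitlines():
        line = line.strip()
        if m := re.match(r'FILE "(.+)" \w+$', line):
            # "06. Artist - Track Title.flac" -> "Track Title"
            stem = PurePath(m.group(1)).stem
            file_title = stem.split(" - ", 1)[-1]
        elif m := re.match(r'TITLE "(.+)"$', line):
            # Ignore any TITLE that appears before the first FILE line.
            if file_title is not None and m.group(1).casefold() != file_title.casefold():
                mismatches.append((file_title, m.group(1)))
    return mismatches


cue = '''FILE "./06. Finish Ticket - Nothing Coming Soon.flac" WAVE
  TRACK 06 AUDIO
    TITLE "Don't Need A Reason"
    ISRC USDPK2300133
    INDEX 01 00:00:00
'''
print(title_mismatches(cue))  # [('Nothing Coming Soon', "Don't Need A Reason")]
```

The comparison ignores case, so "Don't Need A Reason" against "Don't Need a Reason" wouldn't be flagged, but a completely different title is.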
And taking a listen to it shows that the short track has been rolled into the next track to form one, longer track, despite it being listed as a separate one on the back cover of the CD case:
Hello?????!?
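The length check that gave the game away can be sketched in a few lines of Python. It looks for two neighbouring tracks whose listed lengths add up to the length of one ripped track, with a couple of seconds of tolerance for rounding. The only numbers used are the ones from this post; the function name and the tolerance are my own assumptions.

```python
def find_merged(listed, ripped_seconds, tolerance=2):
    """Return pairs of adjacent listed tracks whose combined length
    matches one ripped track, to within `tolerance` seconds."""
    hits = []
    for (a_title, a_len), (b_title, b_len) in zip(listed, listed[1:]):
        if abs(a_len + b_len - ripped_seconds) <= tolerance:
            hits.append((a_title, b_title))
    return hits


# Lengths as listed on Bandcamp, in seconds.
listed = [
    ("Nothing Coming Soon", 41),
    ("Don't Need a Reason", 3 * 60 + 44),
]
ripped_track_6 = 4 * 60 + 26  # what VLC reports for track 6 of the rip

print(find_merged(listed, ripped_track_6))
# [('Nothing Coming Soon', "Don't Need a Reason")]
```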
So, the reason we're seeing a load of messed up track names on our rip is because:
Two tracks got rolled into one on the CD but nowhere else
This confused whoever was originally adding it to MusicBrainz
The combined tracks got added separately
The "left over" track at the end of the album, which would have been the 12th track on our 11 track disk, got missed off when being typed into MusicBrainz
The root cause, again, is bad data in MusicBrainz that's been downloaded and applied to our otherwise correct rip!
The fix
It turns out that two tracks being rolled into one on a disk is common enough that the MusicBrainz style guide has a dedicated section on dealing with tracks with multiple titles. Following that, the title of our one, combined file becomes "Nothing Coming Soon / Don't Need a Reason".
So, since MusicBrainz is publicly editable just like Wikipedia, I went ahead and amended the issues on this release myself (edit one, edit two).
Edits on MusicBrainz spend 7 days in limbo after they're created, to allow people time to vote on them. After 7 days, if they have a rating that isn't indicative of foul play, they get published. At the time of writing, my edits have 5 days left to go while they can be voted on, and are due to be made live on 2025-03-03. In the meantime, I edited the metadata of my ripped files manually to be in line with what I updated MusicBrainz with and re-indexed my music database.