Backing up Spotify
annas-archive.li/blog, 2025-12-20, Discuss on Hacker News
We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB), grouped by popularity. This release includes the largest publicly available music metadata database with 256 million tracks and 186 million unique ISRCs. It’s the world’s first “preservation archive” for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.
Anna’s Archive normally focuses on text (e.g. books and papers). We explained in “The critical window of shadow libraries” that we do this because text has the highest information density. But our mission (preserving humanity’s knowledge and culture) doesn’t distinguish among media types. Sometimes an opportunity comes along outside of text. This is such a case.
A while ago, we discovered a way to scrape Spotify at scale. We saw a role for us here to build a music archive primarily aimed at preservation.
Generally speaking, music is already fairly well preserved. There are many music enthusiasts in the world who digitized their CD and LP collections, shared them through torrents or other digital means, and meticulously catalogued them.
However, these existing efforts have some major issues:
Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded. Over-focus on the highest possible quality. Since these are created by audiophiles with high end equipment and fans of a particular artist, they chase the highest possible file quality (e.g. lossless FLAC). This inflates the file size and makes it hard to keep a full archive of all music that humanity has ever produced. No authoritative list of torrents aiming to represent all music ever produced. An equivalent of our book torrent list (which aggregate torrents from LibGen, Sci-Hub, Z-Lib, and many more) does not exist for music.
This Spotify scrape is our humble attempt to start such a “preservation archive” for music. Of course Spotify doesn’t have all the music in the world, but it’s a great start.
Before we dive into the details of this collection, here is a quick overview:
... continue reading