
Pirate archivist group scrapes Spotify's 300TB library, posts free torrents for downloading — investigation underway as music and metadata hit torrent sites


Spotify, the largest music streaming platform in the world, with hundreds of millions of active users and an extensive music library, has allegedly been hacked by Anna's Archive. The shadow library, which describes itself as a group of archivists, has apparently scraped nearly the entire platform, downloading roughly 300 TB of music that is now being distributed illegally via torrents.

Spotify has already acknowledged and responded to this attack, issuing the following statement to Android Authority:

"An investigation into unauthorized access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform’s audio files. We are actively investigating the incident."

That "some" in the above statement is key: the leaked collection consists of around 86 million files, representing ~37% of all music available on the platform (but 99.9% of listens). Most of them are preserved in Spotify's original OGG Vorbis 160 kbps format, but any song with a popularity rating of exactly 0 has been re-encoded at 75 kbps to save space.
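As a rough sanity check on those figures, the sketch below divides the reported total size by the reported file count and converts the result to a playing time at 160 kbps. It assumes decimal terabytes, and it ignores metadata overhead and the lower-bitrate re-encodes, so treat the output as an order-of-magnitude estimate only. The function names are illustrative.

```python
# Back-of-envelope check using only the figures quoted in the article.
TOTAL_BYTES = 300e12     # ~300 TB (assumption: decimal terabytes)
FILE_COUNT = 86e6        # ~86 million files
BITRATE_BPS = 160_000    # OGG Vorbis at 160 kbps


def average_file_size_mb(total_bytes: float, file_count: float) -> float:
    """Mean size per file in megabytes (10^6 bytes)."""
    return total_bytes / file_count / 1e6


def implied_duration_seconds(size_bytes: float, bitrate_bps: float) -> float:
    """Playing time of a constant-bitrate file of the given size."""
    return size_bytes * 8 / bitrate_bps


avg_mb = average_file_size_mb(TOTAL_BYTES, FILE_COUNT)
avg_seconds = implied_duration_seconds(avg_mb * 1e6, BITRATE_BPS)
print(f"{avg_mb:.2f} MB per file, about {avg_seconds / 60:.1f} minutes at 160 kbps")
```

The average works out to roughly 3.5 MB per file, or just under three minutes of audio at 160 kbps, which is in line with a typical song length.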

Alongside the audio, there are 256 million rows of metadata, accounting for 99.6% of all listens on Spotify, which have been compiled into queryable SQL databases. The group has done a near-lossless JSON reconstruction of Spotify's API, including 186 million unique ISRCs, which are identifiers for individual recordings worldwide; think of them as ISBNs for music. All the album info, artist info, cover art, etc. is included.
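For readers unfamiliar with ISRCs, the following is a minimal sketch of how one could be validated and split into its parts. It relies on the standard 12-character layout (a two-letter prefix, a three-character registrant code, a two-digit year, and a five-digit designation code). The `parse_isrc` helper and the sample code are illustrative only and say nothing about how the leaked databases actually store these values.

```python
import re
from typing import NamedTuple, Optional

# 2-letter prefix, 3-character registrant code, 2-digit year, 5-digit designation.
ISRC_PATTERN = re.compile(r"^([A-Z]{2})([A-Z0-9]{3})([0-9]{2})([0-9]{5})$")


class Isrc(NamedTuple):
    prefix: str
    registrant: str
    year: str
    designation: str


def parse_isrc(raw: str) -> Optional[Isrc]:
    """Return the parts of an ISRC, or None if the string is not well formed."""
    # Hyphens are only a display convention, so drop them before matching.
    compact = raw.strip().replace("-", "").upper()
    match = ISRC_PATTERN.match(compact)
    return Isrc(*match.groups()) if match else None


print(parse_isrc("AA-6Q7-20-00047"))  # illustrative code, not a real recording
print(parse_isrc("not-an-isrc"))
```

A well-formed code is returned as its four components, while anything that does not match the layout yields `None`.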


The blog post released by Anna's Archive going over this leak is surprisingly informative, including a bunch of charts that break down how Spotify treats music in general. For instance, around 70% of all songs on the platform barely get any attention, while 0.1% of the tracks are the most popular of all time. Most songs are also singles, rather than part of an album, and 120 BPM is the most common tempo.

The reason for this large-scale hack, as described by Anna's Archive itself, is the preservation of music. The group is notorious for freely distributing books without consent, and it is applying much the same logic here, arguing that Spotify's collection is overly focused on popular artists and sound quality. In its view, there needs to be an "authoritative list of torrents aiming to represent all music ever produced."


The torrents are self-hosted, and the files are packaged using Anna’s Archive Containers (AAC), a custom format the group has used for years. The metadata has already been released, while the rest of the data will follow a staggered release in large chunks, categorized by popularity. As a result, the full impact of this scrape will only become apparent later.
