Tech News
← Back to articles

We can't have nice things because of AI scrapers

read original related products more articles

In the past few months the MetaBrainz team has been fighting a battle against unscrupulous AI companies ignoring common courtesies (such as robots.txt) and scraping the Internet in order to build up their AI models. Rather than downloading our dataset in one complete download, they insist on loading all of MusicBrainz one page at a time. This of course would take hundreds of years to complete and is utterly pointless. In doing so, they are overloading our servers and preventing legitimate users from accessing our site.

Now the AI scrapers have found ListenBrainz and are hitting a number of our API endpoints for their nefarious data gathering purposes. In order to protect our services from becoming overloaded, we’ve made the following changes:

The /metadata/lookup API endpoints (GET and POST versions) now require the caller to send an Authorization token in order for this endpoint to work.

The ListenBrainz Labs API endpoints for mbid-mapping, mbid-mapping-release and mbid-mapping-explain have been removed. Those were always intended for debugging purposes and will also soon be replaced with a new endpoints for our upcoming improved mapper.

LB Radio will now require users to be logged in to use it (and API endpoint users will need to send the Authorization header). The error message for logged in users is a bit clunky at the moment; we’ll fix this once we’ve finished the work for this year’s Year in Music.

Sorry for these hassles and no-notice changes, but they were required in order to keep our services functioning at an acceptable level.