Tech News

While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data

From the personal blog of interface expert Bruce Ediger:

Early in March 2025, I noticed that a web crawler with a user agent string of

meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

was hitting my blog's machine at an unreasonable rate.

I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying.

I already have a PHP program that creates the illusion of an infinite website. I decided to answer any HTTP request that had "meta-externalagent" in its user agent string with the contents of a bork.php generated file...
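
As an illustration of the idea in the paragraph above (not Ediger's actual PHP code, which is not shown in this excerpt), here is a minimal Python sketch. It assumes a substring match on the user agent and a page generator seeded by the requested path, so every URL yields a stable page whose links point to further generated URLs. The names is_target_crawler, fake_page, respond, and the WORDS list are hypothetical, and matching case-insensitively is an assumption.

```python
import hashlib
import random
from typing import Optional

# Substring quoted in the post; everything else in this sketch is an assumption.
TARGET = "meta-externalagent"
# Hypothetical filler vocabulary for the generated text.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "bork"]


def is_target_crawler(user_agent: str) -> bool:
    """Return True if the user agent contains the target substring."""
    return TARGET in user_agent.lower()


def fake_page(path: str, n_links: int = 5) -> str:
    """Build a filler HTML page that is deterministic for a given path.

    Seeding the generator from the path means the same URL always returns
    the same page, while each page links to new URLs, so the site looks
    endless to a crawler that follows links.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    links = "".join(
        f'<a href="/{rng.getrandbits(32):08x}.html">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"


def respond(user_agent: str, path: str) -> Optional[str]:
    """Return generated filler for the target crawler, else None.

    None means "serve the real content as usual".
    """
    if is_target_crawler(user_agent):
        return fake_page(path)
    return None
```

In a real deployment, a check like respond would sit in front of the normal request handler, for example as a web server rewrite rule or a small piece of middleware; how the original site wires this up is not stated in the excerpt.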

This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...