As the internet chokes on ever more slop, the one thing that gives me hope is this: people seem to loathe AI, and are actively resisting it. This won’t be a long post, as I’m personally so tired of writing and thinking about AI at this point in time, but I do want to draw your attention here to some recent anti-AI stuff that’s worth discussing.
There’s a Reddit Community Devoted to Poisoning Data Consumed by Web Crawlers
r/PoisonFountain, created by individuals who claim to be concerned AI industry insiders, is a community with one goal: encourage as many people as possible to feed huge quantities of trash data (poison) to all of the web crawlers out there that are scraping our work for AI training sets. They aim to serve one terabyte of poison per day to these crawlers by the end of 2026.
The poison fountain itself is hosted on rnsaffn.com, sandwiched between several garbage links that look irresistable to AI crawlers; it produces a page of code that seems correct at first glance, but is actually riddled with subtle errors that render the code unusable. Filtering out these errors is possible, but expensive at scale. Since these companies can’t improve their AI models without fresh data created by human beings, the idea here is to waste their time and make it expensive for them to steal our data.
Miasma is one example of a tool that uses the fountain to serve massive amounts of garbage to malicious bots. The developer describes it as “an endless buffet of slop for the slop machines,” which is delightful. I can’t use Miasma with my site’s setup, but it may be of interest to those of you who could. I deliver my trash to crawlers using other means … some visible, some invisible. While I can’t serve it up to anywhere near the same extent as Miasma can, I do catch sneaky bots with my junk links every day.
If you’re pro-AI and feel outraged on behalf of these companies that anyone would dare try to make life difficult for them, please know that this is simply a case of tit for tat. The teams that send AI crawlers out into the world wide web are DDoSing small websites on the regular and raising hosting fees for everyone with their voracious desire to devour the entire internet. They do not obey robots.txt, and often hide their crawlers behind residential proxies. If they can’t source training data ethically, then I see absolutely no reason why any website operator should make it easy for them to steal it.
Caution: I'm messing with automated visitors in plain sight as an experiment. 🤭 To avoid false positives, human visitors are encouraged to ignore the link in this box.
Someone Figured Out How To Poison AI Video Summarizers
Thanks to r/PoisonFountain, I learned that YouTube has no .ass. I could try to explain what that means, but the video is hilarious and well worth a watch, so I’ll leave it up to @f4mi.
Sadly, it looks like the poisoning technique used by the creator in this video no longer works; YouTube presumably fixed the transcript loophole she was exploiting here. I plugged a few of her video URLs into a few different video summarizers, and they all failed to tell me anything that wasn’t actually in the videos.
... continue reading