AI Companies Are Learning an Ironic Lesson as the People They Pay to Improve Their Chatbots Are Just Feeding AI Slop Into Them


For tech companies racing to be the king of the AI hill, there are few things more precious than raw, original data.

To keep the large language models underlying our favorite AI chatbots up to date, tech companies have to feed them reams of fresh inputs. As one study found, the amount of data used to train AI has doubled every nine months since 2010, exponential growth that may soon hit a wall as stores of clean data run critically low.

With original content to pilfer running out, companies have started paying workers to generate fresh training data, offering them low-quality contracts to train AI in hyper-specific tasks like running weekly payroll for Broadway musicians. Others have been hired to film themselves doing degrading or menial chores like folding laundry, or more distinctly adult activities.

Predictably, this growing workforce behind the AI boom has started cutting corners en masse, turning to other AI chatbots to supply the data meant to feed AI chatbots. Talking to New Scientist, numerous insiders said this practice of AI cannibalism — a method experts have long warned can destabilize LLMs — is shockingly commonplace.

“It’s very widespread,” a worker identified as Alice told NewSci. “Every company I’ve worked for has had explicit guidelines around it and they clearly do try to catch people out, so I think they do care. But I don’t think they can stop it.”

In other words, AI companies are learning an ironic lesson: after purloining everybody else’s content without permission to create a product that threatens employment across the economy, they now find that the new precariat they’ve created is using the same tech to perform, as lazily as possible, the few human tasks the companies still need done.

Though workers have to be careful not to be too obvious, Alice says it isn’t hard to pass AI-generated data off as her own, provided she scrubs the obnoxious linguistic tics of chatbots like ChatGPT before she submits it. “It’s only the sloppiest of users that get caught,” the AI contractor told NewSci. “Anyone with a modicum of awareness around AI hallmarks can tell their output not to use them, and at that point what are you going to do?”

“If these companies want quality data, then they should offer quality contracts,” Alice continued. “Instead they’re low-balling struggling people, employing them for the barest possible amount of time and tossing them aside as projects are finished with no warning.”

Other contractors told NewSci they use LLMs to avoid making mistakes and losing their gigs entirely.
