Tech News
Feeding the machine

Frontier labs like OpenAI and Anthropic need vast amounts of data in the race to achieve AGI. This comes at a pretty penny — billions of dollars — and little-known companies like Mercor and Handshake are cleaning up in this AI hype cycle.

When he was 19 years old, Brendan Foody started Mercor with two of his high school friends as a way for his other friends, who also had startups, to hire software engineers overseas. It launched in 2023 as essentially a staffing agency, albeit a highly automated one. Language models reviewed resumes and did the interviewing. Within months, Mercor was bringing in $1 million in annualized revenue and turning a modest profit.

Then, in early 2024, the company Scale AI approached Mercor with a big request: They needed 1,200 software engineers. At the time, Scale was one of the only well-known names in the historically back-of-house business of producing AI training data. It had grown to a valuation of nearly $14 billion by orchestrating hundreds of thousands of people around the world to label data for self-driving cars, e-commerce algorithms, and language-model-powered chatbots. Now that OpenAI, Anthropic, and other companies were trying to teach their chatbots to code, Scale needed software engineers to produce the training data.

This, Foody sensed, could herald a larger change in the AI industry. He’d heard about growing demand for specialized data work, and now here was Scale asking for a thousand coders. When the engineers he recruited started complaining about missed pay (Scale has a reputation among data workers for chaotic platform management and is being sued in California over wage theft, among other infractions), Foody decided to cut out the middleman.

In September, Foody announced that Mercor had reached $500 million annualized revenue, making it “the fastest growing company of all time.” The previous titleholder was Anysphere, which makes the AI coding tool Cursor. In a sign of the times, Cursor recently noted that its users produce the exact sort of training data labs are paying for, and The Information recently reported that OpenAI and xAI are interested in buying it.

Mercor’s most recent fundraising round valued the company at $10 billion. Foody and his two cofounders are 22 years old, making them the youngest self-made billionaires. At least one of their early employees has already left to start an AI data company of her own.

While discussions of AI infrastructure typically focus on the gargantuan buildout of data centers, an analogous race is happening with training data. Labs have already exhausted all the easily accessible data, adding to questions about whether early rapid progress through sheer increases in scale will continue. Meanwhile, most recent improvements have come through new training techniques that make use of smaller datasets tailor-made by experts in particular fields, like programming and finance, and AI companies will pay premium prices for them.

There are no good statistics on how much labs are spending, but rough estimates from investors and industry insiders place the figure at over $10 billion this year and growing, the vast majority coming from five or so companies. These companies have yet to find a way to make money from AI, but the people selling them training data have. For now, they are some of the only AI companies turning a profit.

“It’s every nook and cranny of human expertise.”

The data industry has long been the most undervalued and unglamorous aspect of AI development, according to a 2021 study by Google researchers, seen as regrettably necessary janitorial work to be done as quickly and cheaply as possible. Yet modern machine learning could not exist without its ecosystem of data suppliers, and the two spheres move in tandem.
