Microsoft AI Researchers Just Discovered Something That’s Going to Make Their Bosses Extremely Mad

AI automation is typically exactly what it sounds like: automating tasks, many of which were previously carried out by humans, in an attempt to boost productivity and efficiency, often as a prelude to laying off workers wholesale.

However, a new, yet-to-be-peer-reviewed paper by a group of Microsoft researchers, spotted by IT Pro, found that today’s top AI systems remain eyebrow-raisingly weak at real-world workplace tasks. In fact, they often screw them up badly: the team studied frontier models including OpenAI’s GPT 5.4, Anthropic’s Claude Opus 4.6 and Google’s Gemini 3.1 Pro, and found that during complex assignments, those cutting-edge bots corrupted an average of 25 percent of the content in documents. (Older models failed even more severely.)

The researchers concluded that, overall, these “models are not ready for delegated workflows in the vast majority of domains.” That is a striking finding coming from Microsoft in particular: the company has made massive investments in AI and is actively trying to jam the tech into nearly every aspect of its Windows 11 operating system, often with disastrous results. (Curiously, the paper didn’t evaluate the company’s own Copilot AI.)

In other words, the Redmond giant’s researchers had every incentive to find something positive about AI in the workplace, but instead found that blindly trusting LLMs to handle internal documents will almost certainly result in everything from errors to data deletion.

As bosses everywhere push to replace human labor with AI, the Microsoft paper builds on a growing body of scholarship about “workslop”: AI-powered mush that lazy or clueless workers push onto their colleagues, but which ultimately just needs to be fixed by a careful human laborer.

On AI workslop: Companies Are Being Torn Apart by AI “Workslop,” Stanford Research Finds