
Microsoft AI Researchers Just Discovered Something That’s Going to Make Their Bosses Extremely Mad

Why This Matters

This discovery highlights the current limitations of state-of-the-art AI systems in handling complex workplace tasks, emphasizing that reliance on these models without human oversight can lead to significant errors and data issues. For the tech industry and consumers, it underscores the need for cautious integration of AI into critical workflows and the importance of human oversight to prevent costly mistakes.



AI automation is typically exactly what it sounds like: automating tasks, many of which were previously carried out by humans, in an attempt to boost productivity and efficiency, often as a prelude to laying off workers wholesale.

However, a new yet-to-be-peer-reviewed paper by a group of Microsoft researchers, spotted by IT Pro, found that today’s top AI systems remain eyebrow-raisingly weak at real-world workplace tasks. In fact, they often screw them up badly: the team studied frontier models including OpenAI’s GPT 5.4, Anthropic’s Claude Opus 4.6 and Google’s Gemini 3.1 Pro, and found that during complex assignments, those cutting-edge bots corrupted an average of 25 percent of the content in documents. (Older models failed even more severely.)

The researchers concluded that, overall, these “models are not ready for delegated workflows in the vast majority of domains.” That is a striking finding from Microsoft in particular, given that the company has made massive investments in AI and is actively trying to jam the tech into nearly every aspect of its Windows 11 operating system, often with disastrous results. (Curiously, the paper didn’t evaluate the company’s own Copilot AI.)

In other words, the Redmond giant’s researchers had every incentive to find something positive about AI in the workplace, but instead found that blindly trusting LLMs to handle internal documents will almost certainly result in everything from errors to data deletion.

As bosses everywhere push to replace human labor with AI, the Microsoft paper builds on a growing body of scholarship about “workslop”: AI-powered mush that lazy or clueless workers push onto their colleagues, but which ultimately just needs to be fixed by a careful human laborer.
