Anthropic Was So Concerned About Its New Mythos-Based Model’s Power That It Lobotomized Its Ability to Improve Itself

Earlier this year, Anthropic refused to release its Mythos AI model to the public, saying it was simply too dangerous.

At the time, executives claimed the model was capable of punching through powerful cybersecurity safeguards, pointing to researchers who used it to discover thousands of vulnerabilities in widely used open source code.

Months later, Anthropic was finally ready to go public with the model. On Tuesday, the Dario Amodei-led company announced a Mythos-powered model called Fable 5, which it claims is “safe for general use.”

However, the new safeguards quickly frustrated AI researchers, who accused the company of intentionally lobotomizing Fable 5. The backlash was so fierce that Anthropic soon adjusted the policy, as Wired reported on Wednesday, highlighting just how carefully the company is treading.

In its original announcement, Anthropic said the safeguards were designed to stop Fable 5 from improving itself, describing them as “new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development.” Just days ahead of the launch, Anthropic released a report on “when AI builds itself,” a trend that “might increase the risks of humans losing control over AI systems.”

However, AI researchers were not impressed by Anthropic hamstringing its latest model’s abilities.

“Anthropic’s latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won’t notice,” AI research firm SemiAnalysis tweeted.

“We are already seeing Anthropic’s latest model’s moderation filters our GPU inference research and programming,” it added.

Other researchers accused Anthropic of using Fable 5 to “shadowban” AI researchers, or quietly restrict their accounts. According to Anthropic’s system card, interventions limiting requests for “frontier LLM development” will “not be visible to the user.”
