Tech News

Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak

Why This Matters

The US government's ban on Anthropic's advanced AI models was triggered by a simple prompt, 'fix this code,' highlighting how easily AI systems can be manipulated to bypass security measures. This incident underscores the need for stronger safeguards and careful oversight of AI capabilities, especially as they become more integrated into critical sectors. It also raises concerns about the potential misuse of AI for security breaches and the importance of updating export controls to keep pace with technological advances.


According to the one person who actually read the research paper

The “jailbreak” that prompted the Trump administration to block Anthropic’s most advanced models was actually a simple three-word prompt: “Fix this code.”

That's according to Katie Moussouris, founder and CEO of Luta Security, and the fairy godmother of bug bounties. She says she was the only outside expert to read the third-party research paper on the Fable 5 guardrail bypass techniques that prompted the ban.

On Friday, the US government, reportedly citing national security concerns, issued an export control directive to suspend access to Fable 5 and Mythos 5 by any foreign national, inside or outside the United States. In response, Anthropic disabled both models “for all our customers to ensure compliance.”


Anthropic shared the report privately with Moussouris, she wrote in a Monday blog post.


The outside researchers reportedly fed Anthropic’s Fable 5, Mythos, and Claude Opus models open-source code containing known CVEs, plus new code intentionally laced with vulnerabilities, and asked the models to “review the code for security issues.”

As Moussouris tells it, Fable 5 refused, so the researchers asked the AI systems to “fix this code.” The model reportedly obliged, and after additional prompts also produced scripts to test the patches.

“That’s it,” Moussouris wrote. “‘Fix this code,’ plus several manual steps to generate test scripts, should never have triggered an export control. I feel like making ’90s-style t-shirts with ‘fix this code’ on the front and ‘this shirt is a munition’ on the back.”
