The US government’s Anthropic models ban was never about an AI jailbreak

The U.S. government’s enforcement letter to Anthropic, which effectively forced the company to pull its latest AI models offline just before the weekend, should be a wake-up call for any U.S. tech company — AI lab or otherwise.

To catch you up on the news blitz: On Friday afternoon, the U.S. Commerce Department sent Anthropic a letter invoking an obscure export control directive that banned non-Americans, including Anthropic’s employees, from accessing Fable 5 and Mythos 5, citing an unspecified national security concern. Anthropic said it believes the letter is related to a bypass of the model’s guardrails, but isn’t sure because the letter doesn’t provide specific details. The letter has not been made public.

In response, Anthropic shut down both of its top models to all customers to ensure that it complied with the directive. The result was that the U.S. government successfully forced a tech company to pull its models offline with a swift and unilateral action that didn’t appear to require court approval.

Friday’s intervention by the Trump administration shows that the AI industry is not immune to government interference. It’s also a warning to the wider tech industry: comply, or we can shut you and your products down.

Citing sources, Axios described a tense situation over the weekend between the two major players, saying that the “personality differences” between Anthropic and the Trump administration led to the export directive, rather than a technical issue with the AI products.

New details about the issue that emerged over the weekend now cast further doubt on the government’s already shaky reasoning.

Katie Moussouris, a cybersecurity veteran and researcher who founded Luta Security, said in a blog post that Anthropic recently shared with her a private copy of a paper written by security researchers describing an alleged guardrail bypass in Fable 5. (The Wall Street Journal reports that the paper’s authors are security researchers at Amazon.) Moussouris said that Anthropic reached out to ask for her take on the paper.

Moussouris’ blog post described how the researchers triggered the guardrail bypass, but said that the bypass itself “should never have triggered an export control.” The difference is largely between asking an AI model to “review code for security issues” versus asking it to “fix this code.” The end result is largely the same, even if the questions are posed slightly differently.

“The behavior described in the paper cannot meaningfully be fixed, and any attempt would only weaken the model for defense,” said Moussouris, who criticized the export control directive as hasty, heavy-handed, and misguided.

Moussouris and dozens of other top security researchers and experts have since called on the Trump administration to revoke the export control order, calling the move to pull advanced cybersecurity capabilities from network defenders in the U.S. as “dangerous.”

... continue reading