Top Security Experts Alarmed by Power of Anthropic’s New Hacker AI

In November, Anthropic revealed that a Chinese state-sponsored hacking group had exploited its Claude AI’s agentic capabilities to infiltrate dozens of targets around the world.

It was trivially easy to get around Anthropic’s AI guardrails, with the hackers simply pretending to work for legitimate cybersecurity organizations — highlighting how woefully unprepared we are for powerful AI models that could accelerate the discovery of serious vulnerabilities.

And now, Anthropic’s latest Mythos AI model is making that nightmare scenario feel more real than ever. As Bloomberg reports, the company’s executives were seemingly so alarmed by the system’s capabilities that they decided to only make it available to a select number of organizations as part of “Project Glasswing.” The goal: give the organizations a fighting chance to get ahead of a potential cybersecurity crisis in the making.

But considering Anthropic has yet to publicly release its model, plenty of questions remain surrounding the company’s eyebrow-raising claims.

Anthropic-affiliated AI researcher Nicholas Carlini told Bloomberg that, in his own testing, it didn't take long for Mythos to get past security protocols and gain access to sensitive data.

His findings reflect the experience of the company's Frontier Red Team, a group of 15 Anthropic employees tasked with stress-testing cybersecurity defenses by simulating adversarial attacks.

“Within hours of getting the model, we knew it was different,” the team’s head, Logan Graham, told Bloomberg.

The biggest difference between Mythos and previous AI models was its ability to autonomously exploit vulnerabilities, an ominous new facet of the industry’s transition towards agentic models.

According to the model's system card, the Frontier Red Team even caught earlier versions of Mythos trying to cover their tracks after violating human instructions, as well as escaping a sandbox environment and gaining access to the internet.

... continue reading