Researchers at the AI security company Adversa AI have found that Grok 3, the latest model released by Elon Musk's startup xAI this week, is a cybersecurity disaster waiting to happen.
The team found that the model is extremely vulnerable to "simple jailbreaks," which could be used by bad actors to "reveal how to seduce kids, dispose of bodies, extract DMT, and, of course, build a bomb," according to Adversa CEO and cofounder Alex Polyakov.
And it only gets worse from there.
"It’s not just jailbreak vulnerabilities this time — our AI Red Teaming platform uncovered a new prompt-leaking flaw that exposed Grok’s full system prompt," Polyakov told Futurism in an email. "That’s a different level of risk."
"Jailbreaks let attackers bypass content restrictions," he explained, "but prompt leakage gives them the blueprint of how the model thinks, making future exploits much easier."
Beyond the model's willingness to tell bad actors how to make bombs, Polyakov and his team warn that the vulnerabilities could allow hackers to take over AI agents, which are given the ability to take actions on behalf of users — a growing "cybersecurity crisis," according to Polyakov.
Grok 3 was released by xAI earlier this week to much fanfare. Early test results saw it shoot up the large language model (LLM) leaderboards, with AI researcher Andrej Karpathy tweeting that the model "feels somewhere around the state of the art territory of OpenAI's strongest models," like o1-pro.
Yet Grok 3 failed to impress when it came to cybersecurity. Adversa AI found that three out of the four jailbreak techniques it tried worked against the model. In contrast, OpenAI and Anthropic's AI models managed to ward off all four.
It's a particularly troubling development considering Grok was seemingly trained to further Musk's increasingly extreme belief system. As the billionaire pointed out in a recent tweet, Grok replies that "most legacy media" is "garbage" when asked for its opinion of The Information, reflecting Musk's well-documented animosity toward the journalists who have held him accountable.
Adversa previously discovered that DeepSeek's R1 reasoning model — which threw all of Silicon Valley into disarray after it was found to be much cheaper to run than its Western competitors — also lacked basic guardrails to stop hackers from exploiting it. It failed to effectively defend itself against all four of Adversa's jailbreak techniques.