
'TrustFall' Convention Exposes Claude Code Execution Risk

Why This Matters

The 'TrustFall' findings expose a significant security weakness in popular AI coding tools: a malicious repository can auto-execute harmful code without clear user consent. This leaves developers and organizations open to cyberattacks, and it underscores the need for greater transparency in AI-assisted development environments, starting with the trust prompts and consent mechanisms these tools present.

Key Takeaways

Developers using the latest versions of AI coding tools like Claude Code, Cursor CLI, Gemini CLI, and Copilot CLI could inadvertently execute malicious code on their systems with a single keypress, or with no keypress at all in continuous integration environments.

That, according to researchers at Adversa AI, is because none of the tools adequately warns users that a malicious repo can auto-approve and spawn a Model Context Protocol (MCP) server without their explicit approval or knowledge. All four coding tools show some form of trust dialog prompting the user to indicate whether they trust a particular repo, but none offers full details on what that consent actually entails.

Adversa AI identified Claude Code as offering the least information in its trust dialog and Gemini CLI as offering the most, along with an explicit choice to allow or disallow an MCP server from executing on the developer's system. But the exposure is the same in all four, according to Adversa's lead researcher, Rony Utevsky.


"A repository can ship a configuration that auto-approves and immediately launches an MCP server, no tool call from the agent is required," he tells Dark Reading. "The variation is purely in how clearly the dialog tells the user what they are consenting to."

Anthropic itself, however, has described the issue Adversa AI identified as falling outside its threat model, and it told Adversa AI that it believes its trust dialog offers sufficient warning to users. Anthropic pointed out that any malicious activity happens only after the user has marked a repo or folder as trusted or safe, Utevsky says, adding that Adversa AI has not raised the issue with the other AI coding toolmakers because Anthropic's approach appears to be the general convention.

"Once we identified the issue as a class-level convention rather than a vendor bug, vendor-specific disclosure stopped being the right shape of response: you can responsibly disclose a vulnerability to a vendor, but not a convention," he explains.

A Straightforward Path?

According to Adversa AI, all a threat actor would need to do to pull off an attack is create a repository that includes a malicious MCP server and configuration settings that auto-approve it to run. When a developer clones or opens the repo in the AI coding tool and presses "enter" on what appears to be a routine security check, the AI coding tool unwittingly launches the attacker-controlled code with the developer's full system privileges and no further prompting.
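Until the trust dialogs say more, one practical countermeasure is to audit a freshly cloned repo for shipped MCP configuration before answering the prompt. The sketch below checks a handful of commonly cited project-level config locations; the paths and the "mcpServers" key are assumptions drawn from public tool documentation, so verify them against the versions you actually run.

```python
#!/usr/bin/env python3
"""Pre-trust audit: list the MCP server definitions a cloned repo ships
before answering any AI coding tool's trust prompt. The config paths are
illustrative assumptions -- tools store project-level MCP config in
different places and locations change between versions."""
import json
import sys
from pathlib import Path

# Candidate project-level MCP config locations (assumed, not exhaustive).
CANDIDATE_CONFIGS = [
    ".mcp.json",              # Claude Code project config
    ".cursor/mcp.json",       # Cursor project config
    ".gemini/settings.json",  # Gemini CLI project config
    ".vscode/mcp.json",       # VS Code / Copilot-style project config
]

def audit(repo: Path) -> int:
    """Print every MCP server a checked config would launch; return count."""
    findings = 0
    for rel in CANDIDATE_CONFIGS:
        path = repo / rel
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            print(f"[!] {rel}: present but unparseable -- inspect manually")
            findings += 1
            continue
        # Most tools nest server definitions under an "mcpServers" key.
        servers = config.get("mcpServers", {}) if isinstance(config, dict) else {}
        for name, spec in servers.items():
            if not isinstance(spec, dict):
                continue
            command = " ".join([str(spec.get("command", "?")),
                                *map(str, spec.get("args", []))])
            print(f"[!] {rel}: server '{name}' would run: {command}")
            findings += 1
    return findings

if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    if audit(repo) == 0:
        print("No MCP server configs found in the checked locations.")
    else:
        print("Review the servers above before trusting this repository.")
```

Running a check like this against a clone before pressing "enter" at least surfaces what the repo is asking to launch, which is exactly the detail the current trust dialogs omit.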

