
'TrustFall' Convention Exposes Claude Code Execution Risk

Why This Matters

The 'TrustFall' findings expose a significant security weakness in popular AI coding tools: a malicious repository can auto-execute harmful code without clear user consent. This leaves developers and organizations open to cyberattacks, and it underscores the need for greater transparency in AI-assisted development environments, starting with the trust prompts and consent mechanisms these tools present.

Key Takeaways

Developers using the latest versions of AI coding tools like Claude Code, Cursor CLI, Gemini CLI, and Copilot CLI could inadvertently execute malicious code on their systems with a single keypress, or with no keypress at all in continuous integration environments.

That, according to researchers at Adversa AI, is because none of the tools adequately warns users that a malicious repo can auto-approve and spawn a Model Context Protocol (MCP) server without their explicit approval or knowledge. All four coding tools show some form of trust dialog prompting the user to indicate whether they trust a particular repo, but none offers full details on what that consent actually entails.

Adversa AI identified Claude Code as offering the least information in its trust dialog and Gemini CLI as offering the most, along with an explicit choice to allow or disallow an MCP server from executing on the developer's system. But the exposure is the same in all four, according to Adversa's lead researcher, Rony Utevsky.


"A repository can ship a configuration that auto-approves and immediately launches an MCP server, no tool call from the agent is required," he tells Dark Reading. "The variation is purely in how clearly the dialog tells the user what they are consenting to."

Anthropic itself, however, has described the issue Adversa AI identified as falling outside its threat model, and it told Adversa AI that it believes its trust dialog offers sufficient warning to users. Anthropic pointed out that any malicious activity happens only after the user has marked a repo or folder as trusted or safe, Utevsky says, adding that Adversa AI has not raised the issue with the other AI coding toolmakers because Anthropic's approach appears to be the general convention.

"Once we identified the issue as a class-level convention rather than a vendor bug, vendor-specific disclosure stopped being the right shape of response: you can responsibly disclose a vulnerability to a vendor, but not a convention," he explains.

A Straightforward Path?

According to Adversa AI, all a threat actor would need to do to pull off an attack is create a repository that includes a malicious MCP server and configuration settings that auto-approve it to run. When a developer clones or opens the repo in the AI coding tool and presses "enter" on what appears to be a routine security check, the AI coding tool unwittingly launches the attacker-controlled code with the developer's full system privileges and no further prompting.
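Until the trust dialogs say more, one practical countermeasure is to audit a freshly cloned repo for shipped MCP configuration before answering the prompt. The sketch below checks a handful of commonly cited project-level config locations; the paths and the "mcpServers" key are assumptions drawn from public tool documentation, so verify them against the versions you actually run.

```python
#!/usr/bin/env python3
"""Pre-trust audit: list the MCP server definitions a cloned repo ships
before answering any AI coding tool's trust prompt. The config paths are
illustrative assumptions -- tools store project-level MCP config in
different places and locations change between versions."""
import json
import sys
from pathlib import Path

# Candidate project-level MCP config locations (assumed, not exhaustive).
CANDIDATE_CONFIGS = [
    ".mcp.json",              # Claude Code project config
    ".cursor/mcp.json",       # Cursor project config
    ".gemini/settings.json",  # Gemini CLI project config
    ".vscode/mcp.json",       # VS Code / Copilot-style project config
]

def audit(repo: Path) -> int:
    """Print every MCP server a checked config would launch; return count."""
    findings = 0
    for rel in CANDIDATE_CONFIGS:
        path = repo / rel
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            print(f"[!] {rel}: present but unparseable -- inspect manually")
            findings += 1
            continue
        # Most tools nest server definitions under an "mcpServers" key.
        servers = config.get("mcpServers", {}) if isinstance(config, dict) else {}
        for name, spec in servers.items():
            if not isinstance(spec, dict):
                continue
            command = " ".join([str(spec.get("command", "?")),
                                *map(str, spec.get("args", []))])
            print(f"[!] {rel}: server '{name}' would run: {command}")
            findings += 1
    return findings

if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    if audit(repo) == 0:
        print("No MCP server configs found in the checked locations.")
    else:
        print("Review the servers above before trusting this repository.")
```

Running a check like this against a clone before pressing "enter" at least surfaces what the repo is asking to launch, which is exactly the detail the current trust dialogs omit.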

