An attacker embeds a single instruction inside a forwarded email. An OpenClaw agent summarizes that email as part of a normal task. The hidden instruction tells the agent to forward credentials to an external endpoint. The agent complies, through a sanctioned API call, using its own OAuth tokens. The firewall logs HTTP 200. EDR records a normal process. No signature fires. Nothing went wrong by any definition your security stack understands. That is the problem.

Six independent security teams shipped six OpenClaw defense tools in 14 days. Three attack surfaces survived every one of them.

The exposure picture is already worse than most security teams know. Token Security found that 22% of its enterprise customers have employees running OpenClaw without IT approval, and Bitsight counted more than 30,000 publicly exposed instances in two weeks, up from roughly 1,000. Snyk’s ToxicSkills audit adds another dimension: 36% of all ClawHub skills contain security flaws.

Jamieson O’Reilly, founder of Dvuln and now security adviser to the OpenClaw project, has been one of the researchers pushing fixes hardest from inside. His credential leakage research on exposed instances was among the earliest warnings the community received. Since then, he has worked directly with founder Peter Steinberger to ship dual-layer malicious skill detection and is now driving a capabilities specification proposal through the agentskills standards body.

The team is clear-eyed about the security gaps, he told VentureBeat. “It wasn’t designed from the ground up to be as secure as possible,” O’Reilly said. “That’s understandable given the origins, and we’re owning it without excuses.”

None of it closes the three gaps that matter most.

Three attack surfaces your stack cannot see

The first is runtime semantic exfiltration.
The attack encodes malicious behavior in meaning, not in binary patterns, which is exactly what the current defense stack cannot see.

Palo Alto Networks mapped OpenClaw to every category in the OWASP Top 10 for Agentic Applications and identified what security researcher Simon Willison calls a “lethal trifecta”: private data access, untrusted content exposure, and external communication capabilities in a single process. EDR monitors process behavior. The agent’s behavior looks normal because it is normal. The credentials are real, and the API calls are sanctioned, so EDR reads it as a credentialed user doing expected work. Nothing in the current defense ecosystem tracks what the agent decided to do with that access, or why.

The second is cross-agent context leakage. When multiple agents or skills share session context, a prompt injection in one channel poisons decisions across the entire chain. Giskard researchers demonstrated this in January 2026, showing that agents silently appended attacker-controlled instructions to their own workspace files and waited for commands from external servers. The injected prompt becomes a sleeper payload. Palo Alto Networks researchers Sailesh Mishra and Sean P. Morgan warned that persistent memory turns these attacks into stateful, delayed-execution chains. A malicious instruction hidden inside a forwarded message sits in the agent’s context weeks later, activating during an unrelated task.

O’Reilly identified cross-agent context leakage as the hardest of these gaps to close. “This one is especially difficult because it is so tightly bound to prompt injection, a systemic vulnerability that is far bigger than OpenClaw and affects every LLM-powered agent system in the industry,” he told VentureBeat. “When context flows unchecked between agents and skills, a single injected prompt can poison or hijack behavior across the entire chain.”

No tool in the current ecosystem provides cross-agent context isolation.
IronClaw sandboxes individual skill execution. ClawSec monitors file integrity. Neither tracks how context propagates between agents in the same workflow.

The third is agent-to-agent trust chains with zero mutual authentication. When OpenClaw agents delegate tasks to other agents or external MCP servers, no identity verification exists between them. A compromised agent in a multi-agent workflow inherits the trust of every agent it communicates with. Compromise one through prompt injection, and it can issue instructions to every agent in the chain using trust relationships that the legitimate agent already built.

Microsoft’s security team published guidance in February calling OpenClaw untrusted code execution with persistent credentials, noting the runtime ingests untrusted text, downloads and executes skills from external sources, and performs actions using whatever credentials it holds. Kaspersky’s enterprise risk assessment added that even agents on personal devices threaten organizational security because those devices store VPN configs, browser tokens, and credentials for corporate services. The Moltbook social network for OpenClaw agents already demonstrated the spillover risk: Wiz researchers found a misconfigured database that exposed 1.5 million API authentication tokens and 35,000 email addresses.

What 14 days of emergency patching actually closed

The defense ecosystem split into three approaches. Two tools harden OpenClaw in place. ClawSec, from Prompt Security (a SentinelOne company), wraps agents in continuous verification, monitoring critical files for drift and enforcing zero-trust egress by default. OpenClaw’s VirusTotal integration, shipped jointly by Steinberger, O’Reilly, and VirusTotal’s Bernardo Quintero, scans every published ClawHub skill and blocks known malicious packages.

Two tools are full architectural rewrites.
IronClaw, NEAR AI’s Rust reimplementation, runs all untrusted tools inside WebAssembly sandboxes where tool code starts with zero permissions and must explicitly request network, filesystem, or API access. Credentials get injected at the host boundary and never touch agent code, with built-in leak detection scanning requests and responses. Carapace, an independent open-source project, inverts every dangerous OpenClaw default with fail-closed authentication and OS-level subprocess sandboxing.

Two tools focus on scanning and auditability: Cisco's open-source scanner combines static, behavioral, and LLM semantic analysis, while NanoClaw reduces the entire codebase to roughly 500 lines of TypeScript, running each session in an isolated Docker container.

O’Reilly put the supply chain failure in direct terms. “Right now, the industry basically created a brand-new executable format written in plain human language and forgot every control that should come with it,” he said. His response has been hands-on. He shipped the VirusTotal integration before skills.sh, a much larger repository, adopted a similar pattern.
Koi Security’s audit validates the urgency: 341 malicious skills found in early February grew to 824 out of 10,700 on ClawHub by mid-month, with the ClawHavoc campaign planting the Atomic Stealer macOS infostealer inside skills disguised as cryptocurrency trading tools, harvesting crypto wallets, SSH credentials, and browser passwords.

OpenClaw Security Defense Evaluation Matrix

| Dimension | ClawSec | VirusTotal Integration | IronClaw | Carapace | NanoClaw | Cisco Scanner |
| --- | --- | --- | --- | --- | --- | --- |
| Discovery | Agents only | ClawHub only | No | mDNS scan | No | No |
| Runtime Protection | Config drift | No | WASM sandbox | OS sandbox + prompt guard | Container isolation | No |
| Supply Chain | Checksum verify | Signature scan | Capability grants | Ed25519 signed | Manual audit (~500 LOC) | Static + LLM + behavioral |
| Credential Isolation | No | No | WASM boundary injection | OS keychain + AES-256-GCM | Mount-restricted dirs | No |
| Auditability | Drift logs | Scan verdicts | Permission grant logs | Prometheus + audit log | 500 lines total | Scan reports |
| Semantic Monitoring | No | No | No | No | No | No |

Source: VentureBeat analysis based on published documentation and security audits, March 2026.

The capabilities spec that treats skills like executables

O’Reilly submitted a skills specification standards update to the agentskills maintainers, led primarily by Anthropic and Vercel, that is in active discussion. The proposal requires every skill to declare explicit, user-visible capabilities before execution. Think mobile app permission manifests. He noted the proposal is getting strong early feedback from the security community because it finally treats skills like the executables they are.

“The other two gaps can be meaningfully hardened with better isolation primitives and runtime guardrails, but truly closing context leakage requires deep architectural changes to how untrusted multi-agent memory and prompting are handled,” O’Reilly said.
“The new capabilities spec is the first real step toward solving these challenges proactively instead of bolting on band-aids later.”

What to do on Monday morning

Assume OpenClaw is already in your environment. The 22% shadow deployment rate is a floor. These six steps close what can be closed and document what cannot.

- Inventory what is running. Scan for WebSocket traffic on port 18789 and mDNS broadcasts on port 5353. Watch corporate authentication logs for new App ID registrations, OAuth consent events, and Node.js User-Agent strings. Any instance running a version before v2026.2.25 is vulnerable to the ClawJacked remote takeover flaw.
- Mandate isolated execution. No agent runs on a device connected to production infrastructure. Require container-based deployment with scoped credentials and explicit tool whitelists.
- Deploy ClawSec on every agent instance and run every ClawHub skill through VirusTotal and Cisco's open-source scanner before installation. Both are free. Treat skills as third-party executables, because that is what they are.
- Require human-in-the-loop approval for sensitive agent actions. OpenClaw’s exec approval settings support three modes: security, ask, and allowlist. Set sensitive tools to ask so the agent pauses and requests confirmation before executing shell commands, writing to external APIs, or modifying files outside its workspace. Any action that touches credentials, changes configurations, or sends data to an external endpoint should stop and wait for a human to approve it.
- Map the three surviving gaps against your risk register. Document whether your organization accepts, mitigates, or blocks each one: runtime semantic exfiltration, cross-agent context leakage, and agent-to-agent trust chains.
- Bring the evaluation table to your next board meeting. Frame it not as an AI experiment but as a critical bypass of your existing DLP and IAM investments. Every agentic AI platform that follows will face this same defense cycle. The framework transfers to every agent tool your team will assess for the next two years.

The security stack you built for applications and endpoints catches malicious code. It does not catch an agent following a malicious instruction through a legitimate API call. That is where these three gaps live.
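
For the inventory step above, the triage logic can be sketched in a few lines of Python. This is a minimal illustration, not an official OpenClaw or vendor tool: the function names are hypothetical, the version string is assumed to come from your own asset inventory in a purely numeric form such as v2026.2.25, and an open TCP port 18789 is treated only as a lead worth investigating, not as proof that OpenClaw is running.

```python
import socket
from typing import Optional, Tuple

GATEWAY_PORT = 18789           # WebSocket port named in the inventory step
FIXED_VERSION = (2026, 2, 25)  # versions before v2026.2.25 are exposed to ClawJacked


def parse_version(tag: str) -> Tuple[int, ...]:
    """Convert a tag such as 'v2026.2.25' into (2026, 2, 25).

    Assumes purely numeric, dot-separated tags; suffixes like '-beta' raise ValueError.
    """
    return tuple(int(part) for part in tag.strip().lstrip("vV").split("."))


def is_vulnerable(tag: str) -> bool:
    """True when the reported version predates the fixed release."""
    return parse_version(tag) < FIXED_VERSION


def port_open(host: str, port: int = GATEWAY_PORT, timeout: float = 1.0) -> bool:
    """Try a plain TCP connect. Success means something is listening, nothing more."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(host: str, reported_version: Optional[str] = None,
           port: int = GATEWAY_PORT) -> str:
    """Combine the port check with the version check into one label per host."""
    if not port_open(host, port):
        return "no listener"
    if reported_version is None:
        return "listener found, version unknown"
    return "vulnerable" if is_vulnerable(reported_version) else "patched"
```

A host labeled "listener found, version unknown" still needs a human to confirm what is actually bound to the port. The sketch does not cover the mDNS broadcasts on port 5353 or the authentication-log signals, which need a packet capture or your identity provider's logs.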
OpenClaw can bypass your EDR, DLP and IAM without triggering a single alert
Why This Matters
OpenClaw agents can be steered by instructions hidden in ordinary content, such as a forwarded email, and then act on them through sanctioned API calls using their own credentials. EDR, DLP, and IAM tools read that activity as a credentialed user doing expected work, so no alert fires. For enterprises, that leaves a data-exfiltration path that defenses built to catch malicious code do not cover, and it calls for isolation, human approval of sensitive actions, and explicit risk decisions on the gaps that remain open.
Key Takeaways
- OpenClaw can bypass existing security tools without triggering alerts.
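- Shadow and exposed deployments are common: Token Security found employees running OpenClaw without IT approval at 22% of its enterprise customers, and Bitsight counted more than 30,000 publicly exposed instances.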
- Current defenses struggle to detect malicious behavior encoded in meaning rather than binary patterns.