Every popular agent framework runs LLM-generated code via `subprocess` or `exec()`. That's arbitrary code execution on your host. One prompt injection and you're done.
Some frameworks offer Docker isolation (OpenHands, AutoGen), but that requires running a Docker daemon and managing container infrastructure.
amla-sandbox is a WASM sandbox with capability enforcement. Agents can only call tools you explicitly provide, with constraints you define. Sandboxed virtual filesystem. No network. No shell escape.
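Because the tool list is the complete capability set, anything you don't pass in simply doesn't exist inside the guest. A minimal sketch of the idea using the API shown below; `database` and its `query` method are illustrative stand-ins for a tool object you'd define yourself:

```python
from amla_sandbox import create_sandbox_tool

# `database` is a tool you define and grant; no Stripe tool is granted here.
sandbox = create_sandbox_tool(tools=[database])

result = sandbox.run('''
const rows = await database.query({sql: "SELECT 1"});  // allowed: capability granted
// await stripe.listTransactions(...)                  // impossible: never provided
''', language="javascript")
```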
```bash
uv pip install "git+https://github.com/amlalabs/amla-sandbox"
```
No Docker. No VM. One binary, works everywhere.
```python
from amla_sandbox import create_sandbox_tool

# stripe_api and database are tool definitions you supply
sandbox = create_sandbox_tool(tools=[stripe_api, database])

# Agent writes one script instead of 10 tool calls (JavaScript)
result = sandbox.run('''
const txns = await stripe.listTransactions({customer: "cus_123"});
const disputed = txns.filter(t => t.disputed);
console.log(disputed[0]);
''', language="javascript")

# Or with shell pipelines
result = sandbox.run('''
tool stripe.listTransactions --customer cus_123 | jq '[.[] | select(.disputed)] | .[0]'
''', language="shell")
```
## Why this matters
Tool-calling is expensive. Every MCP call is a round trip through the model:
LLM → tool → LLM → tool → LLM → tool → ...
Ten tool calls = ten LLM invocations, each one re-reading the full context. Code mode collapses this to a single invocation:

LLM → sandbox (script runs all ten tool calls) → LLM
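Concretely, here's the disputed-transactions task fanned out across several customers. With classic tool-calling that's one model round trip per call; with code mode the loop runs inside the sandbox and the model is invoked once. A sketch reusing the `sandbox.run` call from above (the customer IDs are illustrative):

```python
# One LLM invocation: the model emits this script, the sandbox
# executes every tool call locally, and only the result comes back.
result = sandbox.run('''
const customers = ["cus_123", "cus_456", "cus_789"];  // illustrative IDs
for (const id of customers) {
  const txns = await stripe.listTransactions({customer: id});
  const disputed = txns.filter(t => t.disputed);
  if (disputed.length) console.log(id, disputed.length);
}
''', language="javascript")
```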