
We replaced RAG with a virtual filesystem for our AI documentation assistant

Why This Matters

Replacing RAG with a virtual filesystem lets the AI documentation assistant explore and retrieve information the way a developer navigates a codebase. The change cuts latency and infrastructure costs, making the assistant more scalable and responsive under high-volume traffic, and it points to a broader shift toward instant-access, cost-effective designs in AI-driven documentation tools.


RAG is great, until it isn't.

Our assistant could only retrieve chunks of text that matched a query. If the answer lived across multiple pages, or the user needed exact syntax that didn't land in a top-K result, it was stuck. We wanted it to explore docs the way you'd explore a codebase.

Agents are converging on filesystems as their primary interface because `grep`, `cat`, `ls`, and `find` are all an agent needs. If each doc page is a file and each section is a directory, the agent can search for exact strings, read full pages, and traverse the structure on its own. We just needed a filesystem that mirrored the live docs site.
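To make that concrete, here is a small sketch of the four-operation interface against a real (temporary) docs tree. The directory layout and page contents are illustrative, not the site's actual structure:

```python
# Sketch: the agent's entire "interface" is a handful of filesystem operations.
# The docs tree built here is hypothetical, purely for illustration.
import re
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "guides").mkdir()
(root / "guides" / "install.md").write_text("Run `pip install chromadb` to get started.\n")
(root / "guides" / "query.md").write_text("Use collection.query() for similarity search.\n")

# ls: list a directory to discover structure
names = sorted(p.name for p in (root / "guides").iterdir())
print(names)  # ['install.md', 'query.md']

# cat: read one full page, not a retrieved chunk
print((root / "guides" / "install.md").read_text())

# grep: exact-string search across every page
pattern = re.compile(r"pip install")
hits = sorted(p.name for p in root.rglob("*.md") if pattern.search(p.read_text()))
print(hits)  # ['install.md']
```

The point is that none of these operations depend on a similarity score: the agent can follow structure and search for exact syntax, which is precisely where top-K retrieval falls short.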

The obvious way to do this is to just give the agent a real filesystem. Most harnesses solve this by spinning up an isolated sandbox and cloning the repo. We already use sandboxes for asynchronous background agents where latency is an afterthought, but for a frontend assistant where a user is staring at a loading spinner, the approach falls apart. Our p90 session creation time (including GitHub clone and other setup) was ~46 seconds.

Beyond latency, dedicated micro-VMs for reading static documentation introduced a serious infrastructure bill:

[Chart: additional annual compute cost ($0–$200k) vs. average session duration (0–15 minutes), comparing Sandbox against ChromaFs]

At 850,000 conversations a month, even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM). Longer sessions double that. (This assumes a purely naive approach; a real production workflow would likely use warm pools and container sharing, but the point still stands.)
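The arithmetic behind that figure is straightforward; using the rates and session parameters quoted above:

```python
# Back-of-envelope check of the naive sandbox bill, using the rates from the
# article: $0.0504/h per vCPU and $0.0162/h per GiB of RAM (Daytona pricing).
VCPU_PER_HOUR = 0.0504
GIB_PER_HOUR = 0.0162

conversations_per_month = 850_000
vcpus, gib_ram = 1, 2       # minimal sandbox
session_minutes = 5         # session lifetime

hourly = vcpus * VCPU_PER_HOUR + gib_ram * GIB_PER_HOUR   # $0.0828/h
per_conversation = hourly * session_minutes / 60          # ~$0.0069
annual = per_conversation * conversations_per_month * 12  # ~$70,380

print(f"${per_conversation:.4f} per conversation -> ${annual:,.0f}/year")
```

That lands just north of $70,000 a year for raw session time alone, before any boot-time or idle overhead.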

We needed the filesystem workflow to be instant and cheap, which meant rethinking the filesystem itself.

The agent doesn't need a real filesystem; it just needs the illusion of one. Our documentation was already indexed, chunked, and stored in a Chroma database to power our search, so we built ChromaFs: a virtual filesystem that intercepts UNIX commands and translates them into queries against that same database. Session creation dropped from ~46 seconds to ~100 milliseconds, and since ChromaFs reuses infrastructure we already pay for, the marginal per-conversation compute cost is zero.
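A minimal sketch of the idea (not the real ChromaFs): back the "filesystem" with a plain in-memory mapping from virtual paths to page content, and translate each UNIX-style command into a lookup. In production the same translations would be metadata queries against the existing Chroma database; the class and method names here are hypothetical.

```python
import re

# Illustrative stand-in for ChromaFs: a dict of virtual paths -> page text.
# The real system would answer these calls with Chroma metadata queries.
class VirtualDocsFS:
    def __init__(self, pages: dict):
        self.pages = pages  # e.g. {"guides/install.md": "..."}

    def ls(self, directory: str) -> list:
        """List the immediate children of a virtual directory."""
        prefix = directory.rstrip("/") + "/"
        children = {p[len(prefix):].split("/")[0]
                    for p in self.pages if p.startswith(prefix)}
        return sorted(children)

    def cat(self, path: str) -> str:
        """Return the full content of one virtual file."""
        return self.pages[path]

    def grep(self, pattern: str) -> list:
        """Return every virtual path whose content matches the pattern."""
        rx = re.compile(pattern)
        return sorted(p for p, text in self.pages.items() if rx.search(text))

fs = VirtualDocsFS({
    "guides/install.md": "Run `pip install chromadb` to get started.",
    "guides/query.md": "Use collection.query() for similarity search.",
})
print(fs.ls("guides"))          # ['install.md', 'query.md']
print(fs.grep(r"pip install"))  # ['guides/install.md']
```

Because nothing is provisioned per session, "mounting" this filesystem is just constructing an object over an index that already exists, which is why session creation collapses from tens of seconds to milliseconds.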

| Metric | Sandbox | ChromaFs |
| --- | --- | --- |
| P90 boot time | ~46 seconds | ~100 milliseconds |
| Marginal compute cost | ~$0.0137 per conversation | ~$0 (reuses existing DB) |
| Search mechanism | Linear disk scan (syscalls) | DB metadata query |
| Infrastructure | Daytona or similar providers | Provisioned DB |
