Why This Matters
This demo shows a browser-based tool that turns text prompts into Excalidraw diagrams by running a language model on the GPU and compressing its KV cache. Because inference happens entirely client-side, no server is involved. The compression lets longer conversations fit in GPU memory, and having the model emit compact code instead of raw JSON keeps even complex diagrams short. This could influence future web-based AI applications, since running models locally removes the server round trip, although the current hardware and browser requirements still limit which devices can run it.
Key Takeaways
- Runs the model on the GPU through WebGPU compute shaders (30+ tok/s) for fast prompt-to-diagram conversion
- Compresses the KV cache about 2.4× so longer conversations fit within GPU memory limits
- Requires desktop Chrome 134+ with WebGPU subgroups support and about 3 GB of RAM
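To see what the demo's ~2.4× KV-cache compression buys, a back-of-the-envelope calculation helps. The sketch below uses a made-up model shape (24 layers, 4 KV heads, head dimension 256) and a hypothetical 512 MiB cache budget; these are not Gemma 4 E2B's real configuration, and it assumes the 2.4× figure is measured against an fp16 cache.

```typescript
// Hypothetical model shape: placeholder numbers, NOT Gemma 4 E2B's real config.
interface KvShape {
  layers: number;  // layers that keep a KV cache
  kvHeads: number; // key/value heads per layer
  headDim: number; // dimension of each head
}

// Bytes to cache keys AND values for `tokens` tokens at `bitsPerElem` bits per element.
function kvCacheBytes(shape: KvShape, tokens: number, bitsPerElem: number): number {
  const elemsPerToken = 2 * shape.layers * shape.kvHeads * shape.headDim; // 2 = one K and one V vector
  return (elemsPerToken * tokens * bitsPerElem) / 8;
}

// Tokens that fit in a fixed budget, assuming `ratio` is measured against an fp16 cache.
function tokensThatFit(shape: KvShape, budgetBytes: number, ratio = 1): number {
  const fp16BytesPerToken = kvCacheBytes(shape, 1, 16);
  return Math.floor((budgetBytes * ratio) / fp16BytesPerToken);
}

const shape: KvShape = { layers: 24, kvHeads: 4, headDim: 256 };
const budget = 512 * 1024 * 1024; // hypothetical 512 MiB reserved for the KV cache

console.log(tokensThatFit(shape, budget));      // 5461 tokens at fp16
console.log(tokensThatFit(shape, budget, 2.4)); // 13107 tokens at ~2.4x compression
```

Whatever the real shape is, the ratio carries straight through: the same memory budget holds roughly 2.4 times as many tokens of context.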
TurboQuant Prompt → Diagram
Describe any diagram, and Gemma 4 E2B generates it as an Excalidraw drawing, entirely in your browser. Desktop Chrome 134+ only.
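Because the demo depends on specific GPU capabilities rather than on Chrome itself, a page like this would typically check for them directly. The sketch below is an assumption about how such a check might look, not the demo's code: it asks WebGPU for an adapter, tests for the `subgroups` feature the demo requires, and compares `navigator.deviceMemory` (a Chromium-only, deliberately coarse hint) against the ~3 GB figure. `GpuLike`, `AdapterLike`, and `checkDemoSupport` are hypothetical names, and the minimal types let it compile without the WebGPU type package.

```typescript
// Minimal structural types (hypothetical) so this compiles without @webgpu/types.
interface AdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

interface SupportReport {
  webgpu: boolean;
  subgroups: boolean;
  enoughMemory: boolean | "unknown";
}

// Reports whether the WebGPU features and memory the demo needs appear to be available.
// `deviceMemoryGB` is navigator.deviceMemory where present (Chromium only, coarse).
async function checkDemoSupport(
  gpu: GpuLike | undefined,
  deviceMemoryGB: number | undefined,
  minMemoryGB = 3, // the demo's stated ~3 GB requirement
): Promise<SupportReport> {
  const enoughMemory: boolean | "unknown" =
    deviceMemoryGB === undefined ? "unknown" : deviceMemoryGB >= minMemoryGB;
  if (!gpu) return { webgpu: false, subgroups: false, enoughMemory };
  const adapter = await gpu.requestAdapter();
  if (!adapter) return { webgpu: false, subgroups: false, enoughMemory };
  return { webgpu: true, subgroups: adapter.features.has("subgroups"), enoughMemory };
}

// In a page, one might call:
//   checkDemoSupport((navigator as any).gpu, (navigator as any).deviceMemory)
```

If `subgroups` comes back true, the device would then be requested with `adapter.requestDevice({ requiredFeatures: ["subgroups"] })` so that shaders can use subgroup operations.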
The LLM outputs compact code (~50 tokens) instead of raw Excalidraw JSON (~5,000 tokens). The TurboQuant algorithm (polar + QJL) compresses the KV cache about 2.4×, so longer conversations fit in GPU memory. The demo needs WebGPU subgroups (Safari and iOS are not supported yet) and about 3 GB of RAM, well above what mobile browsers allow.
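The demo does not document its compact format, so the sketch below invents a two-command syntax (`box <id> "<label>"` and `arrow <from> <to>`) purely to illustrate why expanding a short program into Excalidraw elements is cheaper than having the model emit the JSON itself. The `expand` function, the syntax, and the fixed left-to-right layout are all hypothetical. The element objects carry only a handful of the fields real Excalidraw elements have; a valid scene needs many more (such as `seed` and `version`), which is part of why raw JSON runs so long.

```typescript
// Hypothetical compact format (NOT the demo's actual syntax):
//   box <id> "<label>"     declares a labelled box (must precede arrows that use it)
//   arrow <fromId> <toId>  draws a straight horizontal arrow between two boxes
// Boxes are laid out left to right in declaration order.

type Element =
  | { type: "rectangle"; id: string; x: number; y: number; width: number; height: number }
  | { type: "text"; id: string; x: number; y: number; text: string; fontSize: number }
  | { type: "arrow"; id: string; x: number; y: number; points: [number, number][] };

const BOX_W = 160, BOX_H = 80, GAP = 80;

function expand(source: string): Element[] {
  const boxes = new Map<string, { x: number }>();
  const out: Element[] = [];
  for (const raw of source.split("\n")) {
    const line = raw.trim();
    if (!line) continue;
    const box = line.match(/^box\s+(\w+)\s+"([^"]*)"$/);
    const arrow = line.match(/^arrow\s+(\w+)\s+(\w+)$/);
    if (box) {
      const [, id, label] = box;
      const x = boxes.size * (BOX_W + GAP); // next free slot in the row
      boxes.set(id, { x });
      out.push({ type: "rectangle", id, x, y: 0, width: BOX_W, height: BOX_H });
      out.push({ type: "text", id: `${id}-label`, x: x + 10, y: BOX_H / 2 - 10, text: label, fontSize: 20 });
    } else if (arrow) {
      const [, from, to] = arrow;
      const a = boxes.get(from), b = boxes.get(to);
      if (!a || !b) throw new Error(`unknown box in: ${line}`);
      const startX = a.x + BOX_W; // right edge of the source box
      // Arrow points are relative to (x, y); the end sits on the target's left edge.
      out.push({ type: "arrow", id: `${from}-${to}`, x: startX, y: BOX_H / 2, points: [[0, 0], [b.x - startX, 0]] });
    } else {
      throw new Error(`cannot parse: ${line}`);
    }
  }
  return out;
}

const src = `box a "Browser"\nbox b "GPU"\narrow a b`;
const elements = expand(src);
console.log(src.length, JSON.stringify(elements).length); // compact source vs expanded JSON size
```

Even this stripped-down expansion is several times longer than its source, and the gap grows once every element carries Excalidraw's full field set.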
This demo reimplements the TurboQuant algorithm in WGSL compute shaders so it runs on the GPU at 30+ tok/s. The sibling turboquant-wasm npm package implements the same algorithm in WASM+SIMD for CPU-side vector search.
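The QJL half of the algorithm is small enough to illustrate on the CPU. This is a sketch of the basic 1-bit QJL estimator only: each key is stored as the signs of a Gaussian random projection plus its norm, and a query's inner product with it is estimated as sqrt(π/2)/m · ‖k‖ · ⟨Sq, sign(Sk)⟩, where S is an m × d Gaussian matrix. It omits the polar stage, does not reproduce how TurboQuant combines the two, and is neither the demo's WGSL shaders nor the turboquant-wasm API. The projection size, the seeds, and the mulberry32 PRNG are arbitrary choices.

```typescript
// Small seeded PRNG (mulberry32) so the random projection is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Standard normal sample via Box-Muller.
function gaussian(rand: () => number): number {
  const u1 = 1 - rand(); // in (0, 1], so log(u1) is finite
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// m x d Gaussian projection S, shared by every key and query.
function makeProjection(m: number, d: number, seed: number): number[][] {
  const rand = mulberry32(seed);
  return Array.from({ length: m }, () => Array.from({ length: d }, () => gaussian(rand)));
}

interface QjlKey {
  signs: number[]; // sign(S k): one bit per row, stored as +1 / -1 here for clarity
  norm: number;    // ||k||, kept in full precision
}

function encodeKey(S: number[][], k: number[]): QjlKey {
  return { signs: S.map((row) => (dot(row, k) >= 0 ? 1 : -1)), norm: Math.sqrt(dot(k, k)) };
}

// Unbiased estimate of <q, k>; the query is projected but NOT quantized.
function estimateDot(S: number[][], q: number[], key: QjlKey): number {
  let acc = 0;
  for (let i = 0; i < S.length; i++) acc += dot(S[i], q) * key.signs[i];
  return (Math.sqrt(Math.PI / 2) / S.length) * key.norm * acc;
}

// Usage: compare the estimate with the exact inner product.
const d = 32, m = 4096;
const S = makeProjection(m, d, 42);
const rand = mulberry32(7);
const k = Array.from({ length: d }, () => gaussian(rand));
const q = Array.from({ length: d }, () => gaussian(rand));
console.log(dot(q, k), estimateDot(S, q, encodeKey(S, k)));
```

The estimator is unbiased because, for a Gaussian row s, E[(s·q) · sign(s·k)] = sqrt(2/π) · ⟨q, k⟩ / ‖k‖; its error shrinks roughly as 1/sqrt(m), which is the trade-off between bits stored per key and attention-score accuracy.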