# Voxtral Mini 4B Realtime (Rust)
Streaming speech recognition running natively and in the browser. A pure Rust implementation of Mistral's Voxtral Mini 4B Realtime model using the Burn ML framework.
The Q4 GGUF quantized path (2.5 GB) runs entirely client-side in a browser tab via WASM + WebGPU. Try it live.
## Quick Start
### Native CLI
```sh
# Download model weights (~9 GB)
uv run --with huggingface_hub \
  hf download mistralai/Voxtral-Mini-4B-Realtime-2602 --local-dir models/voxtral

# Transcribe an audio file (f32 SafeTensors path)
cargo run --release --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio audio.wav --model models/voxtral

# Or use the Q4 quantized path (~2.5 GB)
cargo run --release --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio audio.wav --gguf models/voxtral-q4.gguf --tokenizer models/voxtral/tekken.json
```
### Browser Demo
```sh
# Build WASM package
wasm-pack build --target web --no-default-features --features wasm

# Generate a self-signed cert (WebGPU requires a secure context)
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
  -keyout /tmp/voxtral-key.pem -out /tmp/voxtral-cert.pem \
  -days 7 -nodes -subj "/CN=localhost"

# Start the dev server
bun serve.mjs
```
Open https://localhost:8443, accept the self-signed certificate, and click Load from Server to download the model shards. Record from your microphone or upload a WAV file to transcribe.
Hosted demo on HuggingFace Spaces if you want to skip local setup.