
Rust implementation of Mistral's Voxtral Mini 4B Realtime runs in your browser


Voxtral Mini 4B Realtime (Rust)

Streaming speech recognition running natively and in the browser. A pure Rust implementation of Mistral's Voxtral Mini 4B Realtime model using the Burn ML framework.

The Q4 GGUF quantized path (2.5 GB) runs entirely client-side in a browser tab via WASM + WebGPU. Try it live.
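To make the quantized path concrete: in the common GGUF Q4_0 scheme, weights are stored in blocks of 32, each block holding one shared scale plus sixteen bytes of packed 4-bit values re-centered around zero, which is roughly how a 9 GB f32 model shrinks to about 2.5 GB. The sketch below is a hypothetical illustration of that block layout, not this repo's actual loader; the scale is shown as f32 for brevity (GGUF stores it as f16), and the nibble ordering follows the usual ggml convention.

```rust
// Hypothetical sketch of Q4_0-style dequantization: 32 weights per block,
// one shared scale, each weight a 4-bit value offset by 8. The real
// loader in this repo may differ in layout and types.
fn dequant_q4_block(scale: f32, qs: &[u8; 16]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (i, &byte) in qs.iter().enumerate() {
        // Low nibble holds weight i, high nibble holds weight i + 16.
        out[i] = ((byte & 0x0F) as i32 - 8) as f32 * scale;
        out[i + 16] = ((byte >> 4) as i32 - 8) as f32 * scale;
    }
    out
}

fn main() {
    // 0x08 encodes quant value 8 in the low nibble: (8 - 8) * scale = 0.0;
    // its high nibble is 0: (0 - 8) * 0.5 = -4.0.
    let block = dequant_q4_block(0.5, &[0x08; 16]);
    assert_eq!(block[0], 0.0);
    assert_eq!(block[16], -4.0);
    println!("{} {}", block[0], block[16]);
}
```

Per block that is 18 bytes for 32 weights (2-byte scale + 16 packed bytes), i.e. 4.5 bits per weight, versus 128 bytes at f32.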

Quick Start

Native CLI

# Download model weights (~9 GB)
uv run --with huggingface_hub \
  hf download mistralai/Voxtral-Mini-4B-Realtime-2602 --local-dir models/voxtral

# Transcribe an audio file (f32 SafeTensors path)
cargo run --release --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio audio.wav --model models/voxtral

# Or use the Q4 quantized path (~2.5 GB)
cargo run --release --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio audio.wav --gguf models/voxtral-q4.gguf --tokenizer models/voxtral/tekken.json

Browser Demo

# Build WASM package
wasm-pack build --target web --no-default-features --features wasm

# Generate self-signed cert (WebGPU requires a secure context)
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
  -keyout /tmp/voxtral-key.pem -out /tmp/voxtral-cert.pem \
  -days 7 -nodes -subj "/CN=localhost"

# Start dev server
bun serve.mjs

Open https://localhost:8443, accept the certificate, and click Load from Server to download the model shards. Record from your microphone or upload a WAV file to transcribe.
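Since the demo accepts uploaded WAV files, one preprocessing step worth noting is sample conversion: WAV audio is commonly 16-bit signed PCM, while speech models typically consume f32 samples in [-1.0, 1.0]. The helper below is a minimal generic sketch of that conversion; the function name is hypothetical and this repo's actual audio pipeline may handle it differently.

```rust
// Hypothetical helper: convert 16-bit PCM samples (as decoded from a WAV
// file) to f32 in [-1.0, 1.0], the range speech models typically expect.
// Not taken from this repo's code.
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    // i16 ranges over [-32768, 32767]; dividing by 32768 maps it
    // into [-1.0, 1.0).
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

fn main() {
    let samples = [0i16, 16384, -32768];
    let floats = pcm16_to_f32(&samples);
    assert_eq!(floats, vec![0.0, 0.5, -1.0]);
    println!("{:?}", floats);
}
```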

A hosted demo is available on HuggingFace Spaces if you want to skip local setup.
