# Voxtral Mini 4B Realtime (Rust)
Streaming speech recognition running natively and in the browser. A pure Rust implementation of Mistral's Voxtral Mini 4B Realtime model using the Burn ML framework.
The Q4 GGUF quantized path (2.5 GB) runs entirely client-side in a browser tab via WASM + WebGPU. Try it live.
## Quick Start
### Native CLI
```sh
# Download model weights (~9 GB)
uv run --with huggingface_hub \
  hf download mistralai/Voxtral-Mini-4B-Realtime-2602 --local-dir models/voxtral

# Transcribe an audio file (f32 SafeTensors path)
cargo run --release --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio audio.wav --model models/voxtral

# Or use the Q4 quantized path (~2.5 GB)
cargo run --release --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio audio.wav --gguf models/voxtral-q4.gguf --tokenizer models/voxtral/tekken.json
```
### Browser Demo
```sh
# Build WASM package
wasm-pack build --target web --no-default-features --features wasm

# Generate a self-signed cert (WebGPU requires a secure context)
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
  -keyout /tmp/voxtral-key.pem -out /tmp/voxtral-cert.pem \
  -days 7 -nodes -subj "/CN=localhost"

# Start the dev server
bun serve.mjs
```
Open https://localhost:8443, accept the self-signed certificate, and click Load from Server to download the model shards. Record from your microphone or upload a WAV file to transcribe.
Hosted demo on HuggingFace Spaces if you want to skip local setup.