Fast speech recognition with NVIDIA's Parakeet models in pure C++.
Built on axiom — a lightweight tensor library with automatic Metal GPU acceleration. No ONNX runtime, no Python runtime, no heavyweight dependencies. Just C++ and one tensor library that outruns PyTorch MPS.
~27ms encoder inference on Apple Silicon GPU for 10s audio (110M model) — 96x faster than CPU.
## Supported Models
| Model | Class | Size | Type | Description |
|---|---|---|---|---|
| tdt-ctc-110m | ParakeetTDTCTC | 110M | Offline | English, dual CTC/TDT decoder heads |
| tdt-600m | ParakeetTDT | 600M | Offline | Multilingual, TDT decoder |
| eou-120m | ParakeetEOU | 120M | Streaming | English, RNNT with end-of-utterance detection |
| nemotron-600m | ParakeetNemotron | 600M | Streaming | Multilingual, configurable latency (80ms–1120ms) |
| sortformer | Sortformer | 117M | Streaming | Speaker diarization (up to 4 speakers) |
All ASR models share the same audio pipeline: 16kHz mono WAV → 80-bin Mel spectrogram → FastConformer encoder.
## Quick Start
```cpp
#include <parakeet/parakeet.hpp>

parakeet::Transcriber t("model.safetensors", "vocab.txt");
t.to_gpu();  // optional — Metal acceleration
auto result = t.transcribe("audio.wav");
std::cout << result.text << std::endl;
```
Choose the decoder at the call site:

```cpp
auto ctc_result = t.transcribe("audio.wav", parakeet::Decoder::CTC);  // fast greedy
auto tdt_result = t.transcribe("audio.wav", parakeet::Decoder::TDT);  // better accuracy (default)
```