Fast speech recognition with NVIDIA's Parakeet models in pure C++.
Built on axiom — a lightweight tensor library with automatic Metal GPU acceleration. No ONNX runtime, no Python runtime, no heavyweight dependencies. Just C++ and one tensor library that outruns PyTorch MPS.
~27ms encoder inference on Apple Silicon GPU for 10s audio (110M model) — 96x faster than CPU.
## Supported Models
| Model | Class | Size | Type | Description |
|---|---|---|---|---|
| tdt-ctc-110m | ParakeetTDTCTC | 110M | Offline | English, dual CTC/TDT decoder heads |
| tdt-600m | ParakeetTDT | 600M | Offline | Multilingual, TDT decoder |
| eou-120m | ParakeetEOU | 120M | Streaming | English, RNNT with end-of-utterance detection |
| nemotron-600m | ParakeetNemotron | 600M | Streaming | Multilingual, configurable latency (80ms–1120ms) |
| sortformer | Sortformer | 117M | Streaming | Speaker diarization (up to 4 speakers) |
All ASR models share the same audio pipeline: 16kHz mono WAV → 80-bin Mel spectrogram → FastConformer encoder.
## Quick Start
```cpp
#include <parakeet/parakeet.hpp>

parakeet::Transcriber t("model.safetensors", "vocab.txt");
t.to_gpu();  // optional — Metal acceleration
auto result = t.transcribe("audio.wav");
std::cout << result.text << std::endl;
```
Choose the decoder at the call site:

```cpp
auto ctc_result = t.transcribe("audio.wav", parakeet::Decoder::CTC);  // fast greedy
auto tdt_result = t.transcribe("audio.wav", parakeet::Decoder::TDT);  // better accuracy (default)
```