Cohere Transcribe: Speech Recognition

Cohere is announcing Transcribe, a state-of-the-art automatic speech recognition (ASR) model that is open source and available today for download.

Speech is rapidly becoming a core modality for AI-enabled workloads and automations — from meeting transcription and speech analytics to real-time customer support agents.

Our objective was straightforward: push the frontier of dedicated ASR model accuracy under practical conditions. The model was trained from scratch with a deliberate focus on minimizing word error rate (WER), while keeping production readiness top of mind. In other words, it is not just a research artifact but a system designed for everyday use.
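
For readers new to the metric, WER is the word-level edit distance between a reference transcript and the model's output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is illustrative only: the function name `word_error_rate` is our own, and it skips the text normalization (casing, punctuation) that real evaluations usually apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.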

Cohere Transcribe reflects that intent. It is open source, so teams keep full control of their infrastructure; its inference footprint is manageable enough for practical GPU and local deployment; and it delivers best-in-class serving efficiency. It is also available via Model Vault, Cohere’s secure, fully managed model inference platform.

Cohere Transcribe currently ranks #1 for accuracy on Hugging Face’s Open ASR Leaderboard, setting a new benchmark for real-world transcription performance.

This marks our zero-to-one in bringing high-performance speech recognition into enterprise AI workflows. Read on to learn more.

Model overview

Name: cohere-transcribe-03-2026
Architecture: conformer-based encoder-decoder
Input: audio waveform → log-Mel spectrogram
Output: transcribed text
Model size: 2B parameters
Model: a large Conformer encoder extracts acoustic representations, followed by a lightweight Transformer decoder for token generation
Training objective: standard supervised cross-entropy on output tokens; trained from scratch
Languages: trained on 14 languages
European: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish
APAC: Chinese (Mandarin), Japanese, Korean, Vietnamese
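
To make the input stage concrete, here is a minimal, dependency-free sketch of turning a waveform into a log-Mel spectrogram: window the signal into overlapping frames, take each frame's power spectrum, pool it through triangular mel filters, and take the log. Every parameter here (16 kHz audio, 400-sample window, 160-sample hop, 80 mel bands, HTK-style mel scale) is an assumption chosen because it is common in ASR front ends; the announcement does not state the model's actual settings. The naive DFT keeps the example self-contained, and real code would use an FFT library.

```python
import cmath
import math


def hz_to_mel(freq_hz: float) -> float:
    # HTK-style mel scale (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Triangular filters spaced evenly on the mel scale, one row per band."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def power_spectrum(frame: list[float]) -> list[float]:
    """Naive O(n^2) DFT power for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def log_mel_spectrogram(samples, sample_rate=16000, n_fft=400, hop=160, n_mels=80):
    """Return one list of n_mels log energies per frame (hypothetical defaults)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n_fft) for t in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        # Small epsilon avoids log(0) for silent bands.
        frames.append([math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                       for row in bank])
    return frames


# Tiny demo: a 1 kHz tone at 8 kHz, with small sizes so the naive DFT stays fast.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
features = log_mel_spectrogram(tone, sample_rate=8000, n_fft=64, hop=32, n_mels=8)
print(len(features), len(features[0]))
```

The result is a frames-by-bands matrix, the kind of two-dimensional input that an encoder such as the Conformer described above consumes.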
