Voxtral Mini 1.0 (3B) - 2507
Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.
Learn more about Voxtral in our blog post here.
Key Features
Voxtral builds upon Ministral 3B with powerful audio understanding capabilities:
Dedicated transcription mode: Voxtral can operate in a pure speech transcription mode to maximize performance. By default, Voxtral automatically predicts the source audio language and transcribes the text accordingly.
Long-form context: With a 32k token context length, Voxtral handles audio of up to 30 minutes for transcription, or 40 minutes for understanding.
Built-in Q&A and summarization: Supports asking questions directly through audio. Analyze audio and generate structured summaries without the need for separate ASR and language models.
Natively multilingual: Automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
Function-calling straight from voice: Enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
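The duration limits in the feature list (30 minutes for transcription, 40 minutes for understanding) suggest that longer recordings need to be split before they are sent to the model. The following is a minimal sketch of that planning step, not part of Voxtral or any Mistral library: the helper name `plan_chunks` and the `MAX_MINUTES` table are hypothetical, and only the two limits come from the text above. It cuts at evenly spaced points, which may fall mid-word; a real pipeline would probably prefer to cut at silences.

```python
import math

# Per-mode audio limits in minutes, taken from the feature list above.
MAX_MINUTES = {"transcription": 30, "understanding": 40}


def plan_chunks(duration_s: float, mode: str = "transcription") -> list[tuple[float, float]]:
    """Return consecutive (start, end) windows in seconds covering the recording.

    Hypothetical helper: it uses the fewest equal-length windows such that
    each window fits within the limit for the chosen mode.
    """
    if mode not in MAX_MINUTES:
        raise ValueError(f"unknown mode: {mode!r}")
    if duration_s <= 0:
        return []
    limit_s = MAX_MINUTES[mode] * 60
    # Fewest windows that keep every window at or under the limit.
    n_windows = math.ceil(duration_s / limit_s)
    step = duration_s / n_windows
    return [(i * step, (i + 1) * step) for i in range(n_windows)]


if __name__ == "__main__":
    # A 75-minute recording needs three windows for transcription
    # and two for understanding.
    print(plan_chunks(75 * 60, "transcription"))
    print(plan_chunks(75 * 60, "understanding"))
```

Equal-length windows are used here rather than full-size windows plus a short remainder, so that no chunk ends up with only a few seconds of audio.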