Dia2 is a streaming dialogue TTS model created by Nari Labs.
The model does not need the full text before it produces audio; it can start generating as soon as the first few words arrive. You can also condition the output on audio, which enables natural conversations in real time.
We provide model checkpoints (1B, 2B) and inference code to accelerate research. The model supports English only, and each generation is limited to 2 minutes.
⚠️ Quality and voices vary from one generation to the next because the model is not fine-tuned on a specific voice. Use a prefix or fine-tune the model to obtain stable output.
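To make the streaming behavior concrete, here is a minimal toy sketch of what "start generating as the first few words are given" means for a consumer. It is not the Dia2 API: `stream_tts`, its `lookahead` parameter, and the `<audio:...>` placeholder strings are all hypothetical names invented for illustration, and no real audio is produced.

```python
from typing import Iterable, Iterator, List


def stream_tts(words: Iterable[str], lookahead: int = 3) -> Iterator[str]:
    """Toy stand-in for a streaming TTS model (hypothetical, NOT the Dia2 API).

    Yields one placeholder "audio chunk" per word as soon as `lookahead`
    words are buffered, instead of waiting for the whole text.
    """
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) >= lookahead:
            # Enough context has arrived: emit audio for the oldest word.
            yield f"<audio:{buffer.pop(0)}>"
    # The text stream has ended: flush whatever is still buffered.
    while buffer:
        yield f"<audio:{buffer.pop(0)}>"


# Record the order in which text arrives and audio chunks come out.
events: List[str] = []


def incoming_text() -> Iterator[str]:
    """Simulates text arriving word by word (e.g. from an LLM)."""
    for word in "Hello there how are you".split():
        events.append(f"text:{word}")
        yield word


for chunk in stream_tts(incoming_text()):
    events.append(chunk)

print(events)
```

In the printed event log, the first audio chunk appears after only three words have arrived, while the rest of the text is still on its way. A real streaming model would emit audio frames rather than strings, but the interleaving of input and output is the point of the sketch.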
Try it now on Hugging Face Spaces
Upcoming
Bonsai (JAX) implementation
Dia2 TTS Server: Real streaming support
Sori: Dia2-powered speech-to-speech engine written in Rust
Quickstart