Qwen3-Omni: Native Omni AI model for text, image and video

🖥️ Hugging Face Demo | 🖥️ ModelScope Demo | 💬 WeChat (微信) | 🫨 Discord | 📑 API

We release Qwen3-Omni, a set of natively end-to-end, multilingual, omni-modal foundation models. They are designed to process diverse inputs, including text, images, audio, and video, while delivering real-time streaming responses in both text and natural speech. Click the video below for more information 😃

English Version

Chinese Version

News

2025.09.22: 🎉🎉🎉 We have released Qwen3-Omni. For more details, please check our blog!

Contents

Overview

Introduction
