Tech News

Cerebras launches Qwen3-235B, achieving 1.5k tokens per second


World's fastest frontier AI reasoning model now available on Cerebras Inference Cloud

Delivers production-grade code generation at 30x the speed and 1/10th the cost of closed-source alternatives

Paris, July 8, 2025 – Cerebras Systems today announced the launch of Qwen3-235B with full 131K context support on its inference cloud platform. The model combines frontier-level intelligence with unprecedented speed at one-tenth the cost of closed-source models, a milestone the company says will transform enterprise AI deployment.

Frontier Intelligence on Cerebras

Alibaba’s Qwen3-235B delivers model intelligence that rivals frontier models such as Claude 4 Sonnet, Gemini 2.5 Flash, and DeepSeek R1 across a range of science, coding, and general-knowledge benchmarks, according to independent tests by Artificial Analysis.

Qwen3-235B uses an efficient mixture-of-experts architecture that delivers exceptional compute efficiency, enabling Cerebras to offer the model at $0.60 per million input tokens and $1.20 per million output tokens—less than one-tenth the cost of comparable closed-source models.
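At those list prices, per-request cost is simple arithmetic: tokens divided by one million, times the per-million rate. The sketch below uses the prices quoted above; the token counts in the example are illustrative, not from the announcement:

```python
# Cerebras list prices for Qwen3-235B, as quoted in the announcement.
PRICE_PER_M_INPUT = 0.60   # USD per million input tokens
PRICE_PER_M_OUTPUT = 1.20  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted rates."""
    return (input_tokens / 1e6) * PRICE_PER_M_INPUT + \
           (output_tokens / 1e6) * PRICE_PER_M_OUTPUT

# Example: a long-context request with 100K input tokens and 2K output tokens.
print(f"${request_cost(100_000, 2_000):.4f}")  # → $0.0624
```

Even a request that nearly fills the 131K context window stays in the cents range at these rates, which is where the "one-tenth the cost" comparison comes from.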

Cut Reasoning Time from Minutes to Seconds

Reasoning models are notoriously slow, often taking minutes to answer a simple question. By leveraging the Wafer Scale Engine, Cerebras accelerates Qwen3-235B to an unprecedented 1,500 tokens per second, reducing response times from 1-2 minutes to 0.6 seconds and making coding, reasoning, and deep-RAG workflows nearly instantaneous.
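The latency math follows directly from throughput: generation time is output length divided by decode speed. A minimal sketch, where the 900-token response length is an illustrative assumption chosen to match the 0.6-second figure above:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 1500.0) -> float:
    """Time to generate a response at a given decode throughput."""
    return output_tokens / tokens_per_second

# 900 output tokens at 1,500 tokens/s completes in under a second.
print(generation_seconds(900))        # → 0.6
# The same response at 15 tokens/s takes a minute.
print(generation_seconds(900, 15.0))  # → 60.0
```

The 100x gap between the two calls is what turns a reasoning model from a batch tool into an interactive one.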

Based on Artificial Analysis measurements, Cerebras is the only company globally offering a frontier AI model capable of generating output at over 1,000 tokens per second, setting a new standard for real-time AI performance.

131K Context Enables Production-grade Code Generation
