Cerebras achieves 2,500T/s on Llama 4 Maverick (400B)
Published on: 2025-06-11 09:49:22
Cerebras Breaks the 2,500 Tokens Per Second Barrier with Llama 4 Maverick 400B
SUNNYVALE, CA – May 28, 2025 – Last week, Nvidia announced that eight Blackwell GPUs in a DGX B200 could demonstrate 1,000 tokens per second (TPS) per user on Meta’s Llama 4 Maverick. Today, the same independent benchmark firm, Artificial Analysis, measured Cerebras at more than 2,500 TPS per user, more than double the performance of Nvidia’s flagship solution.
“Cerebras has beaten the Llama 4 Maverick inference speed record set by NVIDIA last week,” said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis. “Artificial Analysis has benchmarked Cerebras’ Llama 4 Maverick endpoint at 2,522 tokens per second, compared to NVIDIA Blackwell’s 1,038 tokens per second for the same model. We’ve tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model.”
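As a quick check on the "more than double" claim, the two throughput figures quoted by Artificial Analysis can be compared directly. The sketch below is illustrative only: the function names and the 1,000-token response length are assumptions made for this example and are not part of the benchmark methodology.

```python
# Per-user throughput figures quoted by Artificial Analysis (tokens per second).
CEREBRAS_TPS = 2522
BLACKWELL_TPS = 1038


def speedup(new_tps: float, baseline_tps: float) -> float:
    """Return how many times faster new_tps is than baseline_tps."""
    return new_tps / baseline_tps


def seconds_to_generate(num_tokens: int, tps: float) -> float:
    """Return the time to stream num_tokens at a steady rate of tps.

    Assumes a constant output rate and ignores time to first token.
    """
    return num_tokens / tps


ratio = speedup(CEREBRAS_TPS, BLACKWELL_TPS)
print(f"Speedup over Blackwell: {ratio:.2f}x")  # prints 2.43x

# Hypothetical 1,000-token response, chosen only for illustration.
RESPONSE_TOKENS = 1000
for name, tps in (("Cerebras", CEREBRAS_TPS), ("Blackwell", BLACKWELL_TPS)):
    print(f"{name}: {seconds_to_generate(RESPONSE_TOKENS, tps):.2f} s")
```

A ratio of roughly 2.4 is consistent with the release's statement that the Cerebras result is more than double the Blackwell figure.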
With today’s results, Cerebras has set a world record for LLM inference speed on the 400B paramete
...