Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second
Published on: 2025-08-03 16:02:04
Meta today announced a partnership with Cerebras Systems to power its new Llama API, giving developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.
The announcement, made at Meta’s inaugural LlamaCon developer conference in Menlo Park, positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly growing AI inference service market, where developers purchase tokens by the billions to power their applications.
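To put the per-token pricing model and the quoted 2,600 tokens-per-second figure in concrete terms, the sketch below shows how a developer might time a single request against an OpenAI-compatible chat-completions endpoint and compute throughput. The base URL, API-key environment variable, model identifier, and response schema are illustrative assumptions for this example, not details confirmed in the article.

```python
# Minimal sketch: measuring inference throughput against an OpenAI-compatible
# chat-completions endpoint. The base URL, model name, env var, and response
# schema below are illustrative assumptions, not confirmed by the article.
import os
import time

import requests

API_BASE = "https://api.example-llama-provider.com/v1"  # hypothetical endpoint
API_KEY = os.environ.get("LLAMA_API_KEY", "")            # hypothetical env var
MODEL = "llama-4-example"                                # hypothetical model id

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Summarize the history of the transistor."}],
    "max_tokens": 512,
}

start = time.perf_counter()
resp = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
elapsed = time.perf_counter() - start
resp.raise_for_status()

# If the response follows the common OpenAI-style schema, usage.completion_tokens
# reports how many tokens were generated; tokens per second is then a rough
# proxy for the per-request speeds quoted in the article (e.g. ~2,600 tokens/s).
completion_tokens = resp.json()["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"({completion_tokens / elapsed:.0f} tokens/s)")
```

Measured this way, a request's wall-clock time includes network latency, so single-request numbers will understate a provider's peak generation rate; it is still a useful rough comparison across inference services.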
“Meta has selected Cerebras to collaborate to deliver the ultra-fast inference that they need to serve developers through their new Llama API,” said Julie Shin Choi, chief marketing officer at Cerebras, during a press briefing. “We at Cerebras are really, really excited to announce our first CSP hyperscaler partnership to deliver ultra-fast inference to all developers.”