Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark
Published on: 2025-05-01 02:46:22
Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.
Turns out, it’s not very competitive.
The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.
"The release version of Llama 4 has been added to LMArena after it was found out they cheated, but you probably didn't see it because you have to scroll down to 32nd place which is where is ranks" — ρ:ɡeσn (@pigeon__s), April 11, 2025 (pic.twitter.com/A0Bxkdx4LX)
Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained.