Find Related products on Amazon

Shop on Amazon

Meta exec denies the company artificially boosted Llama 4’s benchmark scores

Published on: 2025-05-11 02:45:07

A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models’ weaknesses. The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it’s been trained. Training on a test set could misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is. Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models’ benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site from a user claiming to have resigned from Meta in protest over the company’s benchmarking practices. Reports that Maverick and Scout perform poorly on certain tasks fue ... Read full article.