GoKawiil - Meta exec denies the company artificially boosted Llama 4’s benchmark scores

A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models’ weaknesses. The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.”

In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it’s been trained. Training on a test set could misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.

Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models’ benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site from a user claiming to have resigned from Meta in protest over the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fue ...
