Move over math and reasoning, it's time to benchmark AI using Super Mario Bros.

Published on: 2025-11-09 12:52:00

Serving tech enthusiasts for over 25 years. TechSpot means tech analysis and advice you can trust.

The big picture: Benchmarking AI remains a thorny issue, with companies often accused of cherry-picking flattering results while burying less favorable ones. Instead of fixating on math and logic trials, perhaps it's time for a more unconventional test, one that challenges AI in a way humans instinctively understand: Super Mario Bros. After all, if an AI assistant can't strategically navigate past Goombas and Koopa Troopas, can we really trust it to operate in our complex world?

Researchers at the Hao AI Lab at UC San Diego put several leading language models to the test in Super Mario Bros., offering a fresh perspective on AI capabilities. The experiment used an emulated version of the classic Nintendo game, integrated with a custom framework called GamingAgent, developed by the Hao Lab. This system allowed AI models to control Mario by generating Python code. To guide their actions, t ...