Evaluating publicly available LLMs on IMO 2025
Introduction

Recent progress in the mathematical capabilities of LLMs has created a need for increasingly challenging benchmarks. With MathArena, we address this need by evaluating models on difficult and recent mathematical competitions, offering benchmarks that are both uncontaminated and interpretable. Among these competitions, the International Mathematical Olympiad (IMO) stands out as the best-known and most prestigious. As such, an evaluation of the IMO 2025, which took place just a few