Evaluating GPT-5's reasoning ability using the Only Connect game show
Given the proliferation of reasoning models, we wanted to go beyond knowledge-based benchmarks to test reasoning abilities such as pattern recognition, lateral thinking, abstraction, contextual reasoning (accounting for British cultural references), and multi-step inference. In addition to reasoning, we aimed to assess how effectively models make decisions when presented with judgment calls—such as choosing between making an educated guess based on available clues or calling a function to retri