Authors: Yichao Fu, Xuewei Wang, Yuandong Tian, Jiawei Zhao
Paper: https://arxiv.org/abs/2508.15260
Code: https://jiaweizzhao.github.io/deepconf
TL;DR
WHAT was done? The authors introduce Deep Think with Confidence (DeepConf), a test-time inference method that enhances the reasoning capabilities of Large Language Models (LLMs). Instead of treating all generated reasoning paths equally, DeepConf leverages the model's internal log-probabilities to derive localized confidence scores. It operates in two modes: an offline mode that filters completed reasoning traces and applies confidence-weighted majority voting, and a novel online mode that dynamically terminates the generation of low-confidence traces mid-stream. This is achieved without any additional model training or complex hyperparameter tuning.
WHY it matters? This work addresses the critical challenge of high computational cost and diminishing returns in popular test-time scaling methods like self-consistency. By intelligently filtering out low-quality reasoning and enabling early stopping, DeepConf achieves state-of-the-art accuracy (e.g., 99.9% on the AIME 2025 benchmark with GPT-OSS-120B) while dramatically reducing the number of generated tokens—by up to 84.7%. This makes high-performance LLM reasoning more efficient, scalable, and economically viable, paving the way for more practical deployment in real-world applications.
Details
The Challenge: Brute-Force Reasoning is Inefficient
Large Language Models (LLMs) have shown remarkable progress in complex reasoning tasks, largely thanks to test-time scaling techniques like self-consistency (https://arxiv.org/abs/2203.11171). The principle is simple: generate multiple reasoning paths ("thoughts") for a single problem and take the majority answer. While effective, this brute-force strategy has significant drawbacks: it is computationally expensive, yields diminishing returns as more paths are sampled, and treats every thought as equally valid. This challenge has spurred a wave of research into more efficient reasoning. While methods like Early-Stopping Self-Consistency (https://arxiv.org/abs/2401.10480) also terminate generation early, DeepConf's distinction lies in its use of localized, internal confidence signals rather than answer convergence alone.
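To make the baseline concrete, plain self-consistency reduces to a majority vote over the final answers of independently sampled reasoning paths. A minimal sketch (the function name and sample data are illustrative, not from the paper):

```python
from collections import Counter

def self_consistency_vote(answers):
    """Plain self-consistency: sample many reasoning paths, keep only
    each path's final answer, and return the most common one."""
    return Counter(answers).most_common(1)[0][0]

# Five sampled reasoning paths for the same problem; three agree on "42".
sampled_answers = ["42", "41", "42", "42", "17"]
print(self_consistency_vote(sampled_answers))  # -> 42
```

Note that every path here costs a full generation and carries equal weight, regardless of how shaky its reasoning was; this is exactly the inefficiency DeepConf targets.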
This paper introduces Deep Think with Confidence (DeepConf), a method that shifts the paradigm from "thinking more" to "thinking smarter." It equips LLMs with a form of introspection, allowing them to assess the quality of their own reasoning paths and discard the unpromising ones.
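DeepConf's offline mode can be sketched as a two-step pipeline: score each completed trace by a localized confidence derived from its token log-probabilities, drop the weakest traces, then take a confidence-weighted majority vote. The sketch below is a simplified stand-in: the windowed minimum-confidence score, the `keep_frac` parameter, and the `exp`-based weighting are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import defaultdict

def trace_confidence(token_logprobs, window=5):
    """Illustrative localized confidence: mean token log-probability over a
    sliding window; the trace is scored by its weakest window, so a single
    low-confidence stretch can sink an otherwise fluent trace."""
    if len(token_logprobs) <= window:
        return sum(token_logprobs) / len(token_logprobs)
    return min(
        sum(token_logprobs[i:i + window]) / window
        for i in range(len(token_logprobs) - window + 1)
    )

def deepconf_offline(traces, keep_frac=0.5):
    """Offline-mode sketch: traces is a list of (token_logprobs, answer)
    pairs. Drop the lowest-confidence traces, then take a
    confidence-weighted majority vote over the survivors."""
    scored = sorted(
        ((trace_confidence(lp), ans) for lp, ans in traces),
        reverse=True,
    )
    kept = scored[: max(1, int(len(scored) * keep_frac))]
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += math.exp(conf)  # weight by per-token probability
    return max(votes, key=votes.get)

# Three confident traces answer "42"; two shaky ones answer "17".
traces = [
    ([-0.10] * 10, "42"),
    ([-0.15] * 10, "42"),
    ([-0.20] * 10, "42"),
    ([-1.80] * 10, "17"),
    ([-2.00] * 10, "17"),
]
print(deepconf_offline(traces))  # -> 42
```

The online mode goes one step further: the same windowed score is monitored while a trace is still being generated, and generation is cut off as soon as the score falls below a threshold, which is where the large token savings come from.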