How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs)
Published on: 2025-07-12 17:59:54
Very small language models (SLMs) can outperform leading large language models (LLMs) in reasoning tasks, according to a new study by Shanghai AI Laboratory. The authors show that with the right tools and test-time scaling techniques, an SLM with 1 billion parameters can outperform a 405B LLM on complicated math benchmarks.
The ability to deploy SLMs in complex reasoning tasks can be very useful as enterprises look for new ways to use these models in different environments and applications.
Test-time scaling explained
Test-time scaling (TTS) is the process of giving LLMs extra compute cycles during inference to improve their performance on various tasks. Leading reasoning models, such as OpenAI o1 and DeepSeek-R1, use “internal TTS,” which means they are trained to “think” slowly by generating a long string of chain-of-thought (CoT) tokens.
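To make the idea concrete, here is a minimal toy sketch of one common form of test-time scaling: best-of-N sampling, where extra inference compute is spent drawing several candidate answers and a scorer picks the best one. The `generate_candidate` and `score` functions below are hypothetical stand-ins for a real language model and a reward model or verifier, not any system described in the study.

```python
import random

def generate_candidate(prompt: str, rng: random.Random) -> str:
    # Stand-in for sampling one completion from a language model.
    answers = ["42", "41", "43", "40"]
    return rng.choice(answers)

def score(prompt: str, answer: str) -> float:
    # Stand-in for a reward model / verifier that rates a candidate answer.
    return 1.0 if answer == "42" else 0.0

def best_of_n(prompt: str, n: int, seed: int = 0) -> str:
    # Best-of-N sampling: spend more inference compute (larger n) to draw
    # more candidates, then keep the one the scorer rates highest.
    rng = random.Random(seed)
    candidates = [generate_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("What is 6 * 7?", n=8))
```

The key point the toy model illustrates is the compute/accuracy trade-off: as `n` grows, the chance that at least one sampled candidate scores highly increases, so performance improves without any change to the model's weights.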