Researchers at the University of Illinois Urbana-Champaign and the University of Virginia have developed a new model architecture that could lead to more robust AI systems with stronger reasoning capabilities.
Called an energy-based transformer (EBT), the architecture shows a natural ability to use inference-time scaling to solve complex problems. For the enterprise, this could translate into cost-effective AI applications that can generalize to novel situations without the need for specialized fine-tuned models.
The challenge of System 2 thinking
In psychology, human thought is often divided into two modes: System 1, which is fast and intuitive, and System 2, which is slow, deliberate and analytical. Current large language models (LLMs) excel at System 1-style tasks, but the AI industry is increasingly focused on enabling System 2 thinking to tackle more complex reasoning challenges.
Reasoning models use various inference-time scaling techniques to improve their performance on difficult problems. One popular method is reinforcement learning (RL), used in models like DeepSeek-R1 and OpenAI’s “o-series” models, where the AI is rewarded for producing reasoning tokens until it reaches the correct answer. Another approach, often called best-of-n, involves generating multiple potential answers and using a verification mechanism to select the best one.
However, these methods have significant drawbacks. They are often limited to a narrow range of easily verifiable problems, like math and coding, and can degrade performance on other tasks such as creative writing. Furthermore, recent evidence suggests that RL-based approaches might not be teaching models new reasoning skills, but simply making them more likely to use successful reasoning patterns they already know. This limits their ability to solve problems that require true exploration beyond what they learned in training.
Energy-based models (EBMs)
The EBT architecture takes a different approach, based on a class of models known as energy-based models (EBMs). The core idea is simple: Instead of directly generating an answer, the model learns an “energy function” that acts as a verifier. This function takes an input (like a prompt) and a candidate prediction and assigns a value, or “energy,” to it. A low energy score indicates high compatibility, meaning the prediction is a good fit for the input, while a high energy score signifies a poor match.
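To make the idea concrete, here is a minimal sketch of an energy function in PyTorch. The name EnergyFunction and the toy MLP are illustrative assumptions, not the paper's actual transformer-based design, which scores token sequences rather than fixed-size vectors:

```python
import torch
import torch.nn as nn

# Toy energy function: maps an (input, candidate prediction) pair to a
# scalar "energy". Low energy = good fit; high energy = poor match.
# An illustrative stand-in, not the paper's transformer architecture.
class EnergyFunction(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # Concatenate the input representation with the candidate prediction
        # and score their compatibility as an unnormalized scalar.
        return self.net(torch.cat([context, candidate], dim=-1)).squeeze(-1)
```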
Applying this to AI reasoning, the researchers propose in their paper that developers should view “thinking as an optimization procedure with respect to a learned verifier, which evaluates the compatibility (unnormalized probability) between an input and candidate prediction.” The process begins with a random prediction, which is then progressively refined by minimizing its energy score and exploring the space of possible solutions until it converges on a highly compatible answer. This approach is built on the principle that verifying a solution is often much easier than generating one from scratch.
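The “thinking” loop can then be sketched as gradient descent on the candidate prediction itself, with the verifier's weights frozen. This is a simplified illustration under assumed names (think, steps, lr); plain descent stands in here for whatever refinement and exploration mechanics the actual EBT procedure uses:

```python
def think(energy_fn: EnergyFunction, context: torch.Tensor,
          steps: int = 32, lr: float = 0.1) -> torch.Tensor:
    # Freeze the verifier: optimization updates the candidate, not the model.
    for p in energy_fn.parameters():
        p.requires_grad_(False)

    # Begin with a random prediction, as described above.
    candidate = torch.randn_like(context, requires_grad=True)
    optimizer = torch.optim.SGD([candidate], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        energy = energy_fn(context, candidate).sum()  # low energy = good fit
        energy.backward()   # gradient flows into the candidate itself
        optimizer.step()    # nudge the prediction toward lower energy
    return candidate.detach()

# Illustrative usage: refine predictions for a batch of four inputs.
energy_fn = EnergyFunction()
context = torch.randn(4, 64)
answer = think(energy_fn, context)
```

Note how the number of refinement steps becomes a natural dial for inference-time compute: harder problems can simply be given more optimization steps before the model commits to an answer.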