Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: reasoning

OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’

Scientists from OpenAI, Google DeepMind, Anthropic and Meta have abandoned their fierce corporate rivalry to issue a joint warning about artificial intelligence safety. More than 40 researchers across these competing companies published a research paper today arguing that a brief window to monitor AI reasoning could close forever — and soon…

How to scale RL to 10^26 FLOPs

TLDR: Reinforcement learning (RL) is the next training technique for building frontier-level AI models. To make it better, we need to train on more data. The current approach of scaling many environments simultaneously is messy and complicated. Instead, I propose we find a way to do next-token prediction on the Web using RL. This way, we learn to reason from general web data, instead of just math and code. I’ve spent a good part of the past year in denial. I was in denial because when OpenAI…
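The post's core proposal, doing next-token prediction over web text with RL, can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical toy (a bigram softmax policy, a 0/1 exact-match reward, REINFORCE updates, a made-up corpus) meant only to show what "learning next-token prediction as an RL problem" could look like, not the author's actual method or scale.

```python
# Toy sketch: next-token prediction cast as an RL problem via REINFORCE.
# All names, the corpus, and the reward are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# "Policy": a table of logits over the next token, conditioned on the previous token.
logits = np.zeros((V, V))
lr = 0.5

def sample_next(prev: int):
    """Sample an action (next token) from the softmax policy for state `prev`."""
    p = np.exp(logits[prev] - logits[prev].max())
    p /= p.sum()
    return int(rng.choice(V, p=p)), p

for epoch in range(200):
    total_reward = 0.0
    for prev_w, next_w in zip(corpus, corpus[1:]):
        prev, target = idx[prev_w], idx[next_w]
        action, probs = sample_next(prev)
        reward = 1.0 if action == target else 0.0   # exact-match reward on web text
        total_reward += reward
        # REINFORCE ascent step: reward * (one_hot(action) - probs)
        grad = -probs * reward
        grad[action] += reward
        logits[prev] += lr * grad

print(f"avg reward in final epoch: {total_reward / (len(corpus) - 1):.2f}")
```

With a real model the policy would be the language model itself and the reward would come from matching (or scoring against) actual web continuations, but the update rule sketched here is the same shape.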

SmolLM3: Smol, multilingual, long-context reasoner LLM

SmolLM3: smol, multilingual, long-context reasoner. Published July 8, 2025. Base model: https://hf.co/HuggingFaceTB/SmolLM3-3B-Base; instruct and reasoning model: https://hf.co/HuggingFaceTB/SmolLM3-3B. Small language models are becoming increasingly important as users seek capable models that can be deployed efficiently. The community has produced a fascinating range of capable small models, each pushing the boundaries of what's possible at this scale. With SmolLM3, we're excited…
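For readers who want to try the instruct/reasoning checkpoint linked above, here is a minimal sketch using Hugging Face transformers. The prompt and generation settings are illustrative assumptions; consult the model card for the recommended chat template and reasoning-mode switches.

```python
# Minimal sketch: load the SmolLM3-3B instruct/reasoning model and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # instruct + reasoning model from the post
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly explain long-context attention."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```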

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths

This work investigates how large reasoning models internally track their thinking progress and how such processes can be monitored and controlled. We focus on reasoning models that explicitly segment their computations using <think> and </think> tokens (e.g., DeepSeek-R1), allowing us to study the internal dynamics of the "thinking phase." 1. Monitoring the Thinking Phase: We hypothesize that hidden states encode a token's relative position within the thinking phase. To test this, we collect hidden…
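A minimal sketch of the probing idea described in that excerpt: fit a linear probe that predicts a token's relative position inside the <think>…</think> span from its hidden state. The synthetic hidden states, the least-squares probe, and the R² check below are illustrative assumptions; the paper's actual models, layers, and evaluation will differ.

```python
# Sketch: linear probe for "how far along am I in the thinking phase?"
# Hidden states are synthetic stand-ins for real model activations.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_traces = 64, 200

X, y = [], []
for _ in range(n_traces):
    span_len = rng.integers(20, 120)              # length of one thinking phase
    direction = rng.normal(size=hidden_dim)       # pretend progress is encoded along one direction
    for t in range(span_len):
        rel_pos = t / (span_len - 1)              # probe target: relative position in [0, 1]
        h = rel_pos * direction + rng.normal(scale=0.5, size=hidden_dim)
        X.append(h)
        y.append(rel_pos)

X, y = np.array(X), np.array(y)
features = np.c_[X, np.ones(len(X))]              # add a bias column
w, *_ = np.linalg.lstsq(features, y, rcond=None)  # closed-form linear probe
pred = features @ w
print("probe R^2:", 1 - np.var(y - pred) / np.var(y))
```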

Meta hires key OpenAI researcher to work on AI reasoning models

Meta has hired a highly influential OpenAI researcher, Trapit Bansal, to work on its AI reasoning models under the company’s new AI superintelligence unit, a person familiar with the matter tells TechCrunch. OpenAI spokesperson Kayla Wood confirmed to TechCrunch that Bansal had departed OpenAI. Bansal’s LinkedIn page says that he left OpenAI in June. Bansal has worked at OpenAI since 2022 and was a key player in kickstarting the company’s work on reinforcement learning alongside co-founder Ilya Sutskever…

Learnings from building AI agents

How we made our AI code reviewer stop being so noisy. I’m Paul, cofounder of cubic, an "AI-native GitHub." One of our core features is an AI code review agent that performs an initial review pass, catching bugs, anti-patterns, duplicated code, and similar issues in pull requests. When we first released this agent back in April, the main feedback we got was straightforward: it was too noisy. Even small PRs often ended up flooded with multiple low-value comments, nitpicks, or outright false positives…

Google’s Gemini transparency cut leaves enterprise developers ‘debugging blind’

Google’s recent decision to hide the raw reasoning tokens of its flagship model, Gemini 2.5 Pro, has sparked a fierce backlash from developers who have been relying on that transparency to build and debug applications. The change, which echoes a similar move by OpenAI, replaces the model’s step-by-step reasoning with a simplified summary.

Why Some AI Models Spew 50 Times More Greenhouse Gas to Answer the Same Question

Like it or not, large language models have quickly become embedded into our lives. And due to their intense energy and water needs, they might also be causing us to spiral even faster into climate chaos. Some LLMs, though, might be releasing more planet-warming pollution than others, a new study finds. Queries made to some models generate up to 50 times more carbon emissions than others, according to a new study published in Frontiers in Communication. Unfortunately, and perhaps unsurprisingly…

What Apple's controversial research paper really tells us about LLMs

Generative AI models quickly proved they were capable of performing technical tasks well. Adding reasoning to the models unlocked unforeseen capabilities, enabling them to think through more complex questions and produce better-quality, more accurate responses -- or so we thought. Last week, Apple released a research report called "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity."

Do reasoning AI models really ‘think’ or not? Apple research sparks lively debate, response

Apple’s machine-learning group set off a rhetorical firestorm earlier this month with its release of “The Illusion of Thinking,” a 53-page research paper arguing that so-called large reasoning models (LRMs) or reasoning large language models (reasoning LLMs) such as OpenAI’s “o” series and Google’s Gemini 2.5 Pro and Flash Thinking don’t actually…

New paper pushes back on Apple’s LLM ‘reasoning collapse’ study

Apple’s recent AI research paper, “The Illusion of Thinking,” has been making waves for its blunt conclusion: even the most advanced Large Reasoning Models (LRMs) collapse on complex tasks. But not everyone agrees with that framing. Today, Alex Lawsen, a researcher at Open Philanthropy, published a detailed rebuttal arguing that many of Apple’s most headline-grabbing findings boil down to experimental design flaws, not fundamental reasoning limits. The paper also credits Anthropic’s Claude Opus…

AI flunks logic test: Multiple studies reveal illusion of reasoning

Bottom line: More and more AI companies say their models can reason. Two recent studies say otherwise. When asked to show their logic, most models flub the task – proving they're not reasoning so much as rehashing patterns. The result: confident answers, but not intelligent ones. Apple researchers have uncovered a key weakness in today's most hyped AI systems – they falter at solving puzzles that require step-by-step reasoning. In a new paper, the team tested several leading models on the Tower of Hanoi…

With the launch of o3-pro, let’s talk about what AI “reasoning” actually does

On Tuesday, OpenAI announced that o3-pro, a new version of its most capable simulated reasoning model, is now available to ChatGPT Pro and Team users, replacing o1-pro in the model picker. The company also reduced API pricing for o3-pro by 87 percent compared to o1-pro while cutting o3 prices by 80 percent. While "reasoning" is useful for some analytical tasks, new studies have posed fundamental questions about what the word actually means when applied to these AI systems. We'll take a deeper look…

New Apple study challenges whether AI models truly “reason” through problems

In early June, Apple researchers released a study suggesting that simulated reasoning (SR) models, such as OpenAI's o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking, produce outputs consistent with pattern-matching from training data when faced with novel problems requiring systematic thinking. The researchers found similar results to a recent study by the United States of America Mathematical Olympiad (USAMO) in April, showing that these same models achieved low scores on novel mathematical…

Together AI’s $305M bet: Reasoning models like DeepSeek-R1 are increasing, not decreasing, GPU demand

When DeepSeek-R1 first emerged, the prevailing fear that shook the industry was that advanced reasoning could be achieved with less infrastructure. As it turns out, that’s not necessarily the case. At least, according to Together AI, the rise of DeepSeek and open-source reasoning has had the exact opposite effect: Instead of reducing the need for infrastructure, it is…

How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs)

Very small language models (SLMs) can outperform leading large language models (LLMs) in reasoning tasks, according to a new study by Shanghai AI Laboratory. The authors show that with the right tools and test-time scaling techniques, an SLM with 1 billion parameters can outperform a 405B LLM on complicated math benchmarks. The ability to deploy SLMs in complex reasoning…
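One of the simplest test-time scaling techniques is majority voting (self-consistency): sample many answers from the small model and keep the most common one. The sketch below is a hedged illustration of that idea only; the study's actual recipe involves more elaborate tools, and the `noisy_model` stand-in is entirely hypothetical.

```python
# Sketch: test-time scaling via majority voting over N sampled answers.
from collections import Counter
from typing import Callable

def majority_vote(generate: Callable[[str], str], prompt: str, n_samples: int = 16) -> str:
    """Sample n_samples answers from the model and return the most frequent one."""
    answers = [generate(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    import random
    # Hypothetical stand-in model: answers 2 + 2 correctly only 60% of the time.
    def noisy_model(prompt: str) -> str:
        return "4" if random.random() < 0.6 else random.choice(["3", "5", "22"])

    print(majority_vote(noisy_model, "What is 2 + 2?"))  # usually prints "4"
```

The point of the illustration is that spending more compute at inference time (more samples, then aggregation) can recover accuracy a small model lacks in a single greedy pass.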