Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: reasoning

xAI debuts a faster and more cost-effective version of Grok 4

A few months after the release of Grok 4, and after an extremely problematic antisemitic meltdown of its chatbot, xAI is already trying to move on with its latest AI model. Elon Musk's xAI announced the release of Grok 4 Fast, a faster, more efficient reasoning model than its recent predecessor. According to xAI, Grok 4 Fast offers performance similar to Grok 4 while using 40 percent fewer thinking tokens on average. Along with faster results, xAI said Grok 4 Fast "results in a 98% reduction in…

We Finally Know How Much It Cost to Train China’s Astonishing DeepSeek Model

Remember when DeepSeek briefly shook up the entire artificial intelligence industry by launching its large language model, R1, that was trained for a fraction of the money that OpenAI and other big players were pouring into their models? Thanks to a new paper published by the DeepSeek AI team in the journal Nature, we finally know what it took to train DeepSeek's R1: $294,000 and 512 Nvidia H800 chips. The reason it was able to spend less, it seems, is the team's use of trial-and-error-b…

Luma AI's New Ray3 Video Generator Can 'Think' Before Creating

Reasoning models are not uncommon in the world of AI. Many companies have them, including OpenAI with o3 and Google with Gemini 2.5. But AI image and video company Luma AI just dropped its first AI reasoning video model, named Ray3, and it's available now. A reasoning model is a kind of AI model that uses more computing time to process requests and can go back and check its answers. Typically, reasoning models give you better responses, whether that's more detail or a lower rate of errors. For R…

I got the highest score on ARC-AGI again swapping Python for English

I think ARC-AGI is still the most important benchmark we have today. It's surprising that LLMs can win the math olympiad but struggle with simple puzzles that humans can solve easily. This highlights a core limitation of current LLMs: they struggle to reason about things they weren't trained on. They struggle to generalize. But they are getting better, fast. Last December, I got first place on ARC-AGI v1 with a score of 53.6%. A lot has changed since then. Thinking models had just come out and…

Experimenting with Local LLMs on macOS

So, this blog post will be about LLMs, and everyone has opinions about that. To be upfront about it, I'm a skeptic (bordering on hater), yet I like experimenting with stuff, so I download and run them locally on my Mac. And I'll teach you how to do it too, if you'd like! Some call them fancy autocomplete; some argue that they are sentient and should have rights. The truth is somewhere in between. Yes, they perform next-word prediction, but it's so complex that there's nontrivial emergent behavior…

AI's not 'reasoning' at all - how this team debunked the industry hype

ZDNET's key takeaways: we don't entirely know how AI works, so we ascribe magical powers to it; claims that gen AI can reason are a "brittle mirage"; we should always be specific about what AI is doing and avoid hyperbole. Ever since artificial intelligence programs began impressing the general public, AI scholars have been making claims for the technology's deeper significance, even asserting the prospect…

GLM 4.5 with Claude Code

GLM Coding Plan: designed for Claude Code users, starting at $3/month for a premium coding experience. GLM-4.5 and GLM-4.5-Air are our latest flagship models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters. Both models sh…

Show HN: Entropy-Guided Loop – How to make small models reason

Logprobs reasoning loop with Weights & Biases Weave (an observability tool) and uncertainty-aware generation with OpenAI's Responses API. This project demonstrates a novel approach to improving AI model reasoning by leveraging token-level uncertainty metrics (logprobs) to create self-correcting generation loops. We compare this uncertainty-aware approach against traditional reasoning models to test whether explicit uncertainty handling can match or exceed the performance of dedicated reasoning archi…
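The excerpt doesn't show the repo's actual loop, but the core mechanism (regenerate when token-level logprobs signal high uncertainty) can be sketched in a few lines. The entropy threshold and the mock logprob distributions below are illustrative assumptions, not values from the project:

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of one next-token distribution,
    given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

def needs_retry(per_token_logprobs, threshold=1.0):
    """Flag a generation for another pass if any token was drawn from a
    high-entropy (i.e., uncertain) distribution."""
    return any(token_entropy(lps) > threshold for lps in per_token_logprobs)

# Mock data: one token sampled confidently vs. one sampled near-uniformly.
confident = [[math.log(0.97), math.log(0.02), math.log(0.01)]]
uncertain = [[math.log(1 / 3)] * 3]  # uniform over 3 candidates: entropy = ln 3 ≈ 1.10

print(needs_retry(confident))  # False: low entropy, keep the generation
print(needs_retry(uncertain))  # True: high entropy, regenerate
```

In the full loop, flagged spans would be fed back to the model for revision; only the decision rule is shown here.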

OpenAI to route sensitive conversations to GPT-5, introduce parental controls

OpenAI said Tuesday it plans to route sensitive conversations to reasoning models like GPT-5 and roll out parental controls within the next month — part of an ongoing response to recent safety incidents involving ChatGPT failing to detect mental distress. The new guardrails come in the aftermath of the suicide of teenager Adam Raine, who discussed self-harm and plans to end his life with ChatGPT, which even supplied him with information about specific suicide methods. Raine's parents have filed…

Vibe coding as a coding veteran: from 8-bit assembly to English-as-code

Note 1: On Tower of Hanoi Solutions and their Complexity. I chose the Tower of Hanoi puzzle (Lucas, 1883) because of its almost mythical status in computer science and discrete mathematics communities. It's a staple in AI education and typically the first encounter with elegant doubly recursive algorithms for CS undergraduates. And I chose the search algorithms mentioned in Section 1 because they constitute the core of the "state space search" paradigm in most AI textbooks (e.g., Chapters 3 and…
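For readers who haven't seen it, the elegant doubly recursive solution mentioned above fits in a few lines; this is the standard algorithm, whose optimal solution for n disks takes 2^n − 1 moves:

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Doubly recursive Tower of Hanoi (Lucas, 1883): move n disks
    from source to target, using spare as the intermediate peg."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, source, spare, target, moves)  # clear n-1 disks off the top
        moves.append((source, target))              # move the largest disk
        hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on
    return moves

print(len(hanoi(3)))   # 7 moves, i.e. 2**3 - 1
print(len(hanoi(10)))  # 1023 moves
```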

Contrastive Representations for Temporal Reasoning

In classical AI, perception relies on learning spatial representations, while planning—temporal reasoning over action sequences—is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both spatial and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure, due to reliance on spurious features. To address this, we introduce Contrastive Representations for Temporal Reasoning…
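As a rough illustration of the "standard temporal contrastive learning" baseline the abstract critiques, here is a minimal InfoNCE objective over toy state representations. This is a pure-Python sketch with made-up vectors; the paper's actual objective and architecture differ:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE: score the temporally adjacent state (positive)
    against unrelated states (negatives); lower loss = better alignment."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)                                   # for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]                          # -log softmax(positive)

anchor = [1.0, 0.0]                  # representation of state s_t
aligned = [0.9, 0.1]                 # s_{t+1}: temporally close, similar
spurious = [0.0, 1.0]                # unrelated state
negatives = [spurious, [-1.0, 0.5]]

# A representation that ranks the true successor above unrelated states
# gets a near-zero loss; one fooled by a spurious positive does not.
print(info_nce(anchor, aligned, negatives) < info_nce(anchor, spurious, [aligned, [-1.0, 0.5]]))  # True
```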

Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves

A new training framework developed by researchers at Tencent AI Lab and Washington University in St. Louis enables large language models (LLMs) to improve themselves without requiring any human-labeled data. The technique, called R-Zero, uses reinforcement learning to generate its own training data from scratch, addressing one of the main b…

Deep Think with Confidence

Authors: Yichao Fu, Xuewei Wang, Yuandong Tian, Jiawei Zhao. Paper: https://arxiv.org/abs/2508.15260. Code: https://jiaweizzhao.github.io/deepconf. TL;DR: The authors introduce Deep Think with Confidence (DeepConf), a test-time inference method that enhances the reasoning capabilities of large language models (LLMs). Instead of treating all generated reasoning paths equally, DeepConf leverages the model's internal log-probabilities to derive localized confidence scores. It operate…
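A heavily simplified sketch of the idea: weight each reasoning path's vote by a confidence score derived from its log-probabilities. For brevity this uses a single global mean per path rather than the paper's localized, windowed scores, and the answers and logprobs below are made up:

```python
import math
from collections import defaultdict

def path_confidence(token_logprobs):
    """Mean token log-probability as a confidence score for one reasoning
    path (simplified: DeepConf uses localized, windowed scores)."""
    return sum(token_logprobs) / len(token_logprobs)

def confidence_weighted_vote(paths):
    """paths: list of (answer, token_logprobs). Each path votes for its
    answer with weight exp(confidence); the heaviest answer wins."""
    votes = defaultdict(float)
    for answer, lps in paths:
        votes[answer] += math.exp(path_confidence(lps))
    return max(votes, key=votes.get)

paths = [
    ("42", [-0.1, -0.2, -0.1]),   # high-confidence path
    ("41", [-2.0, -3.0, -2.5]),   # low-confidence path, outvoted
    ("42", [-0.3, -0.2, -0.4]),   # another confident path agreeing on 42
]
print(confidence_weighted_vote(paths))  # 42
```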

Don’t sleep on Cohere: Command A Reasoning, its first reasoning model, is built for enterprise customer service and more

I was in more meetings than usual today, so I just caught up to the fact that Cohere, the Canadian startup co-founded by former Transformer paper author Aidan Gomez and geared toward making generative AI products that work easily, powerfully, and securely for enterprises, has released its first reasoning large language model (LLM), Command A Reasoning.

LLMs generate ‘fluent nonsense’ when reasoning outside their training zone

A new study from Arizona State University researchers suggests that the celebrated "Chain-of-Thought" (CoT) reasoning in Large Language Models (LLMs) may be more of a "brittle mirage" than genuine intelligence. The research builds on a growing body of work questioning the depth of LLM reasoning, but it takes a unique "data distribution" lens…

Nvidia releases a new small, open model Nemotron-Nano-9B-v2 with toggle on/off reasoning

Small models are having a moment. On the heels of the release of a new AI vision model small enough to fit on a smartwatch from MIT spinoff Liquid AI, and a model small enough to run on a smartphone from Google, Nvidia is joining the party today with a new small language model (SLM) of its own, Nemotron-Nano-9B-V2, which attained the highest…

Is chain-of-thought AI reasoning a mirage?

Reading research papers and articles about chain-of-thought reasoning makes me frustrated. There are many interesting questions to ask about chain-of-thought: how accurately it reflects the actual process going on, why training it "from scratch" often produces chains that switch fluidly between multiple languages, and so on. However, people keep asking the least interesting question possible: whether chain-of-thought reasoning is "really" reasoning. Apple took up this question in their Illusion of Thinking paper…

Evaluating GPT5's reasoning ability using the Only Connect game show

Given the proliferation of reasoning models, we wanted to go beyond knowledge-based benchmarks to test reasoning abilities such as pattern recognition, lateral thinking, abstraction, contextual reasoning (accounting for British cultural references), and multi-step inference. In addition to reasoning, we aimed to assess how effectively models make decisions when presented with judgment calls—such as choosing between making an educated guess based on available clues or calling a function to retrieve…

GPT-5 Under Fire: Red Teaming OpenAI's Model Reveals Surprising Weaknesses

Why did we test GPT-5? It is making waves as OpenAI's most advanced general-purpose model: faster, smarter, and more integrated across modalities. Its auto-routing architecture seamlessly switches between a quick-response model and a deeper reasoning model without requiring a separate "reasoning model" toggle; GPT‑5 itself decides whether to "think hard." OpenAI also emphasizes GPT‑5's enhanced internal self-validation. It's supposed to assess multiple reasoning paths internally and "double-check"…

ChatGPT's GPT-5 models released: everything you need to know

After a long wait, GPT-5 is finally rolling out. It's available to Free, Plus, Pro, and Team users today. This means everyone gets to try GPT-5 today, but paid users get higher limits. In a blog post, OpenAI says GPT-5 is a big leap compared to previous models. OpenAI added that GPT-5 is its best coding model, and early benchmarks suggest it beats Anthropic's Claude Opus 4.1 by a small margin, though real-world results are still awaited. Unlike previous models, GPT-5 has built-in reasoning. It is a unified…

OpenAI launches GPT-5, nano, mini and Pro — not AGI, but capable of generating ‘software-on-demand’

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs), all different-sized variants of GPT-5, the long-awaited successor to its GPT-4 model from March 2023, nearly 2.5 years ago. The company is rolling out four distinct versions of the model: GPT-5, GPT-5 Mini, GPT-5 Nano, and GPT-5 Pro.

Microsoft accidentally confirms GPT-5, GPT-5-Mini, GPT-5-Nano ahead of launch

OpenAI is hosting a live stream at 10AM PT to announce GPT-5, but Microsoft has already confirmed the details. In a GitHub document, which has now been taken offline, Microsoft confirmed GPT-5 is launching later today. While it was obvious, this is the first official confirmation. Microsoft also offered more details on GPT-5 models, including the base model, which is called just GPT-5. It is designed for logic and multi-step tasks. We also have GPT-5-mini, which is a lightweight version for c…

For regulated industries, AWS’s neurosymbolic AI promises safe, explainable agent automation

AWS is betting that by bringing its Automated Reasoning Checks feature on Bedrock to general availability, it will give more enterprises and regulated industries the confidence to use and deploy more AI applications and agents. It is also hoping that introducing methods like automated reasoning, which utilizes math-based validation…

Inside OpenAI’s quest to make AI do anything for you

Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI's models to solve high school math competitions. Today that team, known as MathGen, is considered instrumental to OpenAI's industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would. "We were trying to…

Deep Cogito goes big, releasing 4 new open source hybrid reasoning models with self-improving ‘intuition’

Deep Cogito, a lesser-known AI research startup based in San Francisco and founded by ex-Googlers, today released four new open-ish large language models (LLMs) that attempt something few others do: learn how to reason more effectively over time, and get better at it on their own. The models, released as part of Cogito's v2 family, range from…

Chinese startup Z.ai launches powerful open source GLM-4.5 model family with PowerPoint creation

Another week in the summer of 2025 has begun and, continuing last week's trend, it brings more powerful Chinese open source AI models. Little-known (at least to us here in the West) Chinese startup Z.ai has introduced two new open source LLMs — GLM-4.5 and GLM-4.5-Air — casting them as go-to solutions for AI reasoning…

GLM-4.5: Reasoning, Coding, and Agentic Abilities

Today, we introduce two new GLM family members: GLM-4.5 and GLM-4.5-Air — our latest flagship models. GLM-4.5 is built with 355 billion total parameters and 32 billion active parameters, and GLM-4.5-Air with 106 billion total parameters and 12 billion active parameters. Both are designed to unify reasoning, coding, and agentic capabilities in a single model, to meet the increasingly complex requirements of fast-rising agentic applications. Both GLM-4.5 and GLM-4.5-Air are hybrid re…
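Back-of-envelope arithmetic on those figures shows how small a fraction of each MoE model actually runs per token (illustrative calculation only):

```python
# Active-vs-total parameter split for the two MoE models described above.
models = {
    "GLM-4.5":     {"total_b": 355, "active_b": 32},
    "GLM-4.5-Air": {"total_b": 106, "active_b": 12},
}
for name, p in models.items():
    share = p["active_b"] / p["total_b"]
    # Roughly 9% (GLM-4.5) and 11% (GLM-4.5-Air) of weights per forward pass.
    print(f"{name}: {p['active_b']}B of {p['total_b']}B parameters active ({share:.0%})")
```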

OpenAI prepares GPT-5 for roll out

OpenAI's GPT-5 could drop in the coming days, and it could be one of the best models yet from the Microsoft-backed startup. As The Verge's Tom Warren first reported, GPT-5 is being prepared for an August release. GPT-5 is believed to be a "unified" model, meaning it combines the breakthroughs from the reasoning and multi-modal models, such as o3 and 4o respectively. ChatGPT currently has too many capable models for different tasks. While the models are powerful, it can be confusing because…

How logic can help AI models tell more truth, according to AWS

AWS distinguished scientist Byron Cook makes the case for "automated reasoning." The term "reasoning" is a familiar metaphor in today's artificial intelligence (AI) technology, often used to describe the verbose outputs generated by so-called reasoning AI models such as OpenAI's o1 or DeepSeek AI's R1. Another kind of reasoning is quietly taking root in the most advanced applications, perhaps closer to actual reasoning…
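The distinction is that automated reasoning proves a property for every case rather than sampling model outputs. A toy brute-force illustration of that idea follows; the policy and variable names are hypothetical, and production systems such as Bedrock's rely on solver-based methods, not enumeration:

```python
from itertools import product

def always_holds(rule, variables):
    """Brute-force check of a propositional claim over every truth
    assignment: the 'prove it for all cases' core of automated reasoning
    (real systems use SMT-style solvers instead of enumeration)."""
    return all(rule(dict(zip(variables, vals)))
               for vals in product([True, False], repeat=len(variables)))

# Hypothetical policy: approval requires both review and authorization.
def safe_policy(v):
    approved = v["reviewed"] and v["authorized"]
    return (not approved) or v["reviewed"]      # claim: approved implies reviewed

# Buggy policy: approval needs only authorization, so the claim can fail.
def buggy_policy(v):
    approved = v["authorized"]
    return (not approved) or v["reviewed"]

print(always_holds(safe_policy, ["reviewed", "authorized"]))   # True
print(always_holds(buggy_policy, ["reviewed", "authorized"]))  # False
```

Unlike testing a few sampled outputs, the exhaustive check either proves the claim or surfaces a concrete counterexample assignment.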