Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more


Stop benchmarking in the lab: Inclusion Arena shows how LLMs perform in production

Benchmarking models has become essential for enterprises, allowing them to choose the kind of performance that matches their needs. But not all benchmarks are built the same, and many evaluate models against static datasets or fixed testing environments. Researchers from Inclusion AI, which is affiliated with Alibaba's Ant Group…

In Xcode 26, Apple shows first signs of offering ChatGPT alternatives

The latest Xcode beta contains clear signs that Apple plans to bring Anthropic's Claude large language models, including Opus, into the integrated development environment (IDE), expanding on features already available using Apple's own models or OpenAI's ChatGPT. Apple enthusiast publication 9to5Mac "found multiple references to built-in support for Anthropic accounts," including in the "Intelligence" menu, where users can currently log into ChatGPT or enter an API key for higher message limits. Apple…

5 reasons why GPT-5 is actually better than some of the older GPT models

Recently, OpenAI has come under fire for GPT-5's rocky launch. Many users have called it a step backward, citing a lack of personality and other tweaks that turned people off — sentiments echoed in our own GPT-5 review. Still, GPT-5 does improve on at least some of the older legacy models. Before we dive in, it's important to note that GPT-5 really does have less personality. It's curt and to the point in nearly every interaction. This makes it much less useful…

Launch HN: Parachute (YC S25) – Guardrails for Clinical AI

Hi HN, Aria and Tony here, co-founders of Parachute ( https://www.parachute-ai.com/ ). We're building governance infrastructure that lets hospitals safely evaluate and monitor clinical AI at scale. Hospitals are racing to adopt AI: more than 2,000 clinical AI tools hit the U.S. market last year, from ambient scribes to imaging models. But new regulations (HTI-1, Colorado AI Act, California SB 3030, White House AI Action Plan) require auditable proof that these models are safe, fair, and…

California’s Next ‘Big One’ Might Not Follow the Script

On March 28, a devastating magnitude 7.7 earthquake rocked Myanmar, rupturing the Sagaing Fault at speeds of over 3 miles (4.8 kilometers) per second. Which other fault resembles the Sagaing? The San Andreas Fault in California, where seismologists have been expecting "the big one" for years. In a study published August 11 in the journal PNAS, a team of researchers used satellite images of the Sagaing Fault's movement to improve computer models that predict how similar faults…

Launch HN: Uplift (YC S25) – Voice models for under-served languages

Hi HN, we are Zaid, Muhammad, and Hammad, the co-founders of Uplift AI ( https://upliftai.org ). We build models that speak underserved languages — today: Urdu, Sindhi, and Balochi. A billion people worldwide can't read. In countries like Pakistan, the 5th most populous country, 42% of adults are illiterate. This holds back the entire economy: patients can't read medical reports, parents can't help with homework, banks can't go fully digital, farmers can't research best practices, and people…

Hugging Face: 5 ways enterprises can slash AI costs without sacrificing performance

Enterprises seem to accept it as a basic fact: AI models require a significant amount of compute, so they simply have to find ways to obtain more of it. But it doesn't have to be that way, according to Sasha Luccioni, AI and climate lead at Hugging Face. What if there's a smarter way to use AI? What if, instead of striving for more (often unnecessary…

Nvidia releases a new small, open model Nemotron-Nano-9B-v2 with toggle on/off reasoning

Small models are having a moment. On the heels of the release of a new AI vision model small enough to fit on a smartwatch from MIT spinoff Liquid AI, and a model small enough to run on a smartphone from Google, Nvidia is joining the party today with a new small language model (SLM) of its own, Nemotron-Nano-9B-v2, which attained the highest…

The lottery ticket hypothesis: why neural networks work

How AI researchers accidentally discovered that everything they thought about learning was wrong. 18 Aug 2025. The lottery ticket hypothesis explains why massive neural networks succeed despite decades of theory predicting they should fail. Five years ago, suggesting that AI researchers train neural networks with trillions of parameters would have earned you pitying looks. It violated the most fundamental rule in machine learning: make your model too large, and it becomes a glorified photocopier…

Anthropic's Claude AI now has the ability to end 'distressing' conversations

Anthropic's latest feature for two of its Claude AI models could be the beginning of the end for the AI jailbreaking community. The company announced in a post on its website that the Claude Opus 4 and 4.1 models now have the power to end a conversation with users. According to Anthropic, this feature will only be used in "rare, extreme cases of persistently harmful or abusive user interactions." To clarify, Anthropic said those two Claude models could exit harmful conversations…

Wan – Open-source alternative to VEO 3

Wan: Open and Advanced Large-Scale Video Generative Models (GitHub | Hugging Face | ModelScope | Paper | Blog | Discord | User Guide in Chinese and English). We are excited to introduce Wan2.2, a major upgrade to our foundational video models. With Wan2.2, we have focused on incorporating the following innovations: Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By separating the denoising…
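Wan2.2's MoE design reportedly splits the denoising process between experts specialized for different noise levels, rather than routing per token as in language-model MoEs. As a rough illustration only (this is not Wan's actual code; the threshold and the expert functions are hypothetical placeholders), timestep-based expert routing can be sketched as:

```python
# Illustrative sketch of timestep-based expert routing in a diffusion MoE,
# loosely inspired by Wan2.2's high-noise / low-noise split.
# The threshold and both "experts" are hypothetical stand-ins.

HIGH_NOISE_THRESHOLD = 0.5  # fraction of the schedule treated as "high noise"

def high_noise_expert(latent):
    # Placeholder for a model specialized in early, high-noise steps
    # (overall layout and motion).
    return [x * 0.9 for x in latent]

def low_noise_expert(latent):
    # Placeholder for a model specialized in late, low-noise steps
    # (fine-detail refinement).
    return [x * 0.99 for x in latent]

def denoise_step(latent, t, num_steps):
    """Route one denoising step to the expert for this noise level."""
    progress = t / num_steps  # 1.0 = start (pure noise), near 0.0 = clean
    expert = high_noise_expert if progress > HIGH_NOISE_THRESHOLD else low_noise_expert
    return expert(latent)

def sample(latent, num_steps=10):
    # Iterate from high noise (t = num_steps) down to clean (t = 1);
    # only one expert is active per step, so inference cost stays flat.
    for t in range(num_steps, 0, -1):
        latent = denoise_step(latent, t, num_steps)
    return latent
```

The design choice this sketch captures is that total parameter count grows (two experts) while per-step compute does not, since exactly one expert runs at each timestep.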

IQ Test Results for AI

Does this site show that AIs are biased? As of 2023, every major AI is economically left-wing and relatively socially libertarian. Some AIs are much more so than others, however: Claude tends to be one of the most moderate models, while Google's Bard is one of the most extreme-left. An AI's political bias is shaped by two main things: …


AI apps are like music

This is a mental debate I have been having for the last two months. It is about pricing in AI. I have one actionable recommendation: kill that damn model picker. I have been building a specific AI app. Exciting stuff. The product is obvious. I even have a plan for distribution from day 1. Or day 0. Everything clicks. Except one thing: pricing. It's tormenting me. The Cursor Problem (or is it?): Everyone describes AI apps the same way: "Cursor for X." Fair enough. Cursor nailed something important…

Model intelligence is no longer the constraint for automation

The perception is that model improvement is stagnating. GPT-5 wasn't the step change that people were expecting. Yet models continue to improve on reasoning benchmarks: recently, both OpenAI and Google models performed on par with gold medallists at the 2025 International Mathematical Olympiad (IMO). At the same time, it's still difficult to make AI agents work for relatively simple enterprise use cases. Why is there such a disparity in model performance between problem domains? Why are models…

Brands might be cooling on satellite features, and that’s bad news for cheaper Androids

TL;DR: Chinese brands are scaling back satellite communication, keeping it for top-end models only. A reliable tipster says past high-end sat-com flagships sold poorly and were dropped. This could make global brands less likely to bring the feature to affordable phones. Satellite communication has been one of the most talked-about phone features of the past couple of years, and it's no longer just for emergencies. Just this week, our APK teardown showed…

Developers Say GPT-5 Is a Mixed Bag

When OpenAI launched GPT-5 last week, it told software engineers the model was designed to be a "true coding collaborator" that excels at generating high-quality code and performing agentic, or automated, software tasks. While the company didn't say so explicitly, OpenAI appeared to be taking direct aim at Anthropic's Claude Code, which has quickly become many developers' favored tool for AI-assisted coding. But developers tell WIRED that GPT-5 has been a mixed bag so far. It shines at technical…

Open-Sourced AI Models May Be More Costly in the Long Run, Study Finds

As more businesses adopt AI, picking which model to use is a major decision. While open-sourced models may seem cheaper initially, a new study warns that those savings can evaporate fast due to the extra computing power they require. In fact, open-source AI models burn through significantly more computing resources than their closed-source rivals when performing the same tasks, according to a study published Thursday by Nous Research. The researchers tested dozens of AI models, including…

That ‘cheap’ open-source AI model is actually burning through your compute budget

A comprehensive new study has revealed that open-source artificial intelligence models consume significantly more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies. The research, conducted by…
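The study's core point is that per-token price alone can mislead: if an open model needs more tokens to finish the same task, its effective cost per task rises. A back-of-the-envelope comparison makes this concrete (all prices and token counts below are hypothetical illustrations, not the study's figures):

```python
def effective_cost(tokens_per_task, price_per_million_tokens):
    """Cost of completing one task = tokens consumed x price per token."""
    return tokens_per_task * price_per_million_tokens / 1_000_000

# Hypothetical numbers for illustration only: an open model priced 4x
# cheaper per token, but needing 4x the tokens for the same task.
open_model = effective_cost(tokens_per_task=12_000, price_per_million_tokens=0.50)
closed_model = effective_cost(tokens_per_task=3_000, price_per_million_tokens=2.00)

print(f"open:   ${open_model:.4f} per task")
print(f"closed: ${closed_model:.4f} per task")
```

In this sketch the two models cost the same per task despite the 4x price gap, which is why comparisons should be made on cost per completed task rather than cost per token.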

The new science of “emergent misalignment”

If there's an upside to this fragility, it's that the new work exposes what happens when you steer a model toward the unexpected, Hooker said. Large AI models have, in a way, shown their hand as never before. The models categorized the insecure code alongside other parts of their training data related to harm, or evil: things like Nazis, misogyny, and murder. At some level, AI does seem to separate good things from bad; it just doesn't seem to have a preference. Wish for the Worst: In 2022…


All Souls exam questions and the limits of machine reasoning

Oxford University is immersed in the past like no other place I've seen. One example: when I was a visiting student at Oxford in 2005, I remember meeting two students at a pub one evening. They were drinking ivy-laced beer. The reason, I was told, is that centuries ago a student from Lincoln College had murdered a student of Brasenose. Ever since, Brasenose students have been allowed into Lincoln and given free beer once a year. Here's the event back in 1938. The actual truth behind "ivy…

Why LLMs can't really build software

One of the things I have spent a lot of time doing is interviewing software engineers. This is obviously a hard task, and I don't claim to have a magic solution, but it has given me some time to reflect on what effective software engineers actually do. When you watch someone who knows what they are doing, you'll see them looping over the following steps: (1) build a mental model of the requirements; (2) write code that (hopefully?!) does that; (3) build a mental model of what the code actually does; (4) identify…

Is chain-of-thought AI reasoning a mirage?

Reading research papers and articles about chain-of-thought reasoning makes me frustrated. There are many interesting questions to ask about chain-of-thought: how accurately it reflects the actual process going on, why training it "from scratch" often produces chains that switch fluidly between multiple languages, and so on. However, people keep asking the least interesting question possible: whether chain-of-thought reasoning is "really" reasoning. Apple took up this question in their "Illusion of Thinking" paper…

Buzzy AI startup Multiverse creates two of the smallest high-performing models ever

One of Europe's most prominent AI startups has released two AI models so tiny that it has named them after a chicken's brain and a fly's brain. Multiverse Computing claims these are the world's smallest models that are still high-performing, handling chat, speech, and, in one case, even reasoning. These new tiny models are intended to be embedded in Internet of Things devices, as well as to run locally on smartphones, tablets, and PCs. "We can compress the model so much that they can…

Mbodi AI (YC X25) Is Hiring a Founding Research Engineer (Robotics)

Description: Join Mbodi AI (YC X25), an AI robotics startup founded by two former Googlers committed to pushing the boundaries of intelligent robots. Mbodi is an embodied-AI platform that lets robots learn like humans, through natural language, so that anyone can teach a robot new skills by talking to it and have those skills executed reliably in production, in minutes. We are pioneering the next wave of robotics, where advanced generative models meet real-world applications. Backed by top investors…

Why You Can’t Trust a Chatbot to Talk About Itself

When something goes wrong with an AI assistant, our instinct is to ask it directly: "What happened?" or "Why did you do that?" It's a natural impulse—after all, if a human makes a mistake, we ask them to explain. But with AI models, this approach rarely works, and the urge to ask reveals a fundamental misunderstanding of what these systems are and how they operate. A recent incident with Replit's AI coding assistant perfectly illustrates this problem. When the AI tool deleted a production database…


Hooray! ChatGPT Plus brings back legacy models alongside an updated GPT-5 experience

GPT-5 has faced a wave of criticism recently, both from everyday users and from reviewers like our very own Calvin Wankhede here at Android Authority. Much of this feedback centered on the new model feeling more curt and having less personality. OpenAI responded quickly, addressing performance, personality, and usage-limit issues — improving the overall experience significantly. Now a fresh update makes things even better, at least for ChatGPT Plus subscribers. OpenAI has greatly expanded GPT-5's…

Scientists Are Getting Seriously Worried That We've Already Hit Peak AI

The long-awaited release of OpenAI's GPT-5 has gone over with a wet thud. Though the private sector continues to pour billions into artificial intelligence development, hoping for exponential gains, the research community isn't convinced. Speaking to The New Yorker, Gary Marcus, a cognitive scientist and longtime critic of OpenAI, said what many have come to suspect: despite years of development at a staggering cost, AI doesn't seem to be getting much better. Though GPT-5 technically…

Ai2’s MolmoAct model ‘thinks in 3D’ to challenge Nvidia and Google in robotics AI

Physical AI, where robotics and foundation models come together, is fast becoming a growing space, with companies like Nvidia, Google, and Meta releasing research and experimenting with melding large language models (LLMs) with robots. New research from the Allen Institute for AI (Ai2) aims to challenge Nvidia and Google in physical AI with the…

OpenAI brings GPT-4o back as a default for all paying ChatGPT users, Altman promises ‘plenty of notice’ if it leaves again

OpenAI is once again making GPT-4o — the large language model (LLM) that powered ChatGPT before last week's launch of GPT-5 — a default option for all paying users, that is, those who subscribe to the ChatGPT Plus ($20 per month), Pro ($200 per month), Team ($30 per month), Enterprise, or Edu tiers. Users no longer need to toggle on a…