Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: agent

Genie 3: A new frontier for world models

Given a text prompt, Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p. Towards world simulation: At Google DeepMind, we have been pioneering research in simulated environments for over a decade, from training agents to master real-time strategy games to developing simulated environments for open-ended learning and robotics. This work motivated our development of world models, which are

DeepMind reveals Genie 3, a world model that could be the key to reaching AGI

Google DeepMind has revealed Genie 3, its latest foundation world model that the AI lab says presents a crucial stepping stone on the path to artificial general intelligence, or human-like intelligence. “Genie 3 is the first real-time interactive general purpose world model,” Shlomi Fruchter, a research director at DeepMind, said during a press briefing. “It goes beyond narrow world models that existed before. It’s not specific to any particular environment. It can generate both photo-realistic

In This Look Inside the New ‘Bad Batch’ Novel, the Emperor’s Name Counts for a Lot

From Rebels to Andor, we’ve met different types of people that make up the Empire’s sinister intelligence forces in the Imperial Security Bureau. We’ve seen agents like Kallus realize the extent of their role in the Empire’s evil, and agents like Dedra Meero consumed by the system they created. Now, in the latest Star Wars novel, we’re going to meet an agent learning a very difficult lesson: the long arm of Imperial law doesn’t apply to some people, whether they like it or not. That’s the troub

Inside OpenAI’s quest to make AI do anything for you

Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions. Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would. “We were trying t

People still use our old-fashioned Unix login servers

You're using a tool with a too-generic User-Agent You're probably reading this page because you've attempted to access some part of my blog (Wandering Thoughts) or CSpace, the wiki thing it's part of. Unfortunately whatever you're using to do so has a HTTP User-Agent header value that is too generic or otherwise excessively suspicious. Unfortunately, as of early 2025 there's a plague of high volume crawlers (apparently in part to gather data for LLM training) that behave like this. To reduce th

Cerebras Code

We are launching two new plans designed to make AI coding faster and more accessible: Cerebras Code Pro ($50/month) and Code Max ($200/month). Both plans give you access to Qwen3-Coder, the world’s leading open-weight coding model—running at speeds of up to 2,000 tokens per second, with a 131k-token context window, no proprietary IDE lock-in, and no weekly limits! Cerebras Makes Code Generation Instant Even with the best frontier models, you still end up waiting around for completions. And as

Build an AI telephony agent for inbound and outbound calls

AI Telephony Agent: make inbound and outbound calls with AI agents using VideoSDK. Supports multiple SIP providers and AI agents with a clean, extensible architecture for VoIP telephony solutions. Installation prerequisites: Python 3.11+, a VideoSDK account, a Twilio account (SIP trunking provider), and a Google API key (for Gemini AI). Setup: clone the repository (git clone https://github.com/yourusername/ai-agent-telephony.git, then cd ai-agent-telephony) and install dependencies (pip install -r requirements.txt).

Topics: add agent agents ai sip

Deep Agents

Using an LLM to call tools in a loop is the simplest form of an agent. This architecture, however, can yield agents that are “shallow” and fail to plan and act over longer, more complex tasks. Applications like “Deep Research”, “Manus”, and “Claude Code” have gotten around this limitation by implementing a combination of four things: a planning tool, sub agents, access to a file system, and a detailed prompt. Acknowledgements: this exploration was primarily inspired by Claude Code and reports o
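The "LLM calling tools in a loop" pattern described in this snippet can be sketched in a few lines. This is only an illustration under stated assumptions: `scripted_model` is a hypothetical stand-in for a real LLM call, and the `TOOLS` registry, the `tool:` history prefix, and `run_agent` are names invented for this example, not part of the Deep Agents project.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
}


def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: returns (action, argument).

    A real agent would send the history to a model and parse its reply;
    this stub just picks the next step from how many tools have run.
    """
    tool_results = [h for h in history if h.startswith("tool:")]
    if len(tool_results) == 0:
        return ("upper", history[0])
    if len(tool_results) == 1:
        return ("reverse", tool_results[-1].removeprefix("tool:"))
    return ("final", tool_results[-1].removeprefix("tool:"))


def run_agent(task: str, model=scripted_model, max_steps: int = 5) -> str:
    """Ask the model for an action, run the tool, feed the result back."""
    history = [task]
    for _ in range(max_steps):
        action, argument = model(history)
        if action == "final":
            return argument
        result = TOOLS[action](argument)
        history.append(f"tool:{result}")
    # A step limit keeps a confused model from looping forever.
    raise RuntimeError("step limit reached without a final answer")
```

Calling `run_agent("abc")` runs the two tools in turn and then stops. The loop has no plan, no sub agents, and no file system, which is the "shallow" shape the snippet contrasts with deeper agents.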

You can preorder Hitman: Absolution now on the App Store

If you’re a Hitman fan, you probably already knew that Agent 47 is set to return in Hitman: Absolution, coming later this year. Now, Feral Interactive has confirmed the release date and opened preorders on the App Store. In a YouTube video published yesterday, Feral Interactive set expectations even higher, promising “the full AAA experience on the go” for the much-anticipated return of the Hitman franchise, following the release of Hitman: Blood Money – Reprisal: The video description reads:

AI-powered Cursor IDE vulnerable to prompt-injection attacks

A vulnerability that researchers call CurXecute is present in almost all versions of the AI-powered code editor Cursor, and can be exploited to execute remote code with developer privileges. The security issue is now identified as CVE-2025-54135 and can be leveraged by feeding the AI agent a malicious prompt to trigger attacker-controlled commands. The Cursor integrated development environment (IDE) relies on AI agents to help developers code faster and more efficiently, allowing them to connect

You’ve heard of AI ‘Deep Research’ tools…now Manus is launching ‘Wide Research’ that spins up 100+ agents to scour the web for you

Chinese AI startup Manus, which made headlines earlier this year for its approach to a multi-agent orchestration platform for consumers and “pro”-sumers (professionals wanting to run work operations), is back with an interesting new use of its technology. While many other major rival AI providers such as OpenAI, Google, and xAI that have l

Show HN: Mcp-use – Connect any LLM to any MCP

Connect any LLM to any MCP server 🌐 MCP-Use is the open source way to connect any LLM to any MCP server and build custom MCP agents that have tool access, without using closed source or application clients. 💡 Let developers easily connect any LLM to tools like web browsing, file operations, and more. If you want to get started quickly check out mcp-use.com website to build and deploy agents with your favorite MCP servers. Visit the mcp-use docs to get started with mcp-use library For the

OpenAI's ChatGPT Agent casually clicks through "I am not a robot" verification

Maybe they should change the button to say, "I am a robot"? On Friday, OpenAI's new ChatGPT Agent, which can perform multistep tasks for users, proved it can pass through one of the Internet's most common security checkpoints by clicking Cloudflare's anti-bot verification—the same checkbox that's supposed to keep automated programs like itself at bay. ChatGPT Agent is a feature that allows OpenAI's AI assistant to control its own web browser, operating within a sandboxed environment with its o

Show HN: AgentGuard – Auto-kill AI agents before they burn through your budget

🛡️ AgentGuard. 🚨 The problem: your AI agent has a bug. It makes 1000 API calls in a loop. Your $2000 credit card gets charged. This happens to developers every week: infinite loops in AI workflows; testing with production API keys; agents that don't know when to stop; one typo = hundreds of dollars gone. Existing tools only tell you after the damage is done. 💡 The solution: AgentGuard automatically kills your process before it burns through your budget. // Add 2 lines to any AI project: cons
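The general idea of stopping a runaway agent before it overspends can be sketched as a budget check placed in front of every paid call. This is not AgentGuard's API (the snippet above is cut off before showing it): `BudgetGuard`, `BudgetExceeded`, `charge`, and `run_buggy_agent` are hypothetical names, the per-call cost is made up, and the sketch raises an exception rather than killing the process.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push recorded spend past the limit."""


class BudgetGuard:
    """Hypothetical spend tracker; not the AgentGuard library."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Check before the call is made, so the limit is never exceeded.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"call costing ${cost_usd:.2f} would exceed "
                f"the ${self.limit_usd:.2f} budget"
            )
        self.spent_usd += cost_usd


def run_buggy_agent(guard: BudgetGuard, cost_per_call: float = 0.25,
                    max_calls: int = 1000) -> int:
    """Simulate an agent stuck in a loop; return how many calls went through."""
    calls = 0
    try:
        for _ in range(max_calls):
            guard.charge(cost_per_call)  # assumed cost of one API call
            calls += 1
    except BudgetExceeded:
        pass  # the guard stopped the loop before the budget was exceeded
    return calls
```

With a $1.00 limit and an assumed $0.25 per call, the simulated loop stops after four calls instead of running all 1000.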

Show HN: Open-source alternative to ChatGPT Agents for browsing

Meka Agent Meka Agent is an open-source, autonomous computer-using agent that delivers state-of-the-art browsing capabilities. The agent works and acts in the same way humans do, by purely using vision as its eyes and acting within a full computer context. It is designed as a simple, extensible, and customizable framework, allowing flexibility in the choice of models, tools, and infrastructure providers. Benchmarks The agent primarily focuses on web browsing today, and achieves state-of-the-

Show HN: An AI agent that learns your product and guides your users

Hey HN! My name is Christian, and I’m the co-founder of https://frigade.ai . We’ve built a powerful AI agent that automatically learns how to use any web-based product, and in turn guides users directly in the UI, automatically generates documentation, and even takes actions on a user’s behalf. Think of it as Clippy from the old MS Office. But on steroids. And actually helpful. You can see the agent and tool-calling SDK in action here: https://www.youtube.com/watch?v=UPe0t3A1Vpg How is this di

OpenAI's ChatGPT Agent Clicks "I Am Not a Robot" Button Without a Wink of Irony

Amid the launch of OpenAI's new ChatGPT Agent, Redditors found something odd: that the AI will gladly click its way through a test meant to distinguish between humans and robots — by identifying itself as the former. Spotted by Ars Technica, this hilarious — if not foreboding — occurrence was documented on the r/OpenAI subreddit, where a user posted screenshots of ChatGPT Agent "casually clicking the 'I am not a robot' button." As Ars notes, the screenshots were taken from inside the ChatGPT

Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production

Hi HN, we’re Abhinav, Andy, and Jeremy, and we’re building Lucidic AI ( https://dashboard.lucidic.ai ), an AI agent interpretability tool to help observe/debug AI agents. Here is a demo: https://youtu.be/Zvoh1QUMhXQ. Getting started is easy with just one line of code. You just call lai.init() in your agent code and log into the dashboard. You can see traces of each run, cumulative trends across sessions, built-in or custom evals, and grouped failure modes. Call lai.create_step() with any metad

Runloop lands $7M to power AI coding agents with cloud-based devboxes

Runloop, a San Francisco-based infrastructure startup, has raised $7 million in seed funding to address what its founders call the “production gap” — the critical challenge of deploying AI coding agents beyond experimental prototypes into real-world enterprise environments. The funding round, led by The General Partnership with participati

How can enterprises keep systems safe as AI agents join human employees? Cyata launches with a new, dedicated solution

You thought generative AI was a technological tidal wave of change coming for enterprises, but the truth is — at 2.5 years since the launch of ChatGPT — the change is only getting started. A whopping 96% of IT and data executives plan to increase their use of AI agents this year alone, according to a recent survey from Cloudera covered by C

Show HN: Terminal-Bench-RL: Training long-horizon terminal agents with RL

🤓 Terminal-Bench-RL: Training Long-Horizon Terminal Agents with Reinforcement Learning TL;DR: I successfully built stable RL training infrastructure that scales to 32x H100 GPUs across 4 bare metal nodes for training long-horizon terminal-based coding agents. In doing so, I developed Terminal-Agent-Qwen3-32b to become the highest scoring Qwen3 agent on terminal-bench . WITHOUT training! (currently under submission): Unfortunately I am too GPU poor to train a SOTA coding agent 😅 (estimated £30

Writer launches a ‘super agent’ that actually gets sh*t done, outperforms OpenAI on key benchmarks

Writer, the enterprise artificial intelligence company valued at $1.9 billion, launched an autonomous “super agent” Tuesday that can independently execute complex, multi-step business tasks across hundreds of software platforms — marking a significant escalation in the corporate AI arms race. The new Action Agent represents a fundamental s

Want AI agents to work together? The Linux Foundation has a plan

With the rise of AI agents, AI programs that can perform tasks for you without being explicitly told how to carry out every individual step, a problem has arisen. It's an old one in tech circles: interoperability. How do you get AI agents to work together? One answer is Cisco's AGNTCY (pronounced "agency") project. To prevent AI agency fragmentation, Cisco has donated the AGNTCY project to the Linux Foundation. The project is backed by numerous industry heavywei

Principles for production AI agents

Every now and then, people ask me: “I am new to agentic development, I’m building something, but I feel like I'm missing some tribal knowledge. Help me catch up!”. I’m tempted to suggest some serious stuff like multiweek courses (e.g. by HuggingFace or Berkeley), but not everyone is interested in that level of diving. So I decided to gather six simple empirical learnings that helped me a lot during app.build development. This post is somewhat inspired by Design Decisions Behind app.build, but

GLM-4.5: Reasoning, Coding, and Agentic Abilities

Today, we introduce two new GLM family members: GLM-4.5 and GLM-4.5-Air — our latest flagship models. GLM-4.5 is built with 355 billion total parameters and 32 billion active parameters, and GLM-4.5-Air with 106 billion total parameters and 12 billion active parameters. Both are designed to unify reasoning, coding, and agentic capabilities into a single model in order to satisfy more and more complicated requirements of fast rising agentic applications. Both GLM-4.5 and GLM-4.5-Air are hybrid re