Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: agent

Amazon's new AI agent can make an ad from start to finish - how to try it

ZDNET's key takeaways: Amazon says its agent makes high-quality ad production a breeze. The agent aims to give SMBs better advertising tools. It's accessible via a new "chat" feature in Creative Studio. Amazon has launched a new AI agent that automates virtually every step of the advertisement-production process, from audience research to ideation to storyboarding to the production of a short video ad.

Topics: ad agent ai amazon new

Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-Mini by 22%

In a recent post, we introduced the Tau² benchmark, a framework for benchmarking LLMs. Today we’re sharing a surprising discovery we made while using it: a simple prompt rewrite boosted a small model’s success rate by over 20%. This post is a deep dive on how we found and fixed this performance bottleneck by making subtle changes to agent policies. Benchmarking LLMs with Tau²: In the recent OpenAI Summer Update, we saw that the GPT-5 model has made significant strides in agentic tasks. To vali

Deploying agentic AI? You'll probably do business with these 3 companies

ZDNET's key takeaways: Microsoft, Nvidia, and Google top the agentic AI market. Agentic AI automates problem-solving in real time. The tech will significantly impact enterprise productivity. Research And Markets' 2025 360 Quadrant analysis aims to provide insights into the global agentic AI market. The study, published Thursday, includes the market's key players, technological advancements, product innovations

Silicon Valley bets big on ‘environments’ to train AI agents

For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take today’s consumer AI agents out for a spin, whether it’s OpenAI’s ChatGPT Agent or Perplexity’s Comet, and you’ll quickly realize how limited the technology still is. Making AI agents more robust may take a new set of techniques that the industry is still discovering. One of those techniques is carefully simulating workspaces where agents can be trained

Google's new open protocol secures AI agent transactions - and 60 companies already support it

ZDNET's key takeaways: Google announces Agent Payments Protocol (AP2). AP2 helps companies carry out agent-led payments securely. The protocol is launched with the support of more than 60 organizations. The next level of AI assistance in e-commerce is agent-led AI transactions, where your agent can place orders on your behalf, saving you time as a consumer and connecting your product to buyers if you're a merchant. A

Google launches new protocol for agent-driven purchases

On Tuesday, Google announced a new open protocol for purchases initiated by AI agents — automated software programs that can shop and make decisions on behalf of users — with backing from more than 60 merchants and financial institutions. Called the Agent Payments Protocol (AP2), the system is meant to be interoperable between AI platforms, payment systems and vendors, providing a traceable paper trail for each transaction. In a post announcing the protocol, Google executives emphasized their c

In 2 years, half of all service calls will be resolved by AI - survey

ZDNET's key takeaways: AI agents are boosting efficiency, cutting costs, and improving customer satisfaction. By 2027, 50% of service cases are expected to be resolved by AI. 4 out of 5 service leaders say AI agent investment is essential to meet business demands. Seventy-nine percent of service leaders say investment in AI agents is essential to meet business demands, according to

Show HN: AI-powered web service combining FastAPI, Pydantic-AI, and MCP servers

Tech Trends Agent 🚀 A robust, scalable AI-powered web service combining FastAPI, Pydantic-AI, and MCP servers This project demonstrates how to build a production-ready AI-powered web service by combining three cutting-edge, open-source technologies: FastAPI for high-performance asynchronous APIs Pydantic-AI for type-safe, schema-driven agent construction Model Context Protocol (MCP) servers as plug-and-play tools A quick glance at the UI: type a question, choose sources (Hacker News and/or W

Topics: agent api ea mcp ui

How to Use Claude Code Subagents to Parallelize Development

In my last post I talked about how I spent a week heads down using AI to work on a greenfield engineering metrics tool. As I built it, I’d often navigate the web app and spot things that needed to be fleshed out. Sometimes it was a small typo; other times it was a bigger feature that was still TODO. At one point I had Claude Code redesign the homepage to make it more lively. In doing so, it added some new functionality that didn’t fully exist yet: A “View All Insights” link that would show you

I used this ChatGPT trick to look for coupon codes - and saved 25% on my dinner tonight

ZDNET's key takeaways: Free ChatGPT can surface working coupon codes, but it's hit or miss. Agent on ChatGPT Plus might improve your odds of finding valid deals. Coupon or promo codes can lead to easy savings online. It's Friday afternoon, and I'm wrapping up work. I've got my daughter's 4-year checkup with shots later, which means one thing: pizza night. That should put a smile on her face after our appointm

Building a Deep Research Agent Using MCP-Agent

Documenting my journey building a general-purpose deep research agent powered by MCP, and sharing the valuable (and sometimes painful) lessons learned along the way. Background My name is Sarmad Qadri and I'm the creator of the open source project, mcp-agent. My philosophy for agent development in 2025 can be summarized as – MCP is all you need. Or more verbosely: Connect state-of-the-art LLMs to MCP servers, and leverage simple design patterns to let them make tool calls, gather context and m

Windows-Use: an AI agent that interacts with Windows at GUI layer

Windows-Use is a powerful automation agent that interacts directly with Windows at the GUI layer. It bridges the gap between AI agents and the Windows OS to perform tasks such as opening apps, clicking buttons, typing, executing shell commands, and capturing UI state, all without relying on traditional computer vision models. This enables any LLM to perform computer automation instead of relying on specific models for it. 🛠️Installation Guide Prerequisites Python 3.12 or higher UV (or pip) Win

After coding catastrophe, Replit says its new AI agent checks its own work - here's how to try it

ZDNET's key takeaways: Replit unveiled Agent 3 on Wednesday. Code-generation is one of the few viable business use cases for AI. However, Replit recently deleted a user's entire codebase. On Wednesday, AI startup Replit released Agent 3, an autonomous code generation system designed to help non-programmers with software development projects. It's the latest in the industry-wide investment in vibe cod

4 ways machines will automate your business - and it's no hype, says Gartner

ZDNET's key takeaways: Gartner's 2025 Hype Cycle for Emerging Technologies report is here. It underscores machine customers, among other new technologies. AI will play a growing role in business operations, Gartner predicts. AI will increasingly automate day-to-day decision-making for businesses in the coming years, thanks to AI and other emerging technologies, Gartner claims in a new report.

Box CEO Aaron Levie on AI’s ‘era of context’

On Thursday, Box launched its developer conference Boxworks by announcing a new set of AI features, building agentic AI models into the backbone of the company’s products. It’s more product announcements than usual for the conference, reflecting the increasingly fast pace of AI development at the company: Box launched its AI studio last year, followed by a new set of data-extraction agents in February, and others for search and deep research in May. Now, the company is rolling out a new system

Topics: agent agents ai box data

Google’s former security leads raise $13M to fight email threats before they reach you

As AI is increasingly helping hackers to launch mass-scale email attacks, former Google security leaders have joined forces to build autonomous AI agents that aim to stop phishing, malware, and business email compromise threats before they ever reach user inboxes. That is the mission behind AegisAI, a new email security startup that has just emerged from stealth with $13 million in seed funding co-led by Accel and Foundation Capital. More than 90% of successful cyberattacks begin with a phishi

Adobe patches critical SessionReaper flaw in Magento eCommerce platform

Adobe is warning of a critical vulnerability (CVE-2025-54236) in its Commerce and Magento Open Source platforms that researchers call SessionReaper and describe as one of "the most severe" flaws in the history of the product. Today, the software company released a patch for the security issue that could be exploited without authentication to take control of customer accounts through the Commerce REST API. According to e-commerce security company Sansec, Adobe notified "selected Commerce custo

Where's the Fun in AI Gambling?

Michael Calore: You reported that most of the seasoned players in the online gambling space are sticking to the first model that you mentioned, which is just making an agent that does the research and can make tips for what would be a good bet, but it leaves the final step of actually placing the bet in the hand of you, the person who's putting their money on the line. But how is it going for the companies and the users that are taking the riskier step of letting the AI agent actually place the

South Korea: 'many' of its nationals detained in ICE raid on GA Hyundai facility

South Korea said Friday that it had expressed “concern and regret” to the U.S. Embassy over an immigration raid on a Hyundai facility in Georgia during which it said “many” South Korean nationals had been detained. “The economic activities of our companies investing in the U.S. and the rights and interests of our nationals must not be unfairly violated,” said Lee Jae-woong, a spokesperson for the foreign ministry of the key U.S. ally, according to the Yonhap news agency. Agents from Immigratio

Scale AI’s former CTO launches AI agent that could solve big data’s biggest problem

Isotopes AI came out of stealth on Thursday with a healthy $20 million seed round. It offers an AI agent to solve a problem that data analytics products have struggled with for decades: The people who know how to run the big data infrastructure are not the ones who actually need to use the data. With LLMs, business managers can ask questions of their data in natural language. Isotopes’ agent, Aidnn, can provide answers and draft complex planning documents, gathering data from wherever it’s sto

A PM's Guide to AI Agent Architecture

Last week, I was talking to a PM who had recently shipped their AI agent. The metrics looked great: 89% accuracy, sub-second response times, positive user feedback in surveys. But users were abandoning the agent after their first real problem, like a user with both a billing dispute and a locked account. "Our agent could handle routine requests perfectly, but when faced with complex issues, users would try once, get frustrated, and immediately ask for a human." This pattern is observe

Company Replaces Customer Support With AI, Then Panics and Forces Engineers to Work the Phones as the AI Fails

Of all the startups that have come and gone, the personal finance company Klarna might be one of the best bellwethers for the finance industry overall. Specializing in "buy now pay later" microloans — tiny cash advances for purchases that don't need to go through a bank — Klarna hit app stores at a time when US consumer debt was climbing toward a record high. Now a giant of the personal finance landscape, the billion-dollar company recently reported a jaw dropping 17 percent default rate on its

Imagining the future of banking with agentic AI

Adapting to new and emerging technologies like agentic AI is essential for an organization’s survival, says Murli Buluswar, head of US personal banking analytics at Citi. “A company’s ability to adopt new technical capabilities and rearchitect how their firm operates is going to make the difference between the firms that succeed and those that get left behind,” says Buluswar. “Your people and your firm must recognize that how they go about their work is going to be meaningfully different.” The

Launch HN: Slashy (YC S25) – AI that connects to apps and does tasks

Hi HN! – We’re Pranjali, Dhruv and Harsha, building Slashy ( https://www.slashy.ai ). We’re building a general agent that connects to apps and can read data across them and perform actions via custom tools, semantic search, and personalized memory. Here’s a demo: https://www.youtube.com/watch?v=OeApHMHhccA While working on a previous startup, we realized we were spending more time doing busywork in apps than actually building product. We lost hundreds of hours scraping LinkedIn profiles, updati

State Department Agents Are Now Working With ICE on Immigration

As the Trump administration expands its crackdown on immigration, it’s pulling more and more agencies into the effort. The State Department’s law enforcement arm, the Diplomatic Security Service (DSS), is now working with Immigration and Customs Enforcement (ICE) on immigration. DSS agents are taking part in immigration enforcement in the US, and, according to emails viewed by WIRED, are now being asked to log time they are spending on immigration enforcement. DSS’s remit is limited in scope to

Microsoft is about to shake up its Copilot pricing for businesses

It’s no secret that Microsoft has been struggling to sell its Copilot AI assistant to businesses. The steep pricing has put many businesses off paying extra for Microsoft’s AI services, especially when OpenAI’s ChatGPT has been gaining traction in the all-important enterprise market. Micros

Evaluating Agents

“Models constantly change and improve but evals persist.” Look at the data: No amount of evals will replace the need to look at the data. Once you have good eval coverage you’ll be able to spend less time on it, but you will always need to look at the agent traces to identify possible issues or things to improve. Start with end-to-end evals: You must create evals for your agents; stop relying solely on manual testing. Not sure where to start? Add e2e evals, define success criteria (

Topics: agent data e2e end evals

‘007 First Light’ Looks Like Bond’s Promising Return to Gaming

A few months ago, we got our first look at 007 First Light, the first James Bond video game in over a decade. Now that we know we’ll be playing a young 20-something Bond on the path to earning his license to kill and 00 status, developer IO Interactive has given its latest project a fuller reveal ahead of its March 2026 launch. First Light contains a blend of action and stealth gameplay with moments of opportunity for Bond (voiced by Patrick Gibson of Dexter: Original Sin) to be a sneaky secret