
Computer Use is 45x more expensive than structured APIs

Why This Matters

This study finds that driving traditional computer interfaces with AI agents costs up to 45 times more than calling structured APIs. By quantifying the savings of the API-driven approach, it makes the case for building standardized API surfaces for internal tools so AI automation can scale economically. That shift could accelerate AI adoption across industries by cutting operational costs and engineering overhead.

Key Takeaways

We ran a benchmark comparing two ways of letting an AI agent operate the same admin panel, with the goal of putting a price tag on vision agents (browser-use, computer-use).

Here is what we measured, what we had to change to make the vision agent work at all, and what changes when generating an API surface stops being a separate engineering project.

Vision agents are the default way to let AI agents operate web apps that don't expose APIs. The alternative, writing an MCP or REST surface per app, is its own engineering project across the 20+ internal tools most teams run, so teams default to vision agents not because they are better but because the alternative is too expensive to build. The cost of the vision approach is treated as a fixed price.

We wanted to measure the price.

The test app is an admin panel for managing customers, orders, and reviews, modeled on the react-admin Posters Galore demo. Two agents target the same running app: one drives the UI via screenshots and clicks, the other calls the app's HTTP endpoints directly. Same Claude Sonnet, same pinned dataset, same task. The interface is the only variable.

The task: find the customer named "Smith" with the most orders, locate their most recent pending order, accept all of their pending reviews, and mark the order as delivered. This touches three resources, requires filtering, pagination, cross-entity lookups, and both reads and writes. It is the shape of work a typical internal tool sees daily.
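To make the shape of the task concrete, here is a minimal sketch of it as structured API-style operations against an in-memory stand-in for the admin panel's data. All resource and field names (`customers`, `orders`, `reviews`, `"pending"`, `"delivered"`) are assumptions for illustration, not the benchmark's actual schema.

```python
# In-memory stand-in for the admin panel's three resources.
CUSTOMERS = [
    {"id": 1, "last_name": "Smith"},
    {"id": 2, "last_name": "Smith"},
    {"id": 3, "last_name": "Jones"},
]
ORDERS = [
    {"id": 10, "customer_id": 1, "status": "pending", "date": "2024-05-01"},
    {"id": 11, "customer_id": 1, "status": "delivered", "date": "2024-04-02"},
    {"id": 12, "customer_id": 2, "status": "pending", "date": "2024-03-15"},
]
REVIEWS = [
    {"id": 100, "customer_id": 1, "status": "pending"},
    {"id": 101, "customer_id": 1, "status": "accepted"},
]

def run_task():
    # 1. Find the customer named "Smith" with the most orders.
    smiths = [c for c in CUSTOMERS if c["last_name"] == "Smith"]
    target = max(
        smiths,
        key=lambda c: sum(o["customer_id"] == c["id"] for o in ORDERS),
    )
    # 2. Locate their most recent pending order.
    pending = [
        o for o in ORDERS
        if o["customer_id"] == target["id"] and o["status"] == "pending"
    ]
    order = max(pending, key=lambda o: o["date"])
    # 3. Accept all of their pending reviews.
    for r in REVIEWS:
        if r["customer_id"] == target["id"] and r["status"] == "pending":
            r["status"] = "accepted"
    # 4. Mark the order as delivered.
    order["status"] = "delivered"
    return target["id"], order["id"]
```

Four steps, each a filter or a write: exactly the cross-entity read/write mix an internal tool sees daily.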

Path A: Vision agent. Claude Sonnet drives the UI via browser-use 0.12 in vision mode, taking screenshots and executing clicks.
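The loop at the core of Path A can be sketched with a stubbed model. The stub and its canned actions are illustrative only; the real benchmark calls Claude Sonnet through browser-use 0.12 rather than this stand-in.

```python
# Stub of the vision-agent loop: screenshot, model call, action, repeat.

def stub_model(screenshot, history):
    # Stand-in for a Claude Sonnet vision call; returns a canned plan.
    actions = [
        {"action": "click", "target": "Customers tab"},
        {"action": "type", "target": "search box", "text": "Smith"},
        {"action": "done"},
    ]
    return actions[min(len(history), len(actions) - 1)]

def run_vision_agent(take_screenshot, execute, max_steps=10):
    history = []
    for _ in range(max_steps):
        shot = take_screenshot()          # full-page image every step
        decision = stub_model(shot, history)
        history.append(decision)
        if decision["action"] == "done":
            break
        execute(decision)                 # click / type in the browser
    return history
```

Every iteration puts a fresh screenshot into the prompt, which is where the token-cost gap against structured API calls comes from.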

Path B: API agent. Claude Sonnet with tool-use, calling the handlers the UI calls. Each tool maps to one or more event handlers on the app's State, the same functions a button click would trigger. The agent gets the structured response back instead of a rendered page.
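A sketch of what one Path B tool might look like in Anthropic's tool-use format, mapped to the same kind of handler a UI button would trigger. The tool name, handler, and schema here are assumptions for illustration; the real app wires tools to its own State handlers.

```python
# Hypothetical order store and the handler a "Mark delivered" button
# would call on the app's State.
ORDERS = {10: {"id": 10, "status": "pending"}}

def set_order_status(order_id: int, status: str) -> dict:
    order = ORDERS[order_id]
    order["status"] = status
    return order

# Tool definition in the Anthropic tool-use schema (name, description,
# JSON Schema input_schema).
TOOLS = [
    {
        "name": "set_order_status",
        "description": "Update the status of an order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "integer"},
                "status": {"type": "string", "enum": ["pending", "delivered"]},
            },
            "required": ["order_id", "status"],
        },
    }
]

HANDLERS = {"set_order_status": set_order_status}

def dispatch(tool_use: dict) -> dict:
    # The model emits a tool_use block with a name and input; the agent
    # routes it to the handler and returns the structured result.
    return HANDLERS[tool_use["name"]](**tool_use["input"])
```

The agent sees the returned dict directly, with no rendered page to screenshot and re-parse.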

All code is open source.

We started by giving both agents the same six-sentence task above and seeing what happened.
