Salesforce’s new CoAct-1 agents don’t just point and click — they write code to accomplish tasks faster and with greater success rates

Researchers at Salesforce and the University of Southern California have developed a technique that lets computer-use agents execute code while navigating graphical user interfaces (GUIs). The agent can write scripts as well as move a cursor and click buttons in an application, combining the strengths of both approaches to speed up workflows and reduce errors.

This hybrid approach allows an agent to bypass brittle and inefficient mouse clicks for tasks that can be better accomplished through coding.
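To make the contrast concrete, here is a minimal Python sketch of the kind of task where a script can replace a long chain of clicks: renaming every report in a folder. It is an illustration only, not CoAct-1's actual implementation. The names `rename_reports` and `choose_executor`, and the idea of a simple boolean "scriptable" flag, are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def rename_reports(folder: Path, prefix: str) -> int:
    """Add a prefix to every .txt file in the folder and return how many were renamed.

    A click-only agent would need several GUI actions per file
    (select, open the rename dialog, type, confirm). A script does it in one pass.
    """
    count = 0
    # sorted() materializes the file list first, so renamed files are not revisited.
    for path in sorted(folder.glob("*.txt")):
        path.rename(path.with_name(prefix + path.name))
        count += 1
    return count


def choose_executor(task_is_scriptable: bool) -> str:
    """Hypothetical routing rule: use code when the task allows it, else the GUI."""
    return "code" if task_is_scriptable else "gui"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for i in range(5):
            (folder / f"report{i}.txt").write_text("data")
        executor = choose_executor(task_is_scriptable=True)
        renamed = rename_reports(folder, "2025_")
        print(f"executor={executor}, files renamed={renamed}")
```

In a real agent the routing decision would presumably come from a model rather than a hard-coded flag. The sketch only shows why a single scripted step can stand in for many individual GUI actions.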

The system, called CoAct-1, sets a new state-of-the-art on key agent benchmarks, outperforming other methods while requiring significantly fewer steps to accomplish complex tasks on a computer.

This upgrade can pave the way for more robust and scalable agent automation with significant potential for real-world applications.

The fragility of point-and-click AI agents

Computer-use agents typically rely on vision-language models (VLMs) or vision-language-action models (VLAs) to perceive the screen and take action, mimicking how a person uses a mouse and keyboard.
