A new framework from researchers at The University of Hong Kong (HKU) and collaborating institutions provides an open source foundation for creating robust AI agents that can operate computers. The framework, called OpenCUA, includes the tools, data, and recipes for scaling the development of computer-use agents (CUAs).
Models trained using this framework perform strongly on CUA benchmarks, outperforming existing open source models and competing closely with closed agents from leading AI labs like OpenAI and Anthropic.
The challenge of building computer-use agents
Computer-use agents are designed to autonomously complete tasks on a computer, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with critical details about their training data, architectures, and development processes kept private.
“As the lack of transparency limits technical advancements and raises safety concerns, the research community needs truly open CUA frameworks to study their capabilities, limitations, and risks,” the researchers state in their paper.
At the same time, open source efforts face their own hurdles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open source datasets for graphical user interfaces (GUIs) are limited in scale, and many research projects provide too little detail about their methods, making it difficult for others to replicate their work.