Show HN: Cua-Bench – a benchmark for AI agents in GUI environments

Build, benchmark, and deploy agents that use computers

Cua is an open-source platform for building, benchmarking, and deploying agents that can use any computer, with isolated, self-hostable sandboxes (Docker, QEMU, Apple Virtualization).


Choose Your Path

Cua - Agentic UI Automation & Code Execution

Build agents that see screens, click buttons, and complete tasks autonomously. Run isolated code execution environments for AI coding assistants like Claude Code, Codex CLI, or OpenCode.

# Requires Python 3.12 or 3.13
import asyncio

from computer import Computer
from agent import ComputerAgent

async def main():
    computer = Computer(os_type="linux", provider_type="cloud")
    agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929", computer=computer)
    # `async for` is only valid inside a coroutine, so the loop lives in main()
    messages = [{"role": "user", "content": "Open Firefox and search for Cua"}]
    async for result in agent.run(messages):
        print(result)

asyncio.run(main())
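The snippet above consumes agent.run() with `async for`. A minimal sketch of that consumption pattern, runnable without a Cua account or API key, is shown below. Note that `fake_agent_run`, the action names, and the shape of each result dict are hypothetical stand-ins for illustration only; they are not Cua's actual API or output format.

```python
import asyncio
from typing import AsyncIterator


async def fake_agent_run(messages: list[dict]) -> AsyncIterator[dict]:
    """Hypothetical stand-in for agent.run(): yields one result per step."""
    task = messages[-1]["content"]
    # Assumed action names, chosen only to make the stream look plausible.
    for step, action in enumerate(["screenshot", "click", "type"], start=1):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"step": step, "action": action, "task": task}


async def collect(messages: list[dict]) -> list[dict]:
    """Drain the async generator into a list, preserving order."""
    return [result async for result in fake_agent_run(messages)]


def run_demo() -> list[dict]:
    """Synchronous entry point: start an event loop and gather all results."""
    messages = [{"role": "user", "content": "Open Firefox and search for Cua"}]
    return asyncio.run(collect(messages))


if __name__ == "__main__":
    for result in run_demo():
        print(result)
```

Collecting into a list is convenient for logging or tests; for long-running tasks, handling each result as it arrives (as the original loop does) keeps memory use flat.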


Cua-Bench - Benchmarks & RL Environments
