Over the past year, coding agents have transformed how developers write, test, and maintain software. These systems can debug, refactor, and even submit pull requests—fundamentally changing what software development looks like. Yet despite this progress, most coding agents share the same constraints: they're closed, expensive to train, and difficult to study or adapt to private codebases.
Ai2 Open Coding Agents change that. Today we're releasing not just a collection of strong open coding models, but a training method that makes building your own coding agent remarkably accessible – for any codebase, from a personal project to an internal repository at your organization, and for tasks spanning code generation, code review, debugging, maintenance, and code explanation.
Closed models haven't seen your internal code, so they don't know it—your custom data pipelines, internal APIs, org-specific conventions, and so on. Fine-tuning on your private data closes that gap, but generating agent-quality synthetic training data from private codebases has been challenging and cost-prohibitive. Our method makes it easy: reproducing the performance of the previously best open-source model costs roughly $400 of compute, and performance that rivals the best industry models of the same size costs up to $12,000. This puts the full recipe within reach for labs and small teams.
Resource constraints drove us to maximize efficiency at every stage, from data quality to inference costs to model selection. The result: we match the performance of SWE-smith, a synthetic-data method, at 57× lower cost, and of SkyRL, an open-source reinforcement learning (RL) system, at 26× lower cost.
The first release in our Open Coding Agents family is SERA (Soft-verified Efficient Repository Agents). The strongest model – SERA-32B – solves 54.2% of SWE-Bench Verified problems, surpassing prior open-source state-of-the-art coding models of comparable size and context length while requiring only 40 GPU-days (or fewer) to train on a cluster of 2 NVIDIA Hopper GPUs or NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. SERA models are optimized for, and compatible with, Claude Code out of the box. With our fine-tuning method, you can specialize them to your own codebase – including your full engineering stack and conventions – quickly and at low cost.
We collaborated with NVIDIA to optimize SERA inference for their accelerated infrastructure, ensuring researchers and developers can get the most out of these models in production environments. Early benchmarks are promising: running in BF16 precision on 4xH100 GPUs, SERA achieves approximately 1,950 peak output tokens per second with a 16k context window. At FP8 precision, SERA reaches 3,700 peak output tokens per second, nearly doubling throughput with a negligible accuracy drop. On next-generation Blackwell 4xB200 systems running in NVFP4, SERA scales further to around 8,600 peak output tokens per second.
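To make those serving configurations concrete, here is a minimal sketch of loading a model at FP8 with vLLM's offline Python API. The Hugging Face model ID is a placeholder we've assumed for illustration, and the settings simply mirror the benchmark setup described above rather than any official serving recipe.

```python
# Minimal vLLM serving sketch (hypothetical model ID; illustrative settings).
from vllm import LLM, SamplingParams

llm = LLM(
    model="allenai/SERA-32B",   # hypothetical Hugging Face ID, for illustration
    tensor_parallel_size=4,     # matches the 4-GPU benchmark setup above
    quantization="fp8",         # remove this line to run in BF16 instead
    max_model_len=16384,        # 16k context window, as in the BF16 benchmark
)

params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate(
    ["Write a Python function that parses an ISO-8601 timestamp."],
    params,
)
print(outputs[0].outputs[0].text)
```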
Every component of this release is open – models, Claude Code integration, and training recipes – and can be launched with a single line of code, making it easy to use even for those without LLM training experience. We're also releasing our state-of-the-art training data so researchers can inspect what worked, push it further, and conduct deep science while sidestepping the stumbling blocks and dead ends typical of building coding agents.
One result we're especially excited about: SERA uniquely enables adaptation to private datasets such as internal codebases, and we see evidence that a smaller, open model can replicate and possibly even exceed the performance of a more capable "teacher" coding agent in these setups. For example, SERA-32B can surpass its 110B-parameter teacher (GLM-4.5-Air) on codebases like Django and Sympy after training on just 8,000 samples, at a cost of $1,300.
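For a concrete picture of that adaptation step, below is a minimal supervised fine-tuning sketch using Hugging Face TRL. Everything specific here – the student base model, the trajectory file, and the hyperparameters – is an illustrative assumption, not the actual SERA recipe.

```python
# Distillation-style SFT sketch (illustrative; not the official SERA recipe).
# Assumes teacher trajectories were collected as chat-format JSONL records.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical file holding ~8,000 teacher trajectories from a private repo.
train_data = load_dataset(
    "json", data_files="teacher_trajectories.jsonl", split="train"
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",  # hypothetical student base model
    train_dataset=train_data,
    args=SFTConfig(
        output_dir="sera-private-32b",
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
```

The sketch covers only the final fine-tuning pass; the substance of the method lies upstream, in generating and soft-verifying the teacher trajectories.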
Accessible open models can now inherit strong agentic behavior through a simple, reproducible pipeline—no large-scale RL infrastructure or engineering team required. Case in point: SERA was built largely by a single Ai2 researcher.
The challenge: specializing agents to your data