The last six months in LLMs in five minutes

2026-05-19


Why This Matters

Recent advances in large language models (LLMs), particularly in coding agents, mark a turning point for software development. AI-powered coding tools are now reliable enough for daily professional use, which could change how developers work and how much they get done.

Key Takeaways

Coding agents have achieved a new level of reliability, making them suitable for daily work.
Reinforcement Learning from Verifiable Rewards significantly improved code quality in LLMs.
The shift from experimental to practical AI coding tools signals a major step toward mainstream adoption.

It took a little while for this to become clear, but the real news from November was that the coding agents got good.

OpenAI and Anthropic had spent most of 2025 running Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses.

In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.
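For readers new to the term, the "verifiable" part of Reinforcement Learning from Verifiable Rewards means the reward comes from a check a program can run, such as a test suite, rather than from a human rating. The sketch below is only an illustration of that idea, not how OpenAI or Anthropic implement their training: the function `verifiable_reward`, the pass/fail scoring, and the sample tests are all hypothetical, and a real pipeline would run candidate code in a sandbox.

```python
from typing import List, Tuple


def verifiable_reward(
    candidate_source: str,
    func_name: str,
    test_cases: List[Tuple[tuple, object]],
) -> float:
    """Hypothetical reward: 1.0 if the candidate passes every test, else 0.0."""
    namespace: dict = {}
    try:
        # Illustration only: exec() on untrusted code is unsafe outside a sandbox.
        exec(candidate_source, namespace)
        func = namespace[func_name]
        for args, expected in test_cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime crashes all earn no reward.
        return 0.0
    return 1.0


# Made-up test cases and two made-up model outputs.
tests = [((2, 3), 5), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

print(verifiable_reward(good, "add", tests))
print(verifiable_reward(bad, "add", tests))
```

Because the check is automatic, a score like this can be computed for a very large number of attempts without a person grading each one.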
