Researchers from the University of California, Berkeley, Stanford University and Databricks have introduced a new AI optimization method called GEPA that significantly outperforms traditional reinforcement learning (RL) techniques for adapting large language models (LLMs) to specialized tasks.
GEPA moves away from the popular paradigm of learning through thousands of trial-and-error attempts guided by simple numerical scores. Instead, it uses an LLM’s own language understanding to reflect on its performance, diagnose errors, and iteratively evolve its instructions. In addition to being more accurate than established techniques, GEPA is significantly more efficient, achieving superior results with up to 35 times fewer trial runs.
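To make the idea concrete, the sketch below shows what such a reflect-and-evolve loop could look like in code. It is only an illustration of the general pattern, not the authors' implementation; the `call_llm` helper, the prompts, and the `score_fn` metric are hypothetical placeholders.

```python
# Illustrative sketch (not GEPA's actual code) of a reflect-and-evolve loop:
# run the task, have an LLM critique the failures in natural language,
# then propose a revised instruction and keep it only if it scores better.

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    raise NotImplementedError

def reflect_and_evolve(instruction, task_examples, score_fn, rounds=5):
    def evaluate(instr):
        outputs = [call_llm(f"{instr}\n\n{ex['input']}") for ex in task_examples]
        score = sum(score_fn(o, ex) for o, ex in zip(outputs, task_examples)) / len(task_examples)
        return score, outputs

    best_instruction = instruction
    best_score, best_outputs = evaluate(best_instruction)
    for _ in range(rounds):
        # 1. Ask an LLM to diagnose, in plain language, why the current instruction falls short.
        critique = call_llm(
            "Given this instruction, some example outputs, and their average score, "
            "explain what went wrong and how the instruction should change.\n"
            f"Instruction: {best_instruction}\nScore: {best_score:.2f}\n"
            f"Sample outputs: {best_outputs[:3]}"
        )
        # 2. Ask the LLM to rewrite the instruction based on that reflection.
        candidate = call_llm(
            "Rewrite the instruction to address the critique.\n"
            f"Instruction: {best_instruction}\nCritique: {critique}"
        )
        # 3. Keep the rewrite only if it actually performs better on the examples.
        candidate_score, candidate_outputs = evaluate(candidate)
        if candidate_score > best_score:
            best_instruction, best_score, best_outputs = candidate, candidate_score, candidate_outputs
    return best_instruction
```

The key contrast with scalar-reward methods is in step 1: the optimizer gets a written diagnosis of what failed, not just a number, so each trial carries far more usable information.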
For businesses building complex AI agents and workflows, this translates directly into faster development cycles, substantially lower computational costs, and more performant, reliable applications.
The high cost of optimizing modern AI systems
Modern enterprise AI applications are rarely a single call to an LLM. They are often “compound AI systems,” complex workflows that chain multiple LLM modules, external tools such as databases or code interpreters, and custom logic to perform sophisticated tasks, including multi-step research and data analysis.
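A compound system can be as simple as a question-answering workflow that chains one LLM call, a database query, and a second LLM call. The sketch below is a hypothetical example of that shape; `call_llm` and `run_sql` stand in for whatever model API and data tool a team actually uses.

```python
# Minimal sketch of a "compound AI system": LLM modules chained with an
# external tool and custom logic. All names and prompts are illustrative.

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    raise NotImplementedError

def run_sql(query: str) -> list[dict]:
    """Placeholder for a database tool the workflow can call."""
    raise NotImplementedError

def answer_business_question(question: str) -> str:
    # Module 1: an LLM turns the question into a SQL query.
    sql = call_llm(f"Write a SQL query that answers: {question}")
    # External tool: execute the query against the warehouse.
    rows = run_sql(sql)
    # Custom logic: bail out early if nothing came back.
    if not rows:
        return "No matching data found."
    # Module 2: a second LLM call summarizes the results for the user.
    return call_llm(f"Summarize these rows as an answer to '{question}': {rows}")
```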
A popular way to optimize these systems is through reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), a technique employed in popular reasoning models, including DeepSeek-R1. This method treats the system as a black box; it runs a task, gets a simple success metric (a “scalar reward,” like a score of 7/10), and uses this feedback to slowly nudge the model’s parameters in the right direction.
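The sketch below illustrates, in simplified form, the kind of signal a GRPO-style optimizer works with: a group of attempts per task, each reduced to a single score and normalized against the group. It is not the full GRPO algorithm, only a way to see how all richer feedback is collapsed into numbers before the model's parameters are updated.

```python
# Simplified illustration (not the full GRPO algorithm) of scalar-reward feedback:
# each attempt in a sampled group is scored with a single number, and those
# numbers are turned into group-relative advantages that nudge the policy.

import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Compare each attempt's reward to the group mean, scaled by the group's spread."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero for identical scores
    return [(r - mean) / std for r in rewards]

# Example: five attempts at the same task scored 7/10, 4/10, 9/10, 5/10, 6/10.
advantages = group_relative_advantages([0.7, 0.4, 0.9, 0.5, 0.6])
print(advantages)  # attempts above the group average get positive advantages
```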