Learnings from building AI agents

How we made our AI code reviewer stop being so noisy

I’m Paul, cofounder of cubic, an "AI-native GitHub." One of our core features is an AI code review agent that performs an initial review pass, catching bugs, anti-patterns, duplicated code, and similar issues in pull requests.

When we first released this agent back in April, the main feedback we got was straightforward: it was too noisy.

Even small PRs often ended up flooded with multiple low-value comments, nitpicks, or outright false positives. Rather than helping reviewers, it cluttered discussions and obscured genuinely valuable feedback.

[Image: an example nitpick]

We decided to take a step back and thoroughly investigate why this was happening.

After three major architecture revisions and extensive offline testing, we managed to reduce false positives by 51% without sacrificing recall.
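For readers who want to track the same two numbers in their own offline tests, here is a minimal sketch of how a false-positive reduction and recall could be computed from labeled review runs. The `EvalResult` class, the `false_positive_reduction` helper, and every count below are hypothetical illustrations, not cubic's actual harness or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Counts from one offline run against a hand-labeled set of PRs (hypothetical)."""
    true_positives: int   # agent comments that match a labeled real issue
    false_positives: int  # agent comments that match no labeled issue
    false_negatives: int  # labeled real issues the agent did not flag

    @property
    def recall(self) -> float:
        # Share of labeled real issues that the agent actually caught.
        labeled = self.true_positives + self.false_negatives
        return self.true_positives / labeled if labeled else 0.0


def false_positive_reduction(before: EvalResult, after: EvalResult) -> float:
    """Relative drop in false positives between two runs on the same PR set."""
    if before.false_positives == 0:
        return 0.0
    return 1 - after.false_positives / before.false_positives


# Made-up counts, purely for illustration.
baseline = EvalResult(true_positives=80, false_positives=200, false_negatives=20)
revised = EvalResult(true_positives=80, false_positives=120, false_negatives=20)

print(f"false-positive reduction: {false_positive_reduction(baseline, revised):.0%}")
print(f"recall before: {baseline.recall:.0%}, after: {revised.recall:.0%}")
```

Comparing both runs on the same labeled PR set is what makes "without sacrificing recall" checkable: the false-positive count can fall while the recall figure stays put.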

Many of these lessons turned out to be broadly useful—not just for code review agents but for designing effective AI systems in general.

1. The Face‑Palm Phase: A Single, Do‑Everything Agent

Our initial architecture was straightforward but problematic:
