How we made our AI code reviewer stop being so noisy
I’m Paul, cofounder of cubic, an "AI-native GitHub." One of our core features is an AI code review agent that performs an initial review pass, catching bugs, anti-patterns, duplicated code, and similar issues in pull requests.
When we first released this agent back in April, the main feedback we got was straightforward: it was too noisy.
Even small PRs often ended up flooded with multiple low-value comments, nitpicks, or outright false positives. Rather than helping reviewers, it cluttered discussions and obscured genuinely valuable feedback.
[Image: an example nitpick]
We decided to take a step back and thoroughly investigate why this was happening.
After three major architecture revisions and extensive offline testing, we managed to reduce false positives by 51% without sacrificing recall.
Many of these lessons turned out to be broadly useful—not just for code review agents but for designing effective AI systems in general.
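To make the headline claim concrete, here is a minimal sketch of how a "false positives down, recall unchanged" comparison could be computed from an offline evaluation set. This is not our actual evaluation harness: the `EvalRun` class, the `false_positive_reduction` helper, and every count below are hypothetical, picked only so the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    """Counts from one offline run against a labeled set of PRs (hypothetical shape)."""
    true_positives: int   # comments that flagged a real issue
    false_positives: int  # comments that flagged a non-issue (the "noise")
    false_negatives: int  # real issues the reviewer missed

    @property
    def recall(self) -> float:
        # Share of real issues that were actually flagged.
        total = self.true_positives + self.false_negatives
        return self.true_positives / total if total else 0.0

    @property
    def precision(self) -> float:
        # Share of posted comments that pointed at a real issue.
        total = self.true_positives + self.false_positives
        return self.true_positives / total if total else 0.0


def false_positive_reduction(before: EvalRun, after: EvalRun) -> float:
    """Relative drop in false positives between two runs on the same labeled set."""
    if before.false_positives == 0:
        raise ValueError("baseline has no false positives to reduce")
    return (before.false_positives - after.false_positives) / before.false_positives


# Illustrative numbers only, not real measurements.
baseline = EvalRun(true_positives=80, false_positives=100, false_negatives=20)
revised = EvalRun(true_positives=80, false_positives=49, false_negatives=20)

print(f"FP reduction: {false_positive_reduction(baseline, revised):.0%}")
print(f"Recall: {baseline.recall:.2f} -> {revised.recall:.2f}")
print(f"Precision: {baseline.precision:.2f} -> {revised.precision:.2f}")
```

The point of tracking both numbers is that cutting false positives is easy if you just comment less; the reduction only counts if recall on the same labeled set holds steady.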
1. The Face‑Palm Phase: A Single, Do‑Everything Agent
Our initial architecture was straightforward but problematic: