
Nobody Reviews Compiler Output

Why This Matters

This article highlights the need for new verification mechanisms in the era of coding agents, drawing a parallel to how developers trust compilers through testing and formal methods rather than by reviewing binary output. Traditional code review of AI-generated code is impractical at current volumes; what matters is building robust trust frameworks for automated outputs, so the industry can integrate AI tools without compromising quality or security.

Key Takeaways

Philip Su's recent post argues that code reviews are not just impractical in the age of coding agents; they're headed toward being irresponsible. He's right about the trend. But I think the framing of "lights-out codebases" skips over the more interesting and uncomfortable question: why does lights-out feel so scary, and what does that fear actually tell us?

The answer, I think, is hiding in a transition we already made once and then promptly forgot about: the compiler.

Think about how you relate to your compiler. You write C++, Rust, or Go, and the toolchain spits out a binary. Do you open that binary and read through the assembly? Do you schedule a meeting with a colleague to review the object code before shipping?

Of course not. That would be absurd. And not because you blindly trust compilers; you don't. Compilers have bugs. Compilers have had famously catastrophic bugs. But you've constructed an entire apparatus that makes reviewing the output unnecessary: you write tests against observable behavior, you have type systems that constrain what the output can do, you have reproducible builds, you have fuzzing and sanitizers, and in high-stakes domains you have formal verification. You trust the process, not the artifact.
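To make that concrete, here is a minimal sketch of "trust the process, not the artifact" in Python, using the hypothesis property-testing library. The function `sort_records` is a hypothetical stand-in for any implementation you never read, whether it came out of a compiler or an agent; the tests pin down its observable behavior instead.

```python
# A sketch of "trust the process, not the artifact": we never read the
# implementation of sort_records; we pin down its observable behavior
# with property-based tests instead. (Requires: pip install hypothesis)
from hypothesis import given, strategies as st

def sort_records(xs: list[int]) -> list[int]:
    # Hypothetical stand-in for code we did not write or read ourselves,
    # e.g. a compiler's output or an agent's. Only its behavior is tested.
    return sorted(xs)

@given(st.lists(st.integers()))
def test_output_is_ordered(xs):
    out = sort_records(xs)
    assert all(a <= b for a, b in zip(out, out[1:]))

@given(st.lists(st.integers()))
def test_output_is_a_permutation(xs):
    # Same multiset in, same multiset out: nothing invented, nothing dropped.
    assert sorted(sort_records(xs)) == sorted(xs)
```

Nothing in these tests depends on how the implementation was produced, which is the whole point: the same checks apply whether a human, a compiler, or an agent wrote it.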

We haven't built that apparatus for coding agents. And that, not the output itself, is what's actually missing.

Consider that a single developer, Michael Novati, landed 417 PRs in one day in February. That alone is enough to argue that reviewing AI-generated code is volumetrically impossible. And it is. But I'd push the diagnosis further: the volume isn't the problem; it's a symptom that exposes the problem.

The problem is that we're still treating agent output the way we used to treat junior developer output: as something that needs a human to eyeball it before it's real. That made sense when code reviews were the primary quality gate. It makes no sense when they can't be.

The compiler analogy is clarifying here not because it tells us to trust agents blindly, but because it shows us what a mature pipeline looks like once you stop treating artifacts as things to be read and start treating them as things to be verified. We don't review compiled binaries. We run them against test suites, we check them against specifications, we instrument them in production. We moved the quality control upstream (type systems, linters, formal specs) and downstream (testing, monitoring, rollback), and eliminated the manual middle.
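Here is a hedged sketch of what that eliminated middle looks like applied to an agent's change: an automated gate that verifies the change instead of reading it. The specific tools (ruff, mypy, pytest) and the `verify_change` helper are illustrative choices, not a prescribed stack; any pipeline with upstream constraints and downstream behavioral checks has the same shape.

```python
# A sketch of the "verified, not read" pipeline for an agent's change:
# upstream gates (lint, types) and downstream gates (tests) replace the
# manual review in the middle. Tool choices here are illustrative.
import subprocess
import sys

GATES = [
    ["ruff", "check", "."],  # upstream: lint constraints on the source
    ["mypy", "."],           # upstream: type constraints on the source
    ["pytest", "-q"],        # downstream: observable behavior vs. the spec
]

def verify_change() -> bool:
    """Run every gate; the change is acceptable only if all of them pass."""
    for cmd in GATES:
        if subprocess.run(cmd).returncode != 0:
            print(f"gate failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    # No human reads the diff; it lands only when every gate passes.
    sys.exit(0 if verify_change() else 1)
```

In CI, a script like this is the reviewer: the diff lands when every gate passes and is rejected otherwise, with no human in the middle.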

That's exactly the move we need to make with agents. We're not there yet. And the reason the lights-out framing makes engineers nervous isn't irrationality; it's that the upstream and downstream apparatus barely exists.

If compiler output requires no human review because of what surrounds it, then agent output will require no human review when we've built the equivalent. What does that actually mean?
