This post is the culmination of over a year of research into how to properly use AI agents to write high-quality software in security-critical systems.
I will be writing this post primarily from my perspective as a software developer, protocol developer, and maintainer of security-critical software.
Over the past year I dove deep into AI agents. I have explored their limits: what they can and cannot be relied upon to do. I’ve created our own AI review tools that perform just as well as multibillion-dollar AI review systems. I’ve maintained my own custom fork of an AI coding agent called Crush. This post is my distillation of what I’ve learned to be the best approach if you want to create high-quality software using AI tools.
There are some people who hate AI. Indeed, many developers should hate AI, because it is an enemy to their own learning of software development. This post is not for them. This post is for the few expert developers whose skills have reached the point where they outclass any and all “frontier AI models” in their area of expertise. It is for these expert developers, who want to use AI to increase their performance without sacrificing any quality, that I write this post.
Problems With Current Approaches
If you’ve used AI agents much, you know that during the course of a session the following can happen:
You can discover that your initial idea was dumb and a better one exists
Your agent might go “off the rails” and start doing something you don’t want it to do
I’ve watched videos with hundreds of thousands of views where YouTubers explain how they invented complicated systems of 12 parallel agents managed by an orchestrator, doing a billion things simultaneously, and how they no longer have to involve themselves in the coding process. It’s just slop writing and reviewing slop while the YouTuber sits on a beach, goes to the bathroom, or sips coffee for no reason.
It is humanly impossible to build your own understanding of a codebase if you use such a “Vibe” approach. The AI will have gone off the rails multiple times and you will only notice it later when you actually try to use the software. This method may be perfectly OK in situations where you do not care about quality, but if you do care, a different approach is needed.