
Some uncomfortable truths about AI coding agents


I’ve been following the development of generative AI closely for several years now. Early on, like most people, I was absolutely blown away by what OpenAI accomplished based on a relatively niche deep learning research paper from Google and a bit of reinforcement learning from human feedback. When it worked, it was incredible, an illusion so convincing that it made you believe it could do anything. It inspired me to experiment extensively on my own and to develop some proof-of-concept apps using these large language models. And it continues to be reliable fodder for lively debate among nearly anyone with even a basic understanding of it. I’ve tried really hard not to jump to conclusions about generative AI, one way or the other, but after much contemplation, I think I’m finally ready to render my verdict. I know, I know. You’ve been dying to find out (nearly everyone reading this right now: “Wait, who is this guy?”).

This post comes amidst the seemingly meteoric rise of the AI coding agent, which takes your favourite prone-to-hallucinations LLM and adds a feedback loop that allows it to generate some truly impressive results. Entire companies are being built from the ground up around AI coding agents, and even established and well-regarded companies like Notion, Spotify and Stripe seem to be fully on board – after all, why let humans labour away for ages when an AI coding agent can do it faster and cheaper than they ever could? Depending on who you ask, the AI coding agent has either made the process of manually writing code completely obsolete and worthless or it is an affront to everything that the software development lifecycle stands for. I’ve decided to wade into that environment to say, definitively, that LLM-based AI coding agents have no place now, or ever, in generating production code for any software I build professionally. And I think you should seriously consider taking that stance, too.
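For readers unfamiliar with the term, the "feedback loop" described above is, in rough outline, a generate-run-revise cycle: the agent asks the LLM for code, executes it against some check, and feeds any failure back into the next prompt. Here is a minimal sketch of that loop; `call_llm` is a hard-coded stub standing in for a real model API (no particular vendor or product is implied), and the task and test are invented for illustration:

```python
def call_llm(prompt):
    # Stub for a real LLM API call. To imitate a model that learns from
    # error feedback, it returns buggy code on the first attempt and a
    # fix once the prompt contains the failure report.
    if "NameError" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a + c"  # bug: undefined name 'c'

def run_tests(code):
    # Execute the candidate code and report pass/fail plus error text.
    env = {}
    try:
        exec(code, env)
        assert env["add"](2, 3) == 5
        return True, ""
    except Exception as e:
        return False, f"{type(e).__name__}: {e}"

def agent_loop(task, max_iterations=5):
    prompt = task
    for _ in range(max_iterations):
        code = call_llm(prompt)
        ok, error = run_tests(code)
        if ok:
            return code
        # The feedback loop: the failure is appended to the next prompt.
        prompt = f"{task}\nPrevious attempt failed with: {error}"
    return None  # gave up after max_iterations
```

Calling `agent_loop("Write add(a, b)")` fails on the first pass (a `NameError` from the undefined `c`), feeds that error back, and succeeds on the second. Real agents wrap far more tooling around this cycle, but the core shape is the same, and so is the core caveat: the loop only converges toward whatever its tests happen to check.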

Are AI coding agents powerful? Absolutely, they are. Anyone who has been paying attention and who is being honest with themselves can see that plainly. And do LLMs in general have their uses? Yes (as long as you never, ever trust what they have to say). Right now I want to focus on LLM-based AI coding agents, though. We’ll talk about where LLMs in general are useful for software engineering later, if you’re still with me.

There are four main issues contributing to the blanket ban on AI coding agents in my professional work: skill atrophy, artificially low cost, prompt injections and copyright/licensing.

Skill atrophy

The easiest to comprehend and also the squishiest of those issues is skill atrophy. It’s becoming clear that the software engineer’s job is changing dramatically. The role change has been described by some as becoming a sort of software engineering manager, where one writes little or no code oneself but instead supervises a team of AI coding agents as if they were a team of human junior software engineers. Yes, AI coding agents make mistakes, we are told, but not to worry; the intermediate and senior software engineers will use their years of experience and review every line of code the agents produce to make sure every change is up to snuff. Even if you believe that claim is valid now, I’m here to tell you that the software engineers who have been relegated to code review duty will become rusty over time. Their coding and software design skills will atrophy and they will become worse software engineers as a result. Even if they set out fully intending to apply the highest level of scrutiny to all generated code, they will gradually lose the ability to tell a good change from a bad one because they’ve stopped writing code themselves. Practice and receiving feedback from others are critical to the upkeep and advancement of one’s coding knowledge, but engineers in this position will get none of that.

In reality, though, the code review load for software engineers will gradually increase as fewer and fewer of them are expected to supervise an ever-growing number of coding agents, and they will inevitably become complacent over time, out of pure necessity for their sanity. I’m a proponent of code review for finding room for improvement and for propagating understanding among software engineers, but even I often consider it a slog to do my due diligence for a large code review (just because I think it’s important doesn’t mean I think it’s fun). If it’s your full-time job to review a swarm of agents’ work, and experience tells you they are good enough 95%+ of the time, you’re not going to pay as much attention as you should and bad changes will get through. That’s true of all code reviews, but at least you can mostly trust that your human coworkers mean well and that they can learn from their mistakes. And, what’s more, you can actually walk over (or start a video call) and talk to your human coworker face to face to ask them why they implemented something the way they did. There’s no telling where the LLM got the inspiration for that tricky block of code. Go ahead and ask it; it will only make up a plausible-sounding response because it doesn’t actually know.

I’m fully aware that my views on this particular issue may turn out to be a case of Old Man Yells at Cloud. In particular, I recognize that it is reminiscent of a few decades ago, when old-timers complained about the proliferation of high-level programming languages and insisted they would lead to a generation of programmers lacking a proper understanding of how the system behaves beneath all that syntactic sugar and automatic garbage collection – programmers without the foundational skills necessary to design and build quality software. And, for the most part, they turned out to be wrong. Practically speaking, plenty of competent software engineers today don’t really understand how their language runtime allocates and frees the memory they use, but that hasn’t stopped them from building useful and valuable things. At its core, the only defense I’ve got against that response is… this time feels different? Not a particularly rigorous defense, I admit, but I did warn you that this was the squishiest of the issues at hand. Also, I will point out that I have two decades of professional software engineering experience to bolster my argument, for what it’s worth.

Artificially low cost

... continue reading