A guide to Gen AI / LLM vibecoding for expert programmers

I get it, you’re too good to vibe code. You’re a senior developer who has been doing this for 20 years and knows the system like the back of your hand. Or maybe you’re the star individual contributor who is the only person who can ever figure out how to solve the hard problems. Or maybe you’re the professor who created the entire subject of the algorithms you’re implementing. I don’t know you, but I do know that you think you’re too good to vibe code. And guess what, you’re absolutely and totally wrong.

Facetious? Maybe… but I will go even further. No, you’re not too good to vibe code. In fact, you’re the only person who should be vibe coding.

I would have thought this statement was crazy just a month ago because this label of “expert” coder also applies to me. Just to establish some street cred here: I am the maintainer of over 200 GitHub packages totaling over 23,000 stars, am Co-PI of a CS lab at MIT, was the founding architect at one pretty successful tech startup, and am a VP leading the architecture of Dyad in another domain. I clearly know how to program, would leave snide remarks for students and interns when their code was clearly created by LLMs, and was pretty publicly against all of this because these bots are too stupid to know what “correct” even means.

But I started picking up this “vibe coding” about a month ago and found out that it can be a really powerful tool, in the right circumstances and in the right workflow. For the record, I now have about 32 Claude agents continuously running in tmux windows that I can ssh to, so all day long I can just check in via laptop or phone and keep plugging along. This was completely unheard of a month ago, but it’s here. This is the expert’s guide to vibe coding, for those who scoff at those kids who don’t know what they are doing but also want to start doing it correctly.

A Mental Model for LLM Agents: Your Sophomore Year Student/Intern

Drop the hype. I’m not here to sell you a ChatGPT, so I’m not going to tell you it’s PhD level when it 100% absolutely clearly isn’t to anyone who has ever met a PhD in their life. But it is something, so what is it?

Think about an LLM agent as a dedicated intern, or a student who is around the proficiency of a sophomore in college. They know the basics of what programming looks like, they can copy other ideas and architectures, they know how to do things like run unit tests, and they know how to Google things. They have had their basic programming course, and probably have done a deep dive into some random subject as a higher level course, but if you quiz them on the topic enough you’ll learn they haven’t actually learned it deeply. The kid seems smart enough, you’d give them a shot.

If this was a person who showed up to your office looking for work, what would you do with them? Generally you would do one of two things.

First, you could sandbox their work. The reason you sandbox the work of a new student or intern is rather simple: you don’t know a new subject/tool well, you want to give it a try, and ehh, why not, let’s see what happens. If you took the sandbox route, you probably don’t care about the code (it’s likely to be unmaintainable and bit rot anyway if it’s not in your core repos); it’s about getting the artifact. You vibe code a bit, “that looks cool!”, throw it into a demo / LinkedIn post, and then move on. That’s the simple vibe coding you may have already tried and thought “that’s not useful enough”. It works, but that’s not what we’re here for, so no need to mention it more in this blog.

The second path goes in the complete opposite direction: you would integrate the student/intern into a project you know well, because that makes it very easy for you to review their work. You can give clear feedback because you’ve already made the first 10 mistakes they will make, you already know how to tell them what is next, and you have the first 6 months of the project planned out for them, so it’s low maintenance. This is how you would train most people that you want to stay long term, right? In the same way, this is the path to take with the LLM agents. So let’s walk through this process step by step.

Major Point: Vibe coding turns everyone into a team lead, but not everyone should be a team lead

Leading a team of programmers is hard! It takes time, skill, and patience. I think everyone when they are a kid thinks “I don’t want to be the worker, I want to be the boss and I just sit in my chair and tell people what to do and boom it all gets done!”. But after your second group project in college, you pretty quickly realize that if you lead a team wrong, you instead just end up doing all of the work yourself while carrying the expectations of 4 people. There are a few reasons for this.

One issue with trying to establish team programming is that you may just not know the subject well enough. If it takes you a while to understand the subject, what someone is trying to do, and what their code is about, then it’s just not worth the time to manage someone else. You need to be at a point where you can very quickly see the code, understand what’s going on, and say “I can’t merge this until you have tests for X and Y, and also show me a plot of Z so I know how it all relates”. If you cannot instantly give that kind of feedback, then you probably aren’t experienced enough to lead.
You need to write a few million lines of code before it becomes automatic, where you just look at code and say “don’t do that, that’ll be a performance bottleneck”, but you need that quickness in code review in order for vibe coding to work.

But also, let me say this bluntly:

If you are an individual contributor who usually does not like to train interns because you find that they take more time than they are helpful, then don’t vibe code.

Some people tend to do better at working by directing. Other really smart people just can’t seem to get good work out of other people. It’s not an indictment; Silicon Valley created the “individual contributor” role for a reason. If you are one of those people, then vibe coding may not be for you, as you will likely grow frustrated with the agents even more quickly than you would with a human (they somehow retain less information than even the worst intern, who at least remembers your favorite lunch order).

So go in with this mindset: I will have to have meetings with the agents, I will need to plan and give them tasks, and I will need to review the code. If you find this stuff to be slower than coding yourself, then just stop right now. But if you do well with a team, then go on. How do we now make this team effective?

What is the workflow for vibe coding correctly?

If someone shows you Claude Code and you ask it to try and solve the problem that you were just working on (obviously a hard problem, because if it ends up on your desk that means someone else failed to solve it), you just poke and laugh at Claude when it fails miserably. But you would have never done that with a new intern or student (hopefully), so why do this? Again, you know how smart it actually is, so cut the hype and treat it the same way.

This immediately leads to a few workflow principles:

The workflow of vibe coding is the same as managing scrums or helping a student through a research thesis

You have a problem, you give it to the agent, you review the results, and then you give it feedback. This is exactly how you would manage a scrum team or help a student through their thesis. You don’t just give them the problem and expect them to solve it: you give them the problem, they come back with a solution, and then you tell them what to do next. You probably already get most of your work done this way if you’re senior enough. Every professor has more students coding than themselves, and every senior developer has a total amount of code created by their team that is far greater than their own. Just think of Claude as your pack of newbies that just started.

Now if you’re thinking “but it can be difficult to manage a bunch of newbies”… yeah, that means you’re actually senior enough to understand how to do this right. It’s fairly easy to send a new intern or student off on a project, and if their pay/grade depends on it getting done they will give you something back. Whether it’s any good depends on how well you chunked up the work for them and gave them an appropriate task.

But one key thing is, if you had to do a meeting every 10 minutes it would drive you crazy, so don’t. Set up say 12-32 agents running on different processes, preferably sandboxed on some other compute resource (sandboxed so they can’t break the machine, but also so they can have a GitHub authentication that does not have core read/write privileges. This way you can tell it to have “dangerously unsafe permissions” and the worst that happens is it segfaults its own docker container and never opens a PR). Give it a full command: “try solving (an easy issue in this open source repository). Create a PR with the solution, and after an hour check the continuous integration to see if tests are passing.
If tests are not passing, assess what the issue is, and if it is a quick fix make a commit to handle it, otherwise report what the core difficulty of the problem is.”

Don’t spend too much time setting up the calls, just pull from lists you already have and let it find “whatever is easy”

Make it clear, make it easy, make it know the steps, and let it just keep cycling for a bit.

How to review vibe coding: immediately throw out anything bad

If you saw a student was cheating and just copy-pasted from StackOverflow but couldn’t explain what it did, you’d throw it out and tell them to try again. If your new intern didn’t reuse all of the solid code your team had written and instead rewrote some low level detail in a buggy and unmaintainable way, you’d throw it out and tell them to try again. If they wrote a function that was 500 lines long and did 10 different things, you’d throw it out and tell them to try again. You wouldn’t waste your time trying to fix it, you’d just tell them to try again.

Again, treat the LLMs the same way. I see a lot of people following the mindset they pick up from the vibe coding YouTubers making their silly games. “ChatGPT, try harder! Fix for me!”. You want to know a secret? That stuff is worse than worthless. The problem is that these LLMs are made to please you, so if you tell them to try harder, they will either start hallucinating or just start changing your tests. Don’t even give it a try. The moment you see it go off the rails, just throw it out. That problem is too hard for Claude, it’s for you now.

Send a bunch of commands at 9am. At noon, check on them. You might have 10 done. 8 of them probably went off the rails, whatever, fire them. Hey, two PRs worked, whoopee! Fire 10 more, come back at 3. 20 done, 4 successes and 16 failures. Fire a few more off, maybe a few cleanup ones to look for missing docstrings or dig around to see if any performance regressions were introduced. At 6, see the other 4 successes and cut the other jobs.
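
The standing instruction and the review rule described above can be written down as a small sketch. This is only an illustration under stated assumptions: `build_prompt`, `AgentResult`, `triage`, and the example repository and issue names are hypothetical and not part of any real tool, and the prompt text simply restates the command quoted earlier.

```python
from dataclasses import dataclass

# Restates the standing instruction quoted in the workflow section.
# The {issue} and {repo} slots are filled from a list you already have.
PROMPT_TEMPLATE = (
    "Try solving {issue} in {repo}. Create a PR with the solution, and after "
    "an hour check the continuous integration to see if tests are passing. "
    "If tests are not passing, assess what the issue is, and if it is a quick "
    "fix make a commit to handle it, otherwise report what the core "
    "difficulty of the problem is."
)


def build_prompt(repo: str, issue: str) -> str:
    """Fill the template for one task (both arguments are illustrative)."""
    return PROMPT_TEMPLATE.format(repo=repo, issue=issue)


@dataclass
class AgentResult:
    """Hypothetical record of what you see when you check on one agent."""
    issue: str
    ci_passed: bool
    tests_weakened: bool  # tests deleted, commented out, or loosened to pass
    review_ok: bool       # your own quick read of the diff


def triage(result: AgentResult) -> str:
    """Return "merge" or "close". There is deliberately no "try harder" branch."""
    if result.tests_weakened or not result.ci_passed or not result.review_ok:
        return "close"
    return "merge"


if __name__ == "__main__":
    print(build_prompt("example-org/example-repo", "issue #123"))
    print(triage(AgentResult("issue #123", ci_passed=True,
                             tests_weakened=False, review_ok=True)))
```

The design choice worth noticing is the two-valued outcome: anything that went off the rails is closed and the problem comes back to you, although a closed PR can still be worth a glance for what it reveals about the problem.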

Vibe coding is useful only if you have enough problems that you’re happy with some subset being solved, not caring which ones in that subset are solved.

10 PRs were merged, plus whatever you were working on that day (yes, because you didn’t focus on this for most of your day!). You might think, that’s like 10/40 = 25% success rate, that’s not good. But you know what? Those were free. You just got a lot of extra stuff done that you wouldn’t have otherwise. The success rate is just a matter of how much value these things give for their cost. That’s for Sam Altman to worry about. But if you have a subscription to these LLMs, just keep burning through the tokens, who cares. Don’t worry about success rate, just go for total successes.

Where to apply vibe coding: code you know very well

So this leads to a very counter-intuitive fact that may come out of left field, but I’m serious. Everyone’s first inclination is to throw it on some project they haven’t actually contributed to and get banned (okay, maybe it just looks like that to open source maintainers). But the real issue is that the majority of your time will be spent doing code review. If you do this on code you don’t know well, you will have to spend a lot of time trying to understand the code, and at that point, why not just write the code yourself? This is where most people seem to just stop and drop the idea of vibe coding altogether.

But instead… what about applying it to the code base you’re on? No, not on the hard problems you’re thinking about, but all of those little side problems? The small refactor you put off for the last 6 months? What about bisecting the Git commits to find the exact cause of the performance regression that showed up on master a week ago? Or you created a version specialized for Windows and Mac but left a “todo” over the Linux section because it’s easy but would be 4 hours of monotonous work?
All of those things, if someone showed up with the code, you could review it in about 5 minutes and know whether it’s right or wrong. Give the agents that stuff!

Vibe coding is not useful if you need it to solve a particular problem. You still do the hard stuff.

In just the same way, the best place to put trainees is in the project that you already know well, because that makes it easy to review their work. It’s the “I don’t have time for you, so try this easy task” approach. You know the code, you know the problem, and you can give them a task that is easy enough that they can do it without too much help. This is the same principle here.

Some Examples of Vibe Coding PRs

Now let’s look at some of the examples my bot account has been putting out.

Example 1: The Simple Success Story

Here’s a quick and simple PR, the kind that is perfect here. If you don’t know performance handling in Julia or trim: basically, trim is a new feature in Julia v1.12 where Julia can now build small lean binaries. In order to do that, you need to make sure functions fully specialize, which they don’t by default as that would create a lot of extra compilation in many circumstances, but for higher order numerical solvers that is the behavior we want. So I told it to go specialize all instances of the function in the package, and I could check the PR fairly quickly and see it stuck to the goals and did it. This is then going to be followed up with new tooling that will perform static checks of trimming compatibility (still being worked out), but with just those backwards compatible minor changes things seem to work in the beta, so merge now and add those tests when we have a good system for it.

1 minute to write the query, come back later, and 1 minute to review. This is exactly the kind of small targeted change these are geared towards. Most of the PRs aim to be like this. Even if you work on hard stuff, a huge chunk of your work isn’t hard stuff.
There’s a lot of simple janitorial work you have to do on your code all of the time. Automate that part.

Example 2: The Immediately Closed “That’s not for Claude” PR

This PR came from pointing it at the fact that every once in a while I get a test failure in the docs build for a tutorial on differentiating a chaotic ODE with respect to ergodic properties. It is a very fun topic, but generally anything with real math in it is too hard for the LLMs. And in this, yeah, I could see immediately that this PR does not make sense… well, it did. The NaNs and Infs were definitely coming from a numerical issue in the least squares shadowing code, and what this pointed to was that the Schur complement was being done with things like B * Diagonal(wBinv) * B', which as a numerical analyst I can immediately see would square the condition number of the matrix, but there doesn’t seem to be an immediate solution among the open source linear algebra tools I could find. So I closed this and sent a note over to Alan Edelman to try and figure out what the better way to do this factorization is.

While it didn’t solve the problem, at least I know what the problem is now. This is probably what most of the PRs become. It gives a hint of where the problem is, and then I take the reins.

Example 3: Repeated Refactors

This is a sweet and simple PR that refactors the tests to move some things, specifically the Enzyme automatic differentiation engine testing, to a “no pre” set. The “no pre” means “does not run on prereleases of the next language version”, since these tools touch language internals in the compiler so they are never ready early. This always makes prerelease tests fail before they actually test anything meaningful, so I wanted to move all Enzyme usage to a “no pre” set in every repo where it showed up. About 5 minutes to write the query. Some of the test suites needed a simple GitHub suggestion to fix up a little detail here or there. About 5 minutes to get this thing into 8 repos.
Now I was ready to start using prerelease tests. Would’ve been at least a half hour by hand, just because we didn’t have an easy system for doing this before. Maybe that’s a little dirtier than the perfect regex, but whatever, 10 minutes of my time sounds like a win.

Refactors generally work out really well and are one of the top uses for the tool. “Make it correct, write good tests, and let it refactor” is generally a lazy way to get 90% of the way there.

Example 4: The Information Gathering PR

Here is a pull request that was generated by pointing it to solve this issue. That issue was mostly chosen because it had been sitting on the issue list for a while and didn’t seem so difficult, but I hadn’t had the time to track down the memory leak in a not so widely used extension for an alternative C-based sparse matrix solver, and it needed to get done some time. So, throw the bot on it. What it comes back with is to add a memory finalizer (i.e. how to tell the GC how to remove the memory) for the other library. I could take one look at it and immediately see that kind of code should not live in this library, it should live in the library where the solver is bound to the language, and the fact that it was missing a finalizer is something that should be solved over there. Close the PR, throw out the code, find the stalled discussion on the repo that should have the finalizer, poke the author a bit, and it’s in. Done, someone just needed to be reminded. Total time on my end was about 3 minutes. The bot could have also written that fix, but it basically already existed so no need; this was more about finding out where in the system something was off.

Example 5: The “How Long is that Going to Take?” PR

Here is a nice PR where it didn’t finish (at least at the time of writing this), and the reason is that there are lots of other cleanups that need to happen for this to ever work. How far away is it?
Well, it generated a set of tests that cleanly listed out all 120 things to solve. Great, this is probably a full week’s task… I knew it would be a lot, but that is now pretty concrete. I probably won’t use the bot to finish this one, but now if someone asked what the effort would be I can give them a pretty clear estimate, because it has been reduced from “someone needs to give it a try, seems like a good chunk of work” to “the hard part is making these 120 things happen, which is easy but tedious and it would take about a week, probably not worth the effort right now”. That’s very useful when planning ahead. Total me time was about 5 minutes, plus the PR discussion time to explain to others what the results meant.

Conclusion: Vibe Coding Done Right is actually an Expert’s Task

Vibe coding turns any individual into the CTO leading a team of 20, 30, 50, 60 interns. You immediately get a massive team. It takes time and experience to actually handle a group like this correctly and to make it productive. Making all 60 interns not break the performance, correctness, or maintainability of your code is very difficult. So I do not recommend this to people who are still “up and coming programmers”. But if you’re a bit more senior and starting to grow your group, well, this is then a cheap way to accelerate that.

What that means is, vibe coding is sold to people who don’t know how to program, but if you actually think about it, the main audience that can actually use it correctly is experts.

A few side remarks I didn’t get to

The role of empathy in vibe coding success

Some of the least empathetic people I know in open source are the ones who are also the most skeptical of vibe coding. I strongly suspect that they speak to the agents similarly to how they speak to other potential contributors, and drive the bots away the same way they do people. But a bot will always try to make you happy, just by hallucinating and commenting out your tests.
These same people also don’t want the bots around because they claim that’s all the bots ever do. Weird coincidence. I wonder what this will do to the culture of programming over time.

The cost may not make sense in the long run, but it does while the VCs are paying for it

On my $200/month Claude 20x Max subscription I used enough tokens for about $5,200 of compute in the first month. This is obviously not sustainable, but hey, it’s a startup world and VCs are paying for it right now. If you can get a few extra features done that get you more funding, then this is worth it. If you’re a professor and you can get a few more papers out, then this is worth it. If you’re an individual contributor at a big company and you can get a few more features out that make your team look good, then this is worth it. Will it be worth it after the money runs out? Who knows, but mine while the gold is there.

What’s the right setup? Easy, Claude Code

The tab-complete stuff is pretty annoying. The power comes from running agents. Claude Code has a simple setup and is able to start running code. Just write a decent Claude.md that tells it to stop being so nice and instead just tell you when it cannot solve the problem, and you’re good to go.
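
For the many-agents arrangement mentioned earlier (agents sitting in tmux sessions you can ssh into), a launcher can be sketched as below. This is a hedged illustration, not the setup actually used here: the function name, the session prefix, and the default agent command are assumptions. The default assumes `claude -p` is the non-interactive invocation of Claude Code, so substitute whatever your agent CLI expects. The sketch only builds the tmux command lines so you can inspect them before running anything, preferably inside a sandbox.

```python
import shlex
from typing import List, Sequence


def tmux_launch_commands(
    prompts: Sequence[str],
    agent_command: Sequence[str] = ("claude", "-p"),  # assumed CLI invocation
    session_prefix: str = "agent",                    # hypothetical naming scheme
) -> List[List[str]]:
    """Build one detached tmux session command per prompt, without running it."""
    commands = []
    for i, prompt in enumerate(prompts):
        # Quote the prompt so it reaches the agent as a single argument.
        agent = shlex.join([*agent_command, prompt])
        # `tmux new-session -d -s NAME CMD` starts CMD in a detached session.
        # tmux closes the session when CMD exits, so capture output elsewhere
        # if you need it afterwards.
        commands.append(
            ["tmux", "new-session", "-d", "-s", f"{session_prefix}-{i}", agent]
        )
    return commands


if __name__ == "__main__":
    for cmd in tmux_launch_commands(["fix the missing docstrings in src/"]):
        print(shlex.join(cmd))
        # To actually launch: subprocess.run(cmd, check=True)
```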
