# Vibecoding #2
I feel like I got substantial value out of Claude today, and want to document it. I am at the tail end of AI adoption, so I don't expect to say anything particularly useful or novel. However, I am constantly complaining about the lack of boring AI posts, so it's only proper that I write one.
## Problem Statement

At TigerBeetle, we are big on deterministic simulation testing. We even use it to track performance, to some degree. Still, it is crucial to verify performance numbers on a real cluster, in its natural high-altitude habitat. To do that, you need to procure six machines in a cloud, get your custom version of the tigerbeetle binary onto them, connect the cluster's replicas together, and hit them with load. It feels like, a quarter of a century into the third millennium, "run stuff on six machines" should be a problem just a notch harder than opening a terminal and typing `ls`, but I personally don't know how to solve it without wasting a day. So, I spent a day vibecoding my own square wheel.

The general shape of the problem is that I want to spin up a fleet of ephemeral machines with given specs on demand and run ad-hoc commands on them in a SIMD fashion. I don't want to manually type slightly different commands into a six-way terminal split, but I also do want to be able to ssh into a specific box and poke around.
## Solution

My idea for the solution comes from these three sources:

- <https://github.com/catern/rsyscall>
- <https://peter.bourgon.org/blog/2011/04/27/remote-development-from-mac-to-linux.html>
- <https://github.com/dsherret/dax>

The big idea of rsyscall is that you can program a distributed system in direct style. When programming locally, you do things by issuing syscalls:

```zig
const fd = open("/etc/passwd");
```

This API also works for doing things on remote machines, if you specify which machine you want to run the syscall on:

```zig
const fd_local = open(.host, "/etc/passwd");
const fd_cloud = open(.{ .addr = "1.2.3.4" }, "/etc/passwd");
```

Direct manipulation is the most natural API, and it pays to extend it over the network boundary.

Peter's post is an application of a similar idea to a narrow, mundane task of developing on Mac and testing on Linux. Peter suggests two scripts:

- `remote-sync` synchronizes a local project with a remote one. If you run `remote-sync` inside the `~/p/tb` folder, then `~/p/tb` materializes on the remote machine. `rsync` does the heavy lifting, and the wrapper script implements DWIM behaviors.
- It is typically followed by `remote-run some --command`, which runs the command on the remote machine in the matching directory, forwarding output back to you.

So, when I want to test local changes to tigerbeetle on my Linux box, I have roughly the following shell session:

```console
$ cd ~/p/tb/work
$ code .  # hack here
$ remote-sync
$ remote-run ./zig/zig build test
```

The killer feature is that shell completion works. I first type the command I want to run, taking advantage of the fact that local and remote commands are the same, paths and all, then hit ^A and prepend `remote-run` (in reality, I have an `rr` alias that combines sync & run).

The big thing here is not the commands per se, but the shift in the mental model. In a traditional ssh & vim setup, you have to juggle two machines with separate state, the local one and the remote one. With `remote-sync`, the state is the same across the machines; you only choose whether you want to run commands here or there. With just two machines, the difference feels academic. But if you want to run your tests across six machines, the ssh approach fails — you don't want to re-vim your changes to source files six times; you really do want to separate the place where the code is edited from the place(s) where the code is run. This is a general pattern — if you are not sure about a particular aspect of your design, try increasing the cardinality of the core abstraction from 1 to 2.

The third component, the dax library, is pretty mundane — just a JavaScript library for shell scripting. The notable aspects there are:

- JavaScript's template literals, which allow implementing command interpolation in a safe-by-construction way. When processing ``$`ls ${paths}` ``, a string is never materialized; it's arrays all the way to the exec syscall (more on the topic).
- JavaScript's async/await, which makes managing concurrent processes (local or remote) natural:

  ```ts
  await Promise.all([
    $`sleep 5`,
    $`remote-run sleep 5`,
  ]);
  ```
- Additionally, deno specifically valiantly strives to impose process-level structured concurrency, ensuring that no processes spawned by the script outlive the script itself unless explicitly marked detached — a sour spot of UNIX.

Combining the three ideas, I now have a deno script, called `box`, that provides a multiplexed interface for running ad-hoc code on ad-hoc clusters. A session looks like this:

```console
$ cd ~/p/tb/work
$ git status --short
 M src/lsm/forest.zig
$ box create 3
108.129.172.206,52.214.229.222,3.251.67.25
$ box list
0 108.129.172.206
1 52.214.229.222
2 3.251.67.25
$ box sync 0,1,2
$ box run 0 pwd
/home/alpine/p/tb/work
$ box run 0 ls
CHANGELOG.md LICENSE README.md build.zig docs/ src/ zig/
$ box run 0,1,2 ./zig/download.sh
Downloading Zig 0.14.1 release build...
Extracting zig-x86_64-linux-0.14.1.tar.xz...
Downloading completed (/home/alpine/p/tb/work/zig/zig)! Enjoy!
$ box run 0,1,2 \
    ./zig/zig build -Drelease -Dgit-commit=$(git rev-parse HEAD)
$ box run 0,1,2 \
    ./zig-out/bin/tigerbeetle format \
    --cluster=0 --replica=?? --replica-count=3 \
    0_??.tigerbeetle
2026-01-20 19:30:15.947Z info(io): opening "0_0.tigerbeetle"...
$ box destroy 0,1,2
```

I like this! I haven't used it in anger yet, but this is something I wanted for a long time, and now I have it.
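To make the shape concrete, here is a minimal sketch of the fan-out pattern the three ideas add up to — running the same command on several machines at once with deno and dax. This is not the actual `box` script; the addresses, the `alpine` user, and the ssh invocation are placeholders for illustration:

```ts
// Hypothetical sketch: run one command on several machines concurrently.
import $ from "jsr:@david/dax";

const boxes = ["108.129.172.206", "52.214.229.222", "3.251.67.25"];
const cwd = "/home/alpine/p/tb/work"; // project directory mirrored on every box

function run(addr: string, command: string) {
  // dax's template literal keeps each interpolation as a separate argv entry,
  // so nothing is spliced into a local shell string; ssh receives the remote
  // command as a single quoted argument.
  return $`ssh alpine@${addr} ${`cd ${cwd} && ${command}`}`;
}

// SIMD-style fan-out: start the command on every box, await them together,
// and fail the script if any replica fails.
await Promise.all(boxes.map((addr) => run(addr, "./zig/zig build -Drelease")));
```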
## Structure

The problem with implementing the above is that I have zero practical experience with the modern cloud. I only created my AWS account today, and just looking at the console interface ignited the urge to re-read *The Castle*. Not my cup of pu-erh. But I had a hypothesis that AI should be good at wrangling baroque cloud APIs, and it mostly held.

I started with a couple of paragraphs of rough, super high-level description of what I want to get. Not a specification at all, just a general gesture towards unknown unknowns. Then I asked ChatGPT to expand those two paragraphs into a more or less complete spec to hand down to an agent for implementation.

This phase surfaced a bunch of unknowns for me. For example, I wasn't thinking at all about how I would identify machines; ChatGPT suggested using random hex numbers, and I realized that I do need the 0,1,2 naming scheme to concisely specify batches of machines. While thinking about this, I realized that a sequential numbering scheme also has the advantage that I can't have two concurrent clusters running, which is a desirable property for my use-case. If I forget to shut down a machine, I'd rather get an error when trying to re-create a machine with the same name than silently avoid the clash. Similarly, it turns out that permissions and network access rules are something to think about, as well as which region and which image I need.

With the spec document in hand, I turned to Claude Code for the actual implementation work. The first step was to further refine the spec, asking Claude if anything was unclear. There were a couple of interesting clarifications there.

First, the original ChatGPT spec didn't get what I meant by my "current directory mapping" idea: that I want to materialize a local `~/p/tb/work` as a remote `~/p/tb/work`, even if the `~`s are different (a sketch of the mapping is at the end of this section). ChatGPT generated an incorrect description and an incorrect example. I manually corrected the example, but wasn't able to write a concise and correct description. Claude fixed that, working from the example. I feel like I need to internalize this more — for the current crop of AI, examples seem to be far more valuable than rules.

Second, the spec included my desire to auto-shutdown machines once I no longer use them, just to make sure I don't forget to turn the lights off when leaving the room. Claude grilled me on what precisely I wanted there, and I asked it to DWIM the thing.

The spec ended up being 6KiB of English prose. The final implementation was 14KiB of TypeScript. I wasn't keeping the spec and the implementation perfectly in sync, but I think they ended up pretty close in the end. Which means that prose specifications are somewhat more compact than code, but not much more compact.

My next step was to try to just one-shot this. Ok, this is embarrassing, and I usually avoid swearing in this blog, but I just typoed that as "one-shit", and, well, that is one flavorful description I won't be able to improve upon. The result was just not good (more on why later), so I almost immediately decided to throw it away and take a more incremental approach.

In my previous vibe-post, I noticed that LLMs are good at closing the loop. A variation here is that LLMs are good at producing results, but not necessarily good code. I am pretty sure that, if I had let the agent iterate on the initial script and actually run it against AWS, I would have gotten something working. I didn't want to go that way for three reasons:

- Spawning VMs takes time, and that significantly reduces the throughput of agentic iteration.
- No way I let the agent run with a real AWS account, given that AWS doesn't have a fool-proof way to cap costs.
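As an aside, the "current directory mapping" rule mentioned above is a good illustration of examples beating rules: it is awkward to state in prose but short in code. A hypothetical sketch, not the actual `box` implementation, with made-up names and paths:

```ts
// Hypothetical sketch of the "current directory mapping" rule: the local
// ~/p/tb/work should materialize as ~/p/tb/work on the remote machine, even
// when the two home directories differ. All names here are illustrative.
function remotePath(localCwd: string, localHome: string, remoteHome: string): string {
  if (!localCwd.startsWith(localHome + "/")) {
    throw new Error(`${localCwd} is not under ${localHome}`);
  }
  // Strip the local home prefix and re-root the path under the remote home.
  return remoteHome + localCwd.slice(localHome.length);
}

// remotePath("/Users/me/p/tb/work", "/Users/me", "/home/alpine")
// => "/home/alpine/p/tb/work"
```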