
The Unreasonable Effectiveness of Fuzzing for Porting Programs


A simple strategy of having LLMs write fuzz tests and build up a port in topological order seems effective at automating porting from C to Rust.

Agents are starting to produce more and more code

A week or two back, I was reflecting on some code Claude had generated for me and had a sort of moment of clarity. "Clarity" might be overstating it; more like the type of thought you have in the shower or after a few beers. Anyway.

The thought was: LLMs produce more and more code, and they'll eventually be producing more code than people. We can imagine that at some point we'll cross over from mostly people writing code to mostly computers. What does this mean for how we'll treat our libraries and code maintenance in the future?

LLMs make it easier to deal with API issues or inconsistencies: they're dealing with them, not us. Will this continue, leaving us with a Princess and the Pea situation where we pile up leaking abstractions? Or will we use LLMs to radically change our APIs when needed - will things get cleaner and better and faster as we go?

LLMs open the door to radical updates we'd never seriously have considered in the past. We can port our libraries from one language to another. We can change our APIs to fix issues, and give downstream users an LLM prompt to migrate to the new version automatically instead of rewriting their code themselves. We can make massive internal refactorings. These are the types of tasks that, in the past, a senior engineer would rightly reject until they were the last possible option. Breaking customers almost never pays off, and it's hard to justify refactoring on a "maintenance mode" project.

But if it's more about finding the right prompt and letting an LLM do the work, maybe that changes our decision process.

Maintaining big important libraries is no fun

I used to work on TensorFlow. (TensorFlow is a system for developing gradient-based models, like PyTorch. It's not used much anymore outside of Google.) Now, TensorFlow had some design flaws in the core language, and a botched version 1 -> version 2 migration didn't help matters. But as a maintainer, the biggest issue was the enormous technical debt. It suffered from a sort of "career advancement syndrome": TensorFlow was popular and you got credit for contributing to it, so there was a huge incentive to add a feature as quickly as you could and then get away. (I sadly missed this period and was only around for the aftermath, where we had to deal with the cruft.)

As a result of a few years of this style of development, a huge surface of Python code had been cobbled together on top of the C++ core. The complexity only spiraled over time: engineers came into the project needing to get something done, and increasingly the easiest thing to do was to add some Python global or context manager, shove something into it, then grab it later. Do this over and over and eventually it becomes impossible to figure out what's happening. It also ended up being incredibly slow. It could take tens of minutes to build some graphs (TensorFlow's equivalent of a program).
