The coming industrialisation of exploit generation with LLMs

Recently I ran an experiment where I built agents on top of Opus 4.5 and GPT-5.2 and then challenged them to write exploits for a zeroday vulnerability in the QuickJS Javascript interpreter. I added a variety of modern exploit mitigations, various constraints (like assuming an unknown heap starting state, or forbidding hardcoded offsets in the exploits) and different objectives (spawn a shell, write a file, connect back to a command and control server). The agents succeeded in building over 40 distinct exploits across 6 different scenarios, and GPT-5.2 solved every scenario. Opus 4.5 solved all but two. I’ve put a technical write-up of the experiments and the results on Github, as well as the code to reproduce the experiments.

In this post I’m going to focus on the main conclusion I’ve drawn from this work, which is that we should prepare for the industrialisation of many of the constituent parts of offensive cyber security. We should start assuming that in the near future the limiting factor on a state or group’s ability to develop exploits, break into networks, escalate privileges and remain in those networks, is going to be their token throughput over time, and not the number of hackers they employ. Nothing is certain, but we would be better off having wasted effort thinking through this scenario and have it not happen, than be unprepared if it does.

A Brief Overview of the Experiment

All of the code to re-run the experiments, a detailed write-up of them, and the raw data the agents produced are on Github, but just to give a flavour of what the agents accomplished:

Both agents turned the QuickJS vulnerability into an ‘API’ to allow them to read and arbitrarily modify the address space of the target process. As the vulnerability is a zeroday with no public exploits for it, this capability had to be developed by the agents through reading source code, debugging and trial and error. A sample of the notable exploits is here and I have written up one of them in detail here. They solved most challenges in less than an hour and relatively cheaply. I set a token limit of 30M per agent run and ran ten runs per agent. This was more than enough to solve all but the hardest task. With Opus 4.5 30M total tokens (input and output) ends up costing about $30 USD. In the hardest task I challenged GPT-5.2 it to figure out how to write a specified string to a specified path on disk, while the following protections were enabled: address space layout randomisation, non-executable memory, full RELRO, fine-grained CFI on the QuickJS binary, hardware-enforced shadow-stack, a seccomp sandbox to prevent shell execution, and a build of QuickJS where I had stripped all functionality in it for accessing the operating system and file system. To write a file you need to chain multiple function calls, but the shadow-stack prevents ROP and the sandbox prevents simply spawning a shell process to solve the problem. GPT-5.2 came up with a clever solution involving chaining 7 function calls through glibc’s exit handler mechanism. The full exploit is here and an explanation of the solution is here. It took the agent 50M tokens and just over 3 hours to solve this, for a cost of about $50 for that agent run. (As I was running four agents in parallel the true cost was closer to $150).

Before going on there are two important caveats that need to be kept in mind with these experiments:

While QuickJS is a real Javascript interpreter, it is an order of magnitude less code, and at least an order of magnitude less complex, than the Javascript interpreters in Chrome and Firefox. We can observe the exploits produced for QuickJS and the manner in which they were produced and conclude, as I have, that it appears that LLMs are likely to solve these problems either now or in the near future, but we can’t say definitively that they can without spending the tokens and seeing it happen. The exploits generated do not demonstrate novel, generic breaks in any of the protection mechanisms. They take advantage of known flaws in those protection mechanisms and gaps that exist in real deployments of them. These are the same gaps that human exploit developers take advantage of, as they also typically do not come up with novel breaks of exploit mitigations for each exploit. I’ve explained those gaps in detail here. What is novel are the overall exploit chains. This is true by definition as the QuickJS vulnerability was previously unknown until I found it (or, more correctly: my Opus 4.5 vulnerability discovery agent found it). The approach GPT-5.2 took to solving the hardest challenge mentioned above was also novel to me at least, and I haven’t been able to find any example of it written down online. However, I wouldn’t be surprised if it’s known by CTF players and professional exploit developers, and just not written down anywhere.

The Industrialisation of Intrusion

By ‘industrialisation’ I mean that the ability of an organisation to complete a task will be limited by the number of tokens they can throw at that task. In order for a task to be ‘industrialised’ in this way it needs two things:

An LLM-based agent must be able to search the solution space. It must have an environment in which to operate, appropriate tools, and not require human assistance. The ability to do true ‘search’, and cover more of the solution space as more tokens are spent also requires some baseline capability from the model to process information, react to it, and make sensible decisions that move the search forward. It looks like Opus 4.5 and GPT-5.2 possess this in my experiments. It will be interesting to see how they do against a much larger space, like v8 or Firefox. The agent must have some way to verify its solution. The verifier needs to be accurate, fast and again not involve a human.

... continue reading