I’ve done a few different reverse-engineering projects with LLMs, and figured it’s time to push the clankers to their limits.
A RAR compressor for every version of RAR ought to have taken about 5 years, which is why nobody has ever bothered. Today, it took 5 weeks of evenings and weekends, clanking OpenAI Codex 5.5 and Claude Opus 4.7, and cost roughly £40 in (heavily subsidised) tokens.
Yes it’s 55k lines of slop, no it’s not that fast, and it almost earned me an OpenAI ban. But it works.
RAR was originally an LZSS compressor for DOS, and it peaked in popularity as the warez scene’s format of choice. Fighting with WinZip for feature parity and supremacy, WinRAR boasted multi-volume support, recovery records and even an internal VM, but its USP was always superior compression. It’s a middle-aged format that never stopped growing up, and it’s as big as a house.
unrar comes with source code but that code is not actually free, and somewhat ironically RAR’s author Eugene Roshal isn’t a big fan of piracy. So ideally I’d need to implement my version from spec, which doesn’t really exist.
The monstrous task of creating one involved pulling code from free decompressor sources in the wild: unar, libarchive, UNRARLIB, plus random web pages and folklore.
I then set Claude to work documenting as much as it could. After each pass, I quizzed it on missing features and maintained an ongoing gaps doc containing the hard-to-know stuff. This persisted between context resets, which were needed to flow the tokens into the gaps. It took 2 weeks of cooking, going back and forth until we had most of the reader side documented. The writer side, however, remained a mix of confabulation and conjecture.
So next I grabbed the RAR binaries for DOS and Windows, and set to work making test fixtures, hex-dumping and doing passes in Ghidra and DOSBox-X to get some idea of how they were packed. Another week or two of work and the gaps started to close up.
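To give a flavour of the fixture-poking, here is a minimal Rust sketch of the very first thing a hex dump tells you: which generation of RAR you are looking at. It is not code from the project. The names `RarSignature`, `sniff_signature` and `hex_dump` are made up for illustration, and it only checks the magic bytes at offset 0, so a self-extracting archive (where the signature sits after the SFX stub) would come back as `None`.

```rust
/// Archive generations that can be told apart from the leading magic bytes.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RarSignature {
    /// "RE~^": the oldest, RAR 1.4-era archives.
    Rar14,
    /// "Rar!\x1a\x07\x00": RAR 1.5 through 4.x.
    Rar15To4,
    /// "Rar!\x1a\x07\x01\x00": RAR 5.0 and later.
    Rar5,
}

const SIG_RAR14: &[u8] = b"RE~^";
const SIG_RAR15_TO_4: &[u8] = b"Rar!\x1a\x07\x00";
const SIG_RAR5: &[u8] = b"Rar!\x1a\x07\x01\x00";

/// Classify a buffer by its magic bytes at offset 0.
/// Assumption: no SFX stub or other prefix precedes the signature.
pub fn sniff_signature(data: &[u8]) -> Option<RarSignature> {
    if data.starts_with(SIG_RAR5) {
        Some(RarSignature::Rar5)
    } else if data.starts_with(SIG_RAR15_TO_4) {
        Some(RarSignature::Rar15To4)
    } else if data.starts_with(SIG_RAR14) {
        Some(RarSignature::Rar14)
    } else {
        None
    }
}

/// Render the first `limit` bytes as space-separated lowercase hex,
/// handy for eyeballing a fixture next to a spec draft.
pub fn hex_dump(data: &[u8], limit: usize) -> String {
    data.iter()
        .take(limit)
        .map(|b| format!("{b:02x}"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Feeding it the first few bytes of each fixture (for example via `std::fs::read`) is enough to sort a pile of test archives by format generation before digging into the headers proper.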
Now I had something that might be useful; spec docs for every version of the RAR file format:
Being confidently wrong enough to start, Codex, Claude and I set off building a (precariously) compatible Rust CLI. The workflow was shaped something like this: