Overfitted a 900KB Transformer to Compress a 100MB CSV into 7MB

2026-06-23


Why This Matters

Overfitting a small transformer so that it memorizes one specific file, then feeding its next-byte predictions into an arithmetic coder, yields strong compression ratios on that file. The method is slow today, but it treats memorization, normally a failure mode in machine learning, as the goal. That gives learned models a different way to compete with traditional compressors on a per-file basis.

Key Takeaways

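A small transformer overfitted to a single file, paired with arithmetic coding, can compress that file well.
Reported results: a 100MB NYC taxi CSV compresses to about 7MB, and a 100MB slice of enwik9 to about 21MB.
The current implementation is slow (roughly 20–30 minutes of training, plus about 45 minutes each for compression and decompression), but it points to a different route for model-driven compression.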

I built an experiment that uses an overfitted transformer and arithmetic coding to compress individual files.

Instead of training the model to generalize, I train a 900KB transformer to memorize a single file and predict the next byte. Those predictions are fed into an arithmetic coder to produce the compressed output.
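To make the "predictions feed an arithmetic coder" step concrete, here is a minimal sketch in Python. It is not the code from the repo: the transformer is replaced by a hypothetical stand-in, `AdaptiveByteModel`, that just counts bytes, and the coder uses exact fractions instead of the fixed-precision integer arithmetic a practical implementation would need. All names in it are assumptions made for illustration. What it does show is the core contract: the better the model predicts the next byte, the narrower the interval shrinks per byte, and the fewer bits are needed to name a point inside it.

```python
import math
from fractions import Fraction


class AdaptiveByteModel:
    """Hypothetical stand-in for the transformer: adaptive byte counts.

    The coder only needs a probability for every possible next byte, and
    the encoder and decoder must update the model in exactly the same way.
    """

    def __init__(self):
        self.counts = [1] * 256  # start uniform so no byte has probability 0
        self.total = 256

    def interval(self, symbol):
        """Return (cumulative count below symbol, count of symbol)."""
        return sum(self.counts[:symbol]), self.counts[symbol]

    def symbol_at(self, target):
        """Return (symbol, cumulative count below it, its count) for target in [0, total)."""
        cum = 0
        for symbol, count in enumerate(self.counts):
            if target < cum + count:
                return symbol, cum, count
            cum += count
        raise ValueError("target outside the cumulative range")

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1


def encode(data):
    """Narrow [low, low + width) once per byte, then name a point inside it."""
    model = AdaptiveByteModel()
    low, width = Fraction(0), Fraction(1)
    for byte in data:
        cum, count = model.interval(byte)
        low += width * Fraction(cum, model.total)
        width *= Fraction(count, model.total)
        model.update(byte)
    # Find the shortest binary fraction k / 2**nbits that lies in the interval.
    nbits = 0
    while True:
        k = math.ceil(low * (1 << nbits))
        if Fraction(k, 1 << nbits) < low + width:
            return k, nbits
        nbits += 1


def decode(k, nbits, length):
    """Replay the same model updates to recover the bytes from the coded point."""
    model = AdaptiveByteModel()
    value = Fraction(k, 1 << nbits)
    low, width = Fraction(0), Fraction(1)
    out = bytearray()
    for _ in range(length):
        total = model.total
        target = math.floor((value - low) / width * total)
        symbol, cum, count = model.symbol_at(target)
        low += width * Fraction(cum, total)
        width *= Fraction(count, total)
        model.update(symbol)
        out.append(symbol)
    return bytes(out)


data = b"abracadabra" * 5
code, nbits = encode(data)
assert decode(code, nbits, len(data)) == data
print(nbits, "bits versus", 8 * len(data), "raw bits")
```

Swapping the counting model for a network that has memorized the file is what would push the per-byte cost down, since the interval then shrinks very little on each correctly predicted byte. The decoder needs the same model to reproduce the predictions, which is why the model's size matters here.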

On a 100MB NYC taxi CSV, it compresses to about 7MB (~0.5 bits/byte). On a 100MB slice of enwik9, it compresses to about 21MB (~1.68 bits/byte).
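The bits/byte figures follow directly from the file sizes. A quick check in Python, using the rounded sizes quoted above (the helper name `bits_per_byte` is just for illustration, and the MB definition cancels out of the ratio):

```python
def bits_per_byte(compressed_size, original_size):
    """Average number of output bits spent per input byte."""
    return 8 * compressed_size / original_size


MB = 1_000_000  # decimal or binary MB gives the same ratio

taxi = bits_per_byte(7 * MB, 100 * MB)
enwik = bits_per_byte(21 * MB, 100 * MB)
print(round(taxi, 2), round(enwik, 2))  # 0.56 1.68
```

With exactly 7MB the taxi CSV works out to 0.56 bits/byte, so the quoted ~0.5 presumably reflects rounding of the actual output size.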

It's pretty slow right now (roughly 20–30 minutes of training and 45 minutes each for compression and decompression on my AMD 7800XT).

Check out the repo: https://github.com/samyak112/pym-particles
