Overfitted a 900KB Transformer to Compress a 100MB CSV into 7MB

Why This Matters

This experiment shows that overfitting a small transformer so it memorizes one specific file, then feeding its next-byte predictions into an arithmetic coder, can reach strong compression ratios. The method is currently slow, but it offers a different angle on data compression: the model's capacity to memorize, rather than generalize, is what does the work. That makes it an interesting point of comparison for traditional compression techniques and a possible direction for future research.

Key Takeaways

I built an experiment that uses an overfitted transformer and arithmetic coding to compress individual files.

Instead of training the model to generalize, I train a 900KB transformer to memorize a single file and predict the next byte. Those predictions are fed into an arithmetic coder to produce the compressed output.
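
To make the predict-then-encode loop concrete, here is a minimal sketch. It is not the code from the repo. It assumes a hypothetical `AdaptiveByteModel` (plain adaptive byte counts) standing in for the overfitted transformer, and it uses exact `Fraction` arithmetic instead of a fixed-precision arithmetic coder, so it is only practical for short inputs. The names `encode` and `decode` are illustrative.

```python
from fractions import Fraction
import math


class AdaptiveByteModel:
    """Hypothetical stand-in for the transformer: adaptive byte counts."""

    def __init__(self):
        self.counts = [1] * 256  # start uniform so every byte is codable

    def interval(self, symbol):
        """Cumulative probability range [lo, hi) assigned to `symbol`."""
        total = sum(self.counts)
        lo = sum(self.counts[:symbol])
        return Fraction(lo, total), Fraction(lo + self.counts[symbol], total)

    def update(self, symbol):
        self.counts[symbol] += 1


def encode(data: bytes):
    """Return (n, k): the message is the k-bit binary fraction n / 2**k."""
    model = AdaptiveByteModel()
    low, width = Fraction(0), Fraction(1)
    for byte in data:
        lo, hi = model.interval(byte)
        # Narrow the current interval to the slice predicted for this byte.
        low, width = low + width * lo, width * (hi - lo)
        model.update(byte)
    # Find the shortest binary fraction that lands inside the final interval.
    k = 0
    while True:
        n = math.ceil(low * 2**k)
        if Fraction(n, 2**k) < low + width:
            return n, k
        k += 1


def decode(n: int, k: int, length: int) -> bytes:
    """Replay the same model predictions to recover `length` bytes."""
    model = AdaptiveByteModel()
    point = Fraction(n, 2**k)
    low, width = Fraction(0), Fraction(1)
    out = bytearray()
    for _ in range(length):
        target = (point - low) / width  # position inside current interval
        for symbol in range(256):
            lo, hi = model.interval(symbol)
            if target < hi:
                break
        out.append(symbol)
        low, width = low + width * lo, width * (hi - lo)
        model.update(symbol)
    return bytes(out)


message = b"abracadabra abracadabra"
n, k = encode(message)
assert decode(n, k, len(message)) == message
print(f"{len(message) * 8} bits in, {k} bits out")
```

The better the model predicts each next byte, the wider the slice it assigns to that byte and the fewer bits the final number needs. The decoder has to run exactly the same model to reproduce the same intervals, which is why decompression costs about as much as compression.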

On a 100MB NYC taxi CSV, it compresses to about 7MB (~0.5 bits/byte). On a 100MB slice of enwik9, it compresses to about 21MB (~1.68 bits/byte).
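
The bits/byte figures follow directly from the size ratio. A small sketch of the arithmetic, using the rounded sizes quoted above (the helper name `bits_per_byte` is only illustrative):

```python
def bits_per_byte(compressed_size: float, original_size: float) -> float:
    """Average number of output bits spent per input byte."""
    return 8 * compressed_size / original_size


# Sizes in MB, rounded as quoted in the text; exact file sizes are not given.
print(f"taxi CSV:     {bits_per_byte(7, 100):.2f} bits/byte")
print(f"enwik9 slice: {bits_per_byte(21, 100):.2f} bits/byte")
```

With these rounded inputs the results are 0.56 and 1.68 bits/byte. The small gap from the quoted ~0.5 for the CSV is presumably just rounding in "about 7MB".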

It's pretty slow right now (roughly 20–30 minutes of training and 45 minutes each for compression and decompression on my AMD 7800XT).

Check out the repo: https://github.com/samyak112/pym-particles