Running Out of Disk Space in Production

Why This Matters

This article highlights the critical importance of managing disk space effectively in production environments, especially as unexpected traffic surges can quickly exhaust storage resources. For the tech industry and consumers, it underscores the need for robust monitoring and scalable infrastructure to prevent downtime and data loss during peak usage. Proper planning and proactive maintenance are essential to ensure reliable service delivery and avoid costly disruptions.

Key Takeaways

Last night I put up a simple server that lets customers download the digital Kanjideck files. The server is hosted on a small Hetzner machine running NixOS, with 4GB of RAM and 40GB of disk space. One of the downloadable files weighs 2.2GB.

The setup boils down to a simple Haskell program which serves static files (with some extra steps for authorization), plus an nginx reverse proxy which forwards requests for a particular “virtual host” to the Haskell program.

Fig 1. Simplified server architecture
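
To make the "static files with some extra steps regarding authorization" part concrete, here is a minimal sketch of the lookup such a server might perform before streaming a file. It is written in Python rather than Haskell and is not the author's code: the function name `resolve_download`, the token-set check, and the directory layout are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_download(
    root: Path, requested: str, token: str, valid_tokens: set[str]
) -> Optional[Path]:
    """Return the file to serve, or None if the request should be refused.

    Hypothetical helper: the real authorization scheme is not described
    in the article, so a plain token lookup stands in for it here.
    """
    # Authorization step (assumed): the caller must present a known token.
    if token not in valid_tokens:
        return None

    root = root.resolve()
    candidate = (root / requested).resolve()

    # Refuse anything that escapes the download directory, e.g. "../secret".
    if not candidate.is_relative_to(root):
        return None

    # Only regular files that actually exist are served.
    return candidate if candidate.is_file() else None
```

In this sketch the reverse proxy would only forward the request; the decision about whether a given customer may fetch a given file stays in the application.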

1 First, Panic

Not even minutes after I announced that the files were finally available, hundreds of customers visited my server all at once. As the logs started flying off of my screen with all the accesses, I started noticing a particularly interesting message, repeated over and over again:

Mar 31 20:43:03 mogbit kanjideck-fulfillment[2528300]: user error (Unexpected reply to: MAIL "<...> at kanjideck.com", Expected reply code: 250, Got this instead: 452 "4.3.1 Insufficient system storage\r\n")

Oh no. No one’s able to access their files and I’m already receiving emails about it. I messaged the users explaining the server was having some issues that I was resolving.

Grafana shows 40GB/40GB of disk space used, and df -h reports 100% usage of /dev/sda. I have to clear up space fast. At this point I’m afraid that I’m not even receiving the user complaints anymore, since my mail could be getting dropped for lack of space.
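
The same check can be scripted so that a full disk is noticed before customers notice it. The sketch below uses Python's standard `shutil.disk_usage`; the function names and the 90% threshold are assumptions chosen for illustration, not something taken from the setup described here.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use.

    Computed as used / (used + free for unprivileged users), which is
    close to the Use% column of `df`, though rounding may differ.
    """
    usage = shutil.disk_usage(path)
    usable = usage.used + usage.free
    if usable == 0:
        return 0.0
    return 100.0 * usage.used / usable


def is_nearly_full(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True when usage has reached the (assumed) alert threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    print(f"/ is {disk_usage_percent('/'):.1f}% full")
```

A cron job or systemd timer could run a check like this and send an alert, preferably through a channel that does not itself depend on free disk space on the same machine.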

I rushed to run du -sh on everything I could, as that was the best I could manage. The two largest culprits were Plausible Analytics under /var/lib, with an 8.5GB (clickhouse) database, and the /nix/store, holding the full server configuration, installation, and executables, at 15GB.
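
The hunt for the largest directories can be sketched as a small script that ranks the immediate children of a directory by size, roughly what running du -sh on each entry and sorting would give. The function names are hypothetical, and the numbers will not match du exactly: this version sums apparent file sizes and counts hard-linked files once per link, whereas du reports allocated blocks and counts each hard-linked file once.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Total apparent size in bytes of the files under `path`.

    Symbolic links are not followed; unreadable entries are skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def largest_children(parent: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` largest entries directly inside `parent`."""
    sizes = []
    for child in parent.iterdir():
        if child.is_symlink():
            continue
        size = tree_size(child) if child.is_dir() else child.lstat().st_size
        sizes.append((child.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, size in largest_children(Path("/var/lib")):
        print(f"{size / 1e9:6.2f} GB  {name}")
```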
