The Failure Rate of EBS
Published on: 2025-06-12 05:24:04
The Real Failure Rate of EBS
By Nick Van Wiggeren | March 18, 2025
PlanetScale has deployed millions of Amazon Elastic Block Store (EBS) volumes across the world. We create and destroy tens of thousands of them every day as we stand up databases for customers, take backups, and test our systems end-to-end. Through this experience, we have a unique viewpoint into the failure rate and failure mechanisms of EBS, and we have spent a lot of time working out how to mitigate those failures.
In complex systems, failure isn’t a binary outcome. Cloud-native systems are built without single points of failure, but partial failure can still result in degraded performance, loss of user-facing availability, and undefined behavior. Often, a minor failure in one part of the stack appears as a full failure in another.
For example, if a single instance inside of a multi-node distributed caching system runs out of networking resources, the downstream application will interpret error cases as cache misses to avoid failing the
...
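The caching example above, where errors from a cache node are treated as cache misses, can be sketched in a few lines. This is only an illustration under stated assumptions: `FlakyCache`, `CacheUnavailable`, `read_through`, and the `stats` counters are hypothetical names invented here, not part of any PlanetScale or AWS API. The point it shows is that a partial failure in the cache never surfaces as an error to the caller, yet every such error quietly becomes an extra read against the origin.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when a node cannot serve a request."""


class FlakyCache:
    """In-memory stand-in for one cache node; set `fail` to simulate a degraded node."""

    def __init__(self, fail=False):
        self.fail = fail
        self._data = {}

    def get(self, key):
        if self.fail:
            raise CacheUnavailable("node is out of networking resources")
        return self._data.get(key)

    def set(self, key, value):
        if self.fail:
            raise CacheUnavailable("node is out of networking resources")
        self._data[key] = value


def read_through(cache, key, load_from_origin, stats):
    """Return the value for `key`, treating any cache error as a cache miss."""
    try:
        value = cache.get(key)
    except CacheUnavailable:
        # Partial failure: count it, then fall through as if the key were absent.
        stats["cache_errors"] += 1
        value = None

    if value is not None:
        stats["hits"] += 1
        return value

    # Miss (real or simulated by an error): the origin absorbs the load.
    stats["origin_reads"] += 1
    value = load_from_origin(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # Best effort only; the caller still gets its value.
    return value


if __name__ == "__main__":
    stats = {"hits": 0, "cache_errors": 0, "origin_reads": 0}
    cache = FlakyCache()
    read_through(cache, "user:1", lambda k: "alice", stats)  # miss, fills cache
    read_through(cache, "user:1", lambda k: "alice", stats)  # hit
    cache.fail = True                                        # node degrades
    read_through(cache, "user:1", lambda k: "alice", stats)  # error counted as miss
    print(stats)
```

In this sketch the caller sees the same return value whether the cache is healthy or degraded. Only the counters reveal that the origin is now doing the work the cache used to do, which is why tracking cache errors separately from ordinary misses is a reasonable design choice.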