It wasn't funny for anyone who couldn't access regular online destinations, or for the engineers trying to fix the problems, but Monday's massive Amazon Web Services outage was something of a comedy of errors.
In a dense, detailed note posted after all the issues had been settled, AWS explained how the series of events unfolded and what it's planning to do to prevent similar collapses in the future.
The outage rendered huge portions of the internet unavailable for much of the workday for many people. As Monday rolled along, it affected more than 2,000 companies and services, including Reddit, Ring, Snapchat, Fortnite, Roblox, the PlayStation Network, Venmo and Amazon itself, along with critical services such as online banking and household amenities such as luxury smart beds.
In its explainer post, AWS apologized for the breakdown's impact, saying, "We know how critical our services are to our customers, their applications and end users, and their businesses."
Why were so many sites affected?
AWS, a cloud services provider owned by Amazon, props up huge portions of the internet. When it went down, it took many of the services we know and rely on with it. As with the Fastly and CrowdStrike outages over the past few years, the AWS outage shows just how much of the internet relies on the same infrastructure -- and how quickly our access to everyday sites and services can be revoked when something goes wrong.
Our reliance on a small number of big companies to underpin the web is akin to putting all our eggs in a handful of baskets. When it works, it's great, but only one tiny thing needs to go wrong for the internet to fall to its knees in minutes.
In total, outage reporting site Downdetector saw over 9.8 million reports, with 2.7 million coming from the US, over 1.1 million from the UK, and the rest largely spread across Australia, Japan, the Netherlands, Germany and France. Over 2,000 companies were affected, with around 280 still experiencing issues at 10 a.m. PT. (Downdetector is owned by the same parent company as CNET, Ziff Davis.)
"This kind of outage, where a foundational internet service brings down a large swath of online services, only happens a handful of times in a year," Daniel Ramirez, Downdetector by Ookla's director of product, told CNET. "They probably are becoming slightly more frequent as companies are encouraged to completely rely on cloud services and their data architectures are designed to make the most out of a particular cloud platform."