The internet kicked off the week the way that many of us want to: by refusing to go to work. An outage at Amazon Web Services rendered huge portions of the internet unavailable on Monday. Sites and services including Snapchat, Fortnite, Venmo, the PlayStation Network and, predictably, Amazon, were down off and on through the start of the day.
The outage began shortly after midnight PT, and took Amazon around three and a half hours to fully resolve. Social networks and streaming services were among the 2,000-plus companies affected, and critical services such as online banking were also taken down.
As of 12:15 p.m. PT, Amazon said it continued to see recovery across all AWS services. The company said customers who use AWS Lambda, a compute service that runs code without the need to manage servers, "may face intermittent function errors for functions making network requests to other services or systems as we work to address residual network connectivity issues."
The company said it would issue another update at 1 p.m. PT.
Timetable of outage
The issues seemed to have been largely resolved as the US East Coast was coming online, but spiked again dramatically after 8 a.m. PT as work began on the West Coast. It's possible this happened because West Coasters were simply adding to the reports, or because the systems degraded further as more people tried to access them.
AWS, a cloud services provider owned by Amazon, props up huge portions of the internet. So when it went down, it took many of the services we know and love with it. As with the Fastly and CrowdStrike outages over the past few years, the AWS outage shows just how much of the internet relies on the same infrastructure -- and how quickly our access to the sites and services we rely on can be revoked when something goes wrong.
The reliance on a small number of big companies to underpin the web is akin to putting all of our eggs in a tiny handful of baskets. When it works, it's great, but only one small thing needs to go wrong for the internet to fall to its knees in a matter of minutes.
How widespread was the AWS outage?
Just after midnight PT on Oct. 20, AWS first registered an issue on its service status page, saying it was "investigating increased error rates and latencies for multiple AWS services in the US-East-1 Region." Around 2 a.m. PT, it said it had identified a potential root cause of the issue. Within half an hour, it had started applying mitigations that were resulting in significant signs of recovery.