Tech News

The web's infrastructure has a concentration problem, exposing us all to crushing outages — from AWS and Azure to Cloudflare, the perils of having a centralized internet are being felt by all


Internet outages happen all the time. Just this week, a Cloudflare outage disrupted millions of users. The infrastructure on which our digital lives are built is precarious and prone to errors. When those errors happened in the past, their impact was small: a website’s servers crashing would bring down only that website and anything that relied on it. But as the web’s infrastructure has become more centralised, with a handful of companies dominating the market, any individual issue has the potential to domino into a much more significant one.

We’ve seen two alarming recent examples of that happening in real life. On October 20, thousands of services around the world fell offline after the processes meant to keep the Domain Name System (DNS) routing and records that AWS controls in order went out of sync. That triggered a “latent race condition”, a harmful bug that cascaded through almost all of AWS’s systems, including other routing services. What was initially a single error in the US-EAST-1 cluster of data centres in Northern Virginia became a problem for users as far away as Australia and the UK.
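For readers unfamiliar with the term, a race condition is a bug whose outcome depends on the order in which concurrent processes happen to finish. The sketch below is purely illustrative and is not AWS’s actual system: `RecordStore`, `apply_unchecked`, `apply_checked` and the addresses are all hypothetical names invented for the example. It shows how a delayed process carrying an older plan can overwrite newer records unless the store checks versions before writing.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStore:
    """Hypothetical DNS record store holding one endpoint's addresses."""
    version: int = 0
    addresses: list = field(default_factory=list)

    def apply_unchecked(self, version, addresses):
        # Last writer wins, even if its plan is older than what is stored.
        self.version = version
        self.addresses = list(addresses)

    def apply_checked(self, version, addresses):
        # Reject any plan that is not newer than the one already applied.
        if version <= self.version:
            return False
        self.version = version
        self.addresses = list(addresses)
        return True


# Two hypothetical updater processes: the one carrying plan 1 is delayed,
# so its write arrives after plan 2 has already been applied.
arrival_order = [(2, ["10.0.0.2"]), (1, ["10.0.0.1"])]

unchecked = RecordStore()
for version, addresses in arrival_order:
    unchecked.apply_unchecked(version, addresses)

checked = RecordStore()
for version, addresses in arrival_order:
    checked.apply_checked(version, addresses)

print(unchecked.version, unchecked.addresses)  # stale plan 1 overwrote plan 2
print(checked.version, checked.addresses)      # plan 2 survives
```

In the unchecked store the stale plan silently replaces the newer one; the version check is one common guard against that, though real systems need far more than this.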

Then, less than 10 days later, a similar issue struck Microsoft’s Azure cloud system. Xbox gamers, the Scottish parliament, and many other key bits of infrastructure fell offline, thanks to another DNS issue.

Both were quickly resolved, but the speed at which they caused chaos in the online-offline world in which we now live highlighted just how precarious our digital lives can be. And it got people thinking: is too much power concentrated in the web’s key infrastructure?

A concentration of power


The big three cloud infrastructure providers – AWS, Azure and Google Cloud – together hold more than two-thirds of the market. They’ve attained that level of power because of their remarkable uptime. The fact that things go wrong so rarely is a vindication of their ability and reliability. Yet it also means that more and more services are hosted on fewer and fewer servers controlled by fewer companies – so on the rare occasion that something does go wrong, it goes really wrong.

“When one of the major cloud providers experiences an issue, it doesn't just affect one company; it ripples across sectors, services, and even countries,” said Graeme Stewart, head of public sector at Check Point Software. Stewart pointed out that both the AWS and Azure outages looked more like cock-up than conspiracy – certainly the AWS issue was, while the Azure one is still being investigated. Yet “incidents like this highlight how fragile our online infrastructure really is,” he said. “We have become so dependent on a handful of global platforms that one glitch can disrupt everything from banking to travel.”

And those glitches can have meaningful impacts on us all. The infrastructure providers are used by companies that collectively have hundreds of millions of users and span industries, meaning that banks are as likely to go down as video games or government voting systems.

Experts weigh in
