Amazon says it recently achieved a major breakthrough in networking design—and has been quietly deploying the new technology in its data centers since late last year. The company claims it has significantly increased data speeds while reducing energy use, potentially giving the tech giant an edge as companies race to build ever-faster systems in the cloud.
The new technology hinges on a “quasi-random” design that combines elements of traditional, structured data networks with the performance advantages of more random architectures. Researchers have explored random networks for decades, but the technology has never been successfully scaled. Now, Amazon thinks it has cracked the code.
The fact that Amazon is using this in the real world is “remarkable,” says Brighten Godfrey, a computer science professor at the University of Illinois at Urbana-Champaign and an expert in networking, who was not involved in Amazon’s research. Godfrey co-authored a seminal 2012 paper on random network graphs, which he says are a “mind-bending problem to solve, in general.”
A team of engineers and researchers at Amazon Web Services, including several recruited from academia, has been working on the random networking problem since 2023. Amazon also designed a new piece of data center equipment it dubbed the ShuffleBox, which automatically shuffles the cables required for this kind of networking.
“By essentially flattening the network, we eliminated the bottlenecks that come with traditional networking designs,” Matt Rehder, vice president of AWS Network Engineering, said in an exclusive interview with WIRED. “We think we’re the only ones who have done this at scale.”
Network Effects
Amazon detailed its new networking design in a paper published last month titled “RNG: Flat Datacenter Networks at Scale.” RNG stands for “resilient network graphs,” which are neither entirely structured nor entirely random.
Interestingly, the Amazon team behind RNG isn’t making this networking pitch around generative AI. This is about making the company’s everyday data center architecture more efficient. “RNG is a great fit for our core demands, but AI training data patterns are far more coordinated and centrally orchestrated, so they don’t approximate a random graph,” Rehder says.
Since the mid-1980s, communications networks, from telecom to data centers, have been predominantly designed with a “fat-tree” topology, which includes two or three vertical layers of switches and routers. These are connected by “fat” nodes at the top of the structure, where there are multiple routers of the same type, and thinner branches toward the bottom. Put very simply, in a fat-tree network, data moves up and down the stack. The increased bandwidth near the top of the structure, where the data bisects, helps eliminate bottlenecks.
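To make the "up and down the stack" idea concrete, here is a minimal Python sketch of a simplified two-tier layered network. It is an illustration only, not Amazon's design or a full fat-tree: the names `build_two_tier`, `shortest_path`, `spine`, and `leaf` are hypothetical, and the switch counts are arbitrary. It shows that traffic between two bottom-layer switches has to climb to the top layer and come back down.

```python
from collections import deque

def build_two_tier(num_spines, num_leaves):
    """Return an adjacency map in which every leaf switch links to every spine switch.

    Hypothetical simplification: leaves never connect to each other directly.
    """
    graph = {f"spine{s}": set() for s in range(num_spines)}
    graph.update({f"leaf{l}": set() for l in range(num_leaves)})
    for l in range(num_leaves):
        for s in range(num_spines):
            graph[f"leaf{l}"].add(f"spine{s}")
            graph[f"spine{s}"].add(f"leaf{l}")
    return graph

def shortest_path(graph, src, dst):
    """Breadth-first search; returns one shortest path as a list of node names."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sort neighbors so the chosen path is deterministic.
        for nxt in sorted(graph[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

topology = build_two_tier(num_spines=2, num_leaves=4)
path = shortest_path(topology, "leaf0", "leaf3")
print(path)  # ['leaf0', 'spine0', 'leaf3']
```

In this toy layout every leaf-to-leaf route passes through a top-layer switch, which is why the upper layer needs the extra bandwidth the article describes.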