At Google Cloud, we’re constantly pushing the scalability of Google Kubernetes Engine (GKE) so that it can keep up with increasingly demanding workloads — especially AI. GKE already supports massive 65,000-node clusters, and at KubeCon, we shared that we successfully ran a 130,000-node cluster in experimental mode — twice the number of nodes compared to the officially supported and tested limit.
This kind of scaling isn't just about increasing the sheer number of nodes; it also requires scaling other critical dimensions, such as Pod creation and scheduling throughput. For instance, during this test we sustained a Pod throughput of 1,000 Pods per second and stored over 1 million objects in our optimized distributed storage. In this blog, we look at the trends driving demand for these kinds of mega-clusters, then take a deep dive into the architectural innovations that made this extreme scalability a reality.
The rise of the mega cluster
Our largest customers are actively pushing the boundaries of GKE’s scalability and performance with their AI workloads. In fact, we already have numerous customers operating clusters in the 20-65K node range, and we expect demand for large clusters to stabilize around the 100K-node mark.
This sets up an interesting dynamic: we are transitioning from a world constrained by chip supply to a world constrained by electrical power. Consider that a single NVIDIA GB200 GPU needs 2700W of power; at 100,000 such chips, the accelerators alone draw 270 megawatts, so a single cluster's power footprint can easily reach hundreds of megawatts, ideally distributed across multiple data centers. Thus, for AI platforms exceeding 100K nodes, we’ll need robust multi-cluster solutions that can orchestrate distributed training or reinforcement learning across clusters and data centers. This is a significant challenge, and we’re actively investing in tools like MultiKueue to address it, with further innovations on the horizon. We are also advancing high-performance RDMA networking with the recently announced managed DRANET, improving topology awareness to maximize performance for massive AI workloads. Stay tuned.
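To make the multi-cluster flow concrete, here is a minimal client-go sketch of how a workload reaches a MultiKueue-enabled fleet: a suspended Job is submitted to the management cluster with a Kueue queue-name label, and the MultiKueue admission check dispatches it to a worker cluster with capacity. This is a sketch under assumptions, not GKE's internal implementation; the kubeconfig path, queue name, namespace, and image are illustrative, and it assumes Kueue with MultiKueue is already installed and configured.

```go
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/utils/ptr"
)

func main() {
	// Build a client against the management cluster (path is illustrative).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "distributed-training",
			Namespace: "default",
			Labels: map[string]string{
				// Queue the Job through Kueue; a MultiKueue admission check
				// on the backing ClusterQueue dispatches it to whichever
				// worker cluster has capacity. The queue name is illustrative.
				"kueue.x-k8s.io/queue-name": "multi-cluster-queue",
			},
		},
		Spec: batchv1.JobSpec{
			// Created suspended so Kueue, not the local Job controller,
			// decides when and where the Job starts.
			Suspend:     ptr.To(true),
			Parallelism: ptr.To[int32](1024),
			Completions: ptr.To[int32](1024),
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "trainer",
						Image: "example.com/trainer:latest",
					}},
				},
			},
		},
	}

	if _, err := client.BatchV1().Jobs("default").Create(
		context.TODO(), job, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```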
At the same time, these investments also benefit users who operate at more modest scales — the vast majority of GKE customers. Hardening GKE's core systems for extreme usage creates substantial headroom for average clusters, making them more resilient to errors, more tolerant of Kubernetes API misuse, and faster thanks to optimizations across all controllers. And of course, all GKE customers, large and small, benefit from investments in an intuitive, self-service experience.
Key architectural innovations
With that said, achieving this level of scale requires significant innovations throughout the Kubernetes ecosystem, including the control plane, custom scheduling, and storage. Let’s take a look at a few key areas that were critical to this project.
Optimized read scalability
At 130,000 nodes, the sheer volume of read requests to the API server can overwhelm the central object datastore. Operating at this scale requires a strongly consistent, snapshottable watch cache in the API server. To solve this, Kubernetes includes several complementary features that offload these read requests from the central object datastore to that cache.
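As an illustration, here is a minimal client-go sketch of how a client can opt into cache-served reads. Passing resourceVersion="0" lets the API server answer from its watch cache rather than performing a quorum read against etcd, and resourceVersionMatch=NotOlderThan lets the cache serve a list once it has caught up to a known resource version. The kubeconfig path and namespace are illustrative.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// ResourceVersion "0" tells the API server that any reasonably
	// recent state is acceptable, so the list can be served from the
	// in-memory watch cache instead of a quorum read against etcd.
	cached, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}

	// NotOlderThan pins a lower bound: the watch cache may serve the
	// list as soon as it has caught up to the given resource version.
	pinned, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{
			ResourceVersion:      cached.ResourceVersion,
			ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
		})
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached: %d pods, pinned: %d pods\n",
		len(cached.Items), len(pinned.Items))
}
```

In recent Kubernetes releases, consistent reads from the watch cache go further: even unversioned list requests, which previously forced a quorum read, can be served from the cache once the server verifies the cache is up to date.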