Federating Clusters for Zero-Downtime Kubernetes

Every multi-region setup eventually meets the same awkward moment: a whole cluster goes away, and the identical copy of your service running two regions over might as well not exist, because nothing is wired to treat them as one thing. Failover becomes a runbook: restore, repoint DNS, and wait out an outage that, on paper, you’d already paid to survive.

Linkerd’s multicluster extension closes that gap by letting several clusters present a service as a single, load-balanced endpoint. The part that the official tasks gloss over is that a real platform almost never picks one multicluster mode. Some services want federation (same service everywhere, one endpoint, automatic failover), while others want mirroring (reach a specific remote service by name), and you frequently want both patterns living on the same set of links. The docs walk through each mode on its own. This post wires all three modes together across three GKE clusters, with a full-mesh link topology, a chaos test that takes out an entire cluster, and scripts you can clone and run on a fresh GCP project.

Companion repo: Every script referenced here lives in this repository. Feel free to clone it, set your project ID, and run it.

Linkerd multicluster modes: Gateway, flat, and federated

Linkerd’s multicluster extension supports three modes. The nice thing is they’re not mutually exclusive: on the same set of linked clusters, the mode is chosen per service via a label.

| Mode | Label | What happens | Network requirement |
|---|---|---|---|
| Hierarchical (gateway) | `mirror.linkerd.io/exported=true` | Service mirrored as `<svc>-<cluster>`, traffic routed through a gateway | Gateway IP reachable |
| Flat (pod-to-pod) | `mirror.linkerd.io/exported=remote-discovery` | Service mirrored as `<svc>-<cluster>`, traffic goes directly to remote pods | Flat network (pod IPs routable) |
| Federated | `mirror.linkerd.io/federated=member` | All same-name services unioned into `<svc>-federated`, load balanced across all clusters | Flat network (pod IPs routable) |
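To make the per-service selection concrete, here is a minimal Python sketch that models the table as a lookup from labels to the resulting mirror name. It is only an illustration of the naming rules listed above, not how Linkerd implements them; the function name `resolve` and the service and cluster names in the usage lines are hypothetical.

```python
# Label keys and values as listed in the table above.
EXPORTED = "mirror.linkerd.io/exported"
FEDERATED = "mirror.linkerd.io/federated"


def resolve(service, cluster, labels):
    """Return (mode, mirrored name, network requirement) for each matching label.

    Illustrative model only: `service` and `cluster` stand in for the
    <svc> and <cluster> placeholders used in the table.
    """
    results = []
    exported = labels.get(EXPORTED)
    if exported == "true":
        # Hierarchical: traffic goes through the remote gateway.
        results.append(("hierarchical", f"{service}-{cluster}", "gateway IP reachable"))
    elif exported == "remote-discovery":
        # Flat: traffic goes straight to remote pods.
        results.append(("flat", f"{service}-{cluster}", "pod IPs routable"))
    if labels.get(FEDERATED) == "member":
        # Federated: same-name services are unioned under one name.
        results.append(("federated", f"{service}-federated", "pod IPs routable"))
    return results


# Hypothetical services and cluster names, for illustration only.
print(resolve("api", "east", {EXPORTED: "true"}))
print(resolve("web", "west", {EXPORTED: "remote-discovery"}))
print(resolve("frontend", "east", {FEDERATED: "member"}))
```

A service with neither label yields an empty list in this model, which mirrors the idea that the mode is opted into per service rather than set once for the whole link.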

The distinction that matters operationally is the network requirement. Hierarchical mirroring works on any network, because only the gateway IP needs to be reachable, while flat and federated modes need real pod-to-pod connectivity. On GCP, VPC-native GKE clusters on peered VPCs give you that flat network for free. So you can run federated services for your core workloads over a flat network and still mirror a specialized service through a gateway from a cluster that isn’t on that network. Most platform teams I’ve seen end up with exactly this kind of mix.

Multi-region architecture: GKE cluster setup

We have three GKE clusters across three regions, fully linked to each other (six directional links total). Three demo services, each using a different multicluster mode:
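The link count follows from the topology: in a full mesh, every cluster links to every other cluster, so n clusters need n × (n − 1) directional links. A short Python sketch of that arithmetic, using hypothetical cluster names:

```python
from itertools import permutations


def full_mesh_links(clusters):
    """Return every ordered (source, target) pair of distinct clusters."""
    return list(permutations(clusters, 2))


# Hypothetical cluster names, for illustration only.
clusters = ["us-east", "eu-west", "asia-south"]
links = full_mesh_links(clusters)

for source, target in links:
    print(f"{source} -> {target}")

# Three clusters: 3 * (3 - 1) directional links.
print(len(links))
```

Each pair appears in both directions because a link is directional: one cluster watching another does not imply the reverse.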
