
Why Your Load Balancer Still Sends Traffic to Dead Backends


Why zombie instances survive health checks, and what the choice between server-side and client-side load balancing means for how fast your system detects and reacts to failure.

A service reports healthy. The load balancer believes it. A request lands on it and times out. Another follows. Then ten more. By the time the system reacts, hundreds of requests have drained into a broken instance while users stare at a spinner.

Health checking sounds simple: ask if something is alive, stop sending traffic if it isn’t. In practice, the mechanism behind that check, and who performs it, determines how fast your system detects failure, how accurately it responds, and how much of that complexity leaks into your application code.
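Even the simple version hides a detection window. Most proxies don't flip a backend to unhealthy on a single failed probe; they require several consecutive failures (and several consecutive successes to recover), which trades flapping for slower reaction. A minimal sketch of that hysteresis, using `fall`/`rise` thresholds in the style common to proxies like HAProxy (the class and field names here are illustrative, not any real library's API):

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend's health with fall/rise hysteresis."""
    fall: int = 3        # consecutive failed probes before marking down
    rise: int = 2        # consecutive successful probes before marking up
    healthy: bool = True
    _failures: int = 0
    _successes: int = 0

    def record_probe(self, ok: bool) -> None:
        if ok:
            self._failures = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise:
                self.healthy = True
        else:
            self._successes = 0
            self._failures += 1
            if self.healthy and self._failures >= self.fall:
                self.healthy = False
```

With `fall=3` and a probe every 5 seconds, a dead backend can keep receiving traffic for up to roughly 15 seconds before it is marked down. That window is exactly the gap the opening scenario describes.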

The answer is fundamentally different depending on where load balancing lives: in a central proxy, or in the client itself.

Two Models for Distributing Traffic

Before getting into health checks, it helps to be precise about what each model looks like.

Server-Side Load Balancing

A dedicated proxy sits between clients and the backend fleet. Clients know one address: the load balancer. The load balancer knows the backend pool and decides where each request goes.

The load balancer is the single point of intelligence. It tracks backend health, maintains connection pools, and routes traffic. Clients are completely unaware of the backend topology; they see one stable address regardless of how many instances are behind it, or how many fail.
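The core of this model can be sketched in a few lines: the proxy alone holds the backend list and health state, and picks the next healthy instance round-robin. (The class below is an illustrative toy, not how any particular proxy is implemented.)

```python
from itertools import count


class ServerSideBalancer:
    """Toy server-side balancer: clients see one address; only the
    proxy knows the pool and each backend's health state."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.health = {b: True for b in self.backends}
        self._cursor = count()

    def mark(self, backend, healthy):
        # Driven by the proxy's own health checks, not by clients.
        self.health[backend] = healthy

    def pick(self):
        # Round-robin over the pool, skipping unhealthy backends.
        for _ in range(len(self.backends)):
            b = self.backends[next(self._cursor) % len(self.backends)]
            if self.health[b]:
                return b
        raise RuntimeError("no healthy backends")
```

Note what the client never does here: it never calls `mark` or sees `backends`. All failure detection and routing intelligence is concentrated in one place.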

HAProxy, NGINX, AWS ALB, and most hardware appliances follow this model.
