
The Surprising gRPC Client Bottleneck in Low-Latency Networks


Evgeniy Ivanov · 9 min read


“Improving anything but the bottleneck is an illusion.” — Eliyahu M. Goldratt

At YDB, we use gRPC to expose our database API to clients, so all of our load generators and benchmarks are gRPC clients. Recently, we discovered that the fewer nodes a cluster has, the harder it is for the benchmark to load it. Moreover, as we shrank the cluster, more and more resources sat idle while client-side latency steadily increased. Fortunately, we traced the root cause to a bottleneck on the client side of gRPC.

In this post, we describe the issue and the steps to reproduce it using a provided gRPC server/client microbenchmark. Then, we show a recipe to avoid the discovered bottleneck and achieve high throughput and low latency simultaneously. We present a comparative performance evaluation illustrating how latency and throughput vary with the number of concurrent in-flight requests.

A Very Short gRPC Introduction

gRPC is usually considered “a performant, robust platform for inter-service communication”. Within a gRPC client, there are multiple gRPC channels, each supporting many RPCs (streams). gRPC is implemented over the HTTP/2 protocol, and each gRPC stream corresponds to an HTTP/2 stream.

gRPC channels to different gRPC servers have their own TCP connections. Also, when you create a channel, you may specify channel arguments (channel configuration), and channels created with different arguments get their own TCP connections. Otherwise, as we discovered, all channels to the same server share a single TCP connection regardless of traffic (which is quite unexpected), and gRPC uses HTTP/2 to multiplex the RPCs over it.

In gRPC’s Performance Best Practices, it is stated that:

(Special topic) Each gRPC channel uses 0 or more HTTP/2 connections and each connection usually has a limit on the number of concurrent streams. When the number of active RPCs on the connection reaches this limit, additional RPCs are queued in the client and must wait for active RPCs to finish before they are sent. Applications with high load or long-lived streaming RPCs might see performance issues because of this queueing. There are two possible solutions:

1. Create a separate channel for each area of high load in the application.
2. Use a pool of gRPC channels to distribute RPCs over multiple connections (channels must have different channel args to prevent re-use so define a use-specific channel arg such as channel number).
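The second recipe above, a pool of channels with distinct channel args, can be sketched in a few lines of Python. This is a minimal illustration, not YDB's actual client code: the arg key `pool.channel_number`, the target address, and the `channel_factory` parameter (injectable here so the class can be exercised without a running server) are assumptions for the example.

```python
import itertools


class ChannelPool:
    """Round-robin pool of gRPC channels.

    Each channel is created with a distinct, use-specific channel arg
    ("pool.channel_number" here) so gRPC cannot collapse the pool onto
    a single shared TCP connection.
    """

    def __init__(self, target, size, channel_factory=None):
        if channel_factory is None:
            import grpc  # third-party: grpcio
            channel_factory = lambda t, opts: grpc.insecure_channel(t, options=opts)
        # The arg key is arbitrary; only its uniqueness per channel matters.
        self._channels = [
            channel_factory(target, [("pool.channel_number", i)])
            for i in range(size)
        ]
        self._rr = itertools.cycle(self._channels)

    def channel(self):
        """Return the next channel in round-robin order."""
        return next(self._rr)
```

A service stub would then be built per call site, e.g. `stub = MyServiceStub(pool.channel())`, with the pool sized so that concurrent RPCs per connection stay below the server's concurrent-stream limit (commonly 100).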
