Lock Contention
Published on: 2025-05-31 01:47:08
Overview
Recently, I revisited the "Resolving a year-long ClickHouse lock contention" post and presented it at the C++ Russia 2025 conference.
In this post, I want to share more about the development process, along with some technical details that were not covered in the original post.
Motivation
In 2022 at Tinybird, one of our clusters showed severe CPU underutilization during a period of high load.
It was unclear what the issue was. There were no IO, network, or memory bottlenecks. In ClickHouse, all async metrics and query profile events looked normal. The only unusual thing was that as query throughput increased, ClickHouse could not handle the load, yet CPU usage stayed very low.
The problem persisted for a year, and during similar incidents we could not find any clues.
One year later, during a similar incident, we spotted that the ContextLockWait async metric periodically increased. Async metrics are calculated periodically at a fixed interval and include, for example, memory usage and some glob
...