After seeing the engineers at Tesla talk about 1B row/s ClickHouse ingestion, I wanted to see if I could do it myself.
A few weeks ago, I saw a talk from Tesla claiming they were ingesting 1B rows per second using ClickHouse. I'm a petrolhead, but I don't have any reason to think they are lying :). One (American) billion rows per second might feel like a lot, so let me try to explain how you can achieve that with ClickHouse. I'm not sure what ClickHouse flavor Tesla uses, but I don't think that really matters; I'm going to use the open source ClickHouse version for these tests.
First, let me do a super quick intro to ClickHouse architecture:
- ClickHouse clusters are made up of nodes, organized into shards and replicas.
- Each shard stores a portion of the data. Sharding can be random or follow any kind of rule (e.g., splitting by customer).
- Each shard has N replicas, a "copy" of the data on each node.
- Coordination is done using ZooKeeper (the original ZooKeeper or ClickHouse Keeper).
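To make that concrete, you can inspect a cluster's layout from any node with the standard `system.clusters` table (the cluster name `my_cluster` is a made-up example):

```sql
-- List every shard/replica pair in a hypothetical cluster named my_cluster.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'my_cluster'
ORDER BY shard_num, replica_num;
```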
So data is sent to any replica of a shard, and ClickHouse replicates it to the other replicas in that shard. When querying ClickHouse, the query is distributed using one replica from every shard.
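As a sketch of what that looks like in practice (the cluster, database, and table names here are hypothetical, not Tesla's setup), you typically pair a ReplicatedMergeTree table, which holds one shard's data and handles replication within the shard, with a Distributed table that fans inserts and queries out across all shards:

```sql
-- Per-shard table: stores one shard's slice of the data, replicated
-- to every replica in that shard via the {shard}/{replica} macros.
CREATE TABLE events ON CLUSTER my_cluster
(
    ts          DateTime,
    customer_id UInt64,
    payload     String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (customer_id, ts);

-- Distributed table: a "view" over events on every shard.
-- rand() spreads inserts randomly; something like cityHash64(customer_id)
-- would implement the "split by customer" rule instead.
CREATE TABLE events_all ON CLUSTER my_cluster AS events
ENGINE = Distributed(my_cluster, default, events, rand());
```

An `INSERT INTO events_all` routes each row to a shard according to the sharding key, while a `SELECT` against `events_all` runs on one replica per shard and merges the results.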
If you are familiar with ClickHouse, you most likely know all this already. If not, you may need to do some back and forth with ChatGPT, but the important part is that you have buckets and you put your data in any of them.
How do we ingest 1B rows per second?