Load Test GlassFlow for ClickHouse: Real-Time Deduplication at Scale
By Ashish Bagri, Co-founder & CTO of GlassFlow
TL;DR
We tested GlassFlow on a real-world deduplication pipeline with Kafka and ClickHouse.
It handled 55,000 records/sec published to Kafka and processed 9,000+ records/sec on a MacBook Pro, with sub-0.12ms latency.
No crashes, no message loss, no disordering. Even with 20M records and 12 concurrent publishers, it remained robust.
Want to try it yourself? The full test setup is open source at https://github.com/glassflow/clickhouse-etl-loadtest, and the setup docs are at https://docs.glassflow.dev/load-test/setup
Why this test?
ClickHouse is incredible at fast analytics. But when building real-time pipelines from Kafka to ClickHouse, many teams run into the same issues: analytics results are incorrect or too delayed to support real-time use cases.
The root cause? Duplicate data and slow joins, often introduced by retries, offset reprocessing, or downstream enrichment. These problems can affect both correctness and performance.
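To make the duplication problem concrete, here is a minimal sketch of time-window deduplication keyed on an event ID. It is an illustration only, not GlassFlow's implementation: the `deduplicate` function, the `(event_id, timestamp)` event shape, and the `window_seconds` parameter are all assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (event_id, timestamp in seconds).
Event = Tuple[str, float]


def deduplicate(events: Iterable[Event], window_seconds: float) -> Iterator[Event]:
    """Yield an event unless the same ID was already emitted within the window."""
    last_emitted: Dict[str, float] = {}  # event_id -> timestamp of last emitted copy
    for event_id, ts in events:
        previous = last_emitted.get(event_id)
        if previous is not None and ts - previous <= window_seconds:
            continue  # same ID inside the window: treat as a retry/replay and drop it
        last_emitted[event_id] = ts
        yield (event_id, ts)


# A producer retry re-sends "a" and "b" shortly after the originals.
stream = [("a", 0.0), ("b", 1.0), ("a", 2.0), ("c", 3.0), ("b", 3.5), ("a", 100.0)]
unique = list(deduplicate(stream, window_seconds=60.0))
```

With a 60-second window, the retried copies of "a" and "b" are dropped, while the much later "a" at t=100 falls outside the window and passes through as a new event. A production system would also need to expire old IDs so the lookup table does not grow without bound.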