Load Test GlassFlow for ClickHouse: Real-Time Deduplication at Scale

By Ashish Bagri, Co-founder & CTO of GlassFlow

TL;DR

We tested GlassFlow on a real-world deduplication pipeline with Kafka and ClickHouse.

It handled 55,000 records/sec published to Kafka and processed 9,000+ records/sec on a MacBook Pro, with sub-0.12 ms latency.

No crashes, no message loss, no out-of-order records. Even with 20M records and 12 concurrent publishers, it remained stable.

Want to try it yourself? The full test setup is open source at https://github.com/glassflow/clickhouse-etl-loadtest and documented at https://docs.glassflow.dev/load-test/setup

Why this test?

ClickHouse is incredible at fast analytics. But when building real-time pipelines from Kafka to ClickHouse, many teams run into the same issues: analytics results are incorrect or too delayed to support real-time use cases.

The root cause? Duplicate data and slow joins, which are often introduced by retries, offset reprocessing, or downstream enrichment. These problems affect both correctness and performance.
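To make the duplicate problem concrete, here is a minimal sketch of key-based deduplication within a time window. It is an illustration only, not GlassFlow's implementation: the function name dedup_within_window, the (event_id, timestamp) event shape, and the 10-second window are all assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (event_id, timestamp in seconds).
Event = Tuple[str, float]


def dedup_within_window(events: Iterable[Event], window_seconds: float) -> Iterator[Event]:
    """Yield an event unless the same ID was already accepted within the window."""
    last_accepted: Dict[str, float] = {}  # event_id -> timestamp of last accepted event
    for event_id, ts in events:
        previous = last_accepted.get(event_id)
        if previous is not None and ts - previous < window_seconds:
            # Same ID seen recently, e.g. a producer retry or a reprocessed offset.
            continue
        last_accepted[event_id] = ts
        yield event_id, ts


# A retry re-sends "a" at t=2.0; a later "a" at t=12.0 falls outside the window.
stream = [("a", 0.0), ("b", 1.0), ("a", 2.0), ("a", 12.0)]
result = list(dedup_within_window(stream, window_seconds=10.0))
print(result)  # [('a', 0.0), ('b', 1.0), ('a', 12.0)]
```

This sketch keeps every ID in memory forever. A production pipeline would need to evict expired keys and persist state across restarts, which is part of what makes deduplication hard at scale.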
