The equality delete problem in Apache Iceberg

Yingjun Wu · 9 min read · 14 hours ago

Since last year, Apache Iceberg has been one of the hottest topics in the data infrastructure world.

Databricks recently spent $1 billion to acquire Neon, a startup building a serverless Postgres. Snowflake also spent about $250 million to acquire Crunchy Data, a veteran enterprise-grade Postgres provider.

These are not random acquisitions. They represent a bet by two major database vendors on the same story, Postgres plus Apache Iceberg: Postgres for transactional workloads and smaller queries, Iceberg for large-scale analytics, both tied together within the same vendor’s ecosystem.

Postgres and Apache Iceberg are both mature systems, but here’s a question many people haven’t thought through: How do you stream data from Postgres into Apache Iceberg in real time?

It sounds straightforward: just use an existing CDC (Change Data Capture) system like Debezium to write change events directly into Iceberg. But reality is far from that simple.
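To make the naive approach concrete, here is a minimal Python sketch of what "writing change events directly into Iceberg" could look like. It assumes Debezium-style events carrying an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) plus `before`/`after` row images, a primary key that never changes, and one commit per event. `ToyTable`, `apply_cdc_event`, and `scan` are hypothetical names, and the in-memory lists only stand in for Iceberg data files and delete files; this is not Iceberg's API. A streaming writer knows a row's key but not which data file holds the old version, so the sketch records every update or delete as an equality delete. It follows the Iceberg format v2 rule that an equality delete applies only to rows written with a strictly lower sequence number.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical in-memory model, NOT the Apache Iceberg API.
# Lists stand in for data files and equality-delete files; each event is one commit.


@dataclass
class DataRow:
    seq: int               # sequence number of the commit that wrote the row
    row: dict[str, Any]


@dataclass
class EqualityDelete:
    seq: int               # sequence number of the commit that wrote the delete
    key: dict[str, Any]    # equality field values, e.g. {"id": 1}


@dataclass
class ToyTable:
    key_cols: tuple[str, ...]
    data: list[DataRow] = field(default_factory=list)
    eq_deletes: list[EqualityDelete] = field(default_factory=list)
    seq: int = 0

    def apply_cdc_event(self, event: dict[str, Any]) -> None:
        """Translate one Debezium-style event into an append and/or an equality delete."""
        self.seq += 1
        op = event["op"]
        if op == "d":
            key_src = event["before"]
        elif op == "u":
            key_src = event["after"]  # assumes the primary key never changes
        else:
            key_src = None            # "c" (create) and "r" (snapshot read) only append
        if key_src is not None:
            # The writer knows the key but not which data file holds the old row,
            # so it deletes "every older row with this key" instead of a file position.
            key = {c: key_src[c] for c in self.key_cols}
            self.eq_deletes.append(EqualityDelete(self.seq, key))
        if op in ("c", "r", "u"):
            self.data.append(DataRow(self.seq, dict(event["after"])))

    def scan(self) -> list[dict[str, Any]]:
        """Merge-on-read: hide a row if a newer equality delete matches its key."""
        live = []
        for dr in self.data:
            deleted = any(
                d.seq > dr.seq  # only strictly newer deletes apply
                and all(dr.row.get(c) == v for c, v in d.key.items())
                for d in self.eq_deletes
            )
            if not deleted:
                live.append(dr.row)
        return live


table = ToyTable(key_cols=("id",))
table.apply_cdc_event({"op": "c", "before": None, "after": {"id": 1, "v": "a"}})
table.apply_cdc_event({"op": "c", "before": None, "after": {"id": 2, "v": "b"}})
table.apply_cdc_event({"op": "u", "before": None, "after": {"id": 1, "v": "a2"}})
table.apply_cdc_event({"op": "d", "before": {"id": 2}, "after": None})
print(table.scan())  # → [{'id': 1, 'v': 'a2'}]
```

In this toy model no data row is ever rewritten. Each update or delete adds another equality-delete record, and every scan must check each row against all of them. In a real table that read-time cost keeps growing until the files are compacted, which hints at why plain CDC writes into Iceberg are harder than they first appear.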

Here’s the surprising truth: Mainstream systems like Snowflake, Databricks, and Redshift do not natively support “plain CDC writes” into Iceberg.

In this article, I’ll expose the part few people talk about and explain how RisingWave makes true streaming CDC ingestion into Iceberg possible through a series of engineering techniques.

CDC and the Two Types of Deletes in Iceberg