Tech News
← Back to articles

Row-level transformations in Postgres CDC using Lua

read original related products more articles

Earlier this week, we launched PeerDB Streams, our latest product offering for real-time replication from Postgres to queues and message brokers such as Kafka, Redpanda, Google PubSub, Azure Event Hubs, and others.

Today, we are announcing one of the flagship features of this offering — support for row-level transformations as part of Postgres Change Data Capture (CDC). You can write simple Lua scripts to define a transformation and add it as part of the replication (MIRROR). With this feature, users will be able to seamlessly perform in-flight row-level transformations to Postgres data before it is streamed to the target.

In this blog, we will cover various use cases that require row-level transformations and how they can be accomplished using PeerDB. We will also walk through example use cases using sample Lua scripts. Toward the end, we will delve a bit deeper into why we chose Lua as the scripting language and how we implemented this feature.

Row-Level Transformation in Postgres CDC: Use Cases

There are multiple use cases that require row-level transformations during Postgres CDC. A few of the common scenarios include:

Masking PII Data: Replace sensitive PII with tokens or pseudonyms before data enters Kafka, obfuscating it from other micro-services in transactional outbox scenarios, thus enhancing privacy and compliance. Changing Data Format: Transform data into required formats like Protobuf, JSON, MsgPack, Avro and so on for seamless integration and optimized handling across systems. Generated Columns: Calculate new column values based on transformations of existing data, such as aggregations or derived metrics, and stream these new columns to enhance real-time data analysis and reporting. Unnesting JSONs: Extract elements from JSON objects and flatten them into separate fields within Kafka messages, improving the accessibility and usability of data across different consumer applications. Topic Routing: Distribute Change Data Capture (CDC) events to specific Kafka topics based on rules or conditions, facilitating targeted data streaming and processing. Data Encryption: Apply encryption to sensitive data before it is written to Kafka, enhancing security and preventing unauthorized access as data moves between systems.

Now, let's see how some of the above use cases can be accomplished using PeerDB, through examples and sample Lua scripts.

Sample Schema

I will be using the users table shown below to demonstrate the above use cases in PeerDB. This table includes various fields relevant to our testing scenarios."

CREATE TABLE users ( id SERIAL PRIMARY KEY , first_name VARCHAR ( 255 ), last_name VARCHAR ( 255 ), ssn CHAR ( 11 ), payload JSONB, salary_in_usd NUMERIC ( 10 , 2 ) );

... continue reading