In my previous article, I provided a broad explanation of the algorithms that allow local-first applications to synchronize data correctly using CRDTs (Conflict-free Replicated Data Types).
When you see a local-first app that seamlessly syncs with other applications distributed worldwide, it feels like magic. Under the hood, a lot is happening, and it has to happen in the right order for the algorithm to preserve correctness.
The source code of this simple iOS and Android ToDo app can be found at: https://github.com/sqliteai/sqlite-sync/tree/main/examples/to-do-app
Let’s look more deeply at what really happens when a value in a local-first app (backed by SQLite) is INSERTed, UPDATEd, or DELETEd.
Initial setup
Suppose we have two users, Bob and Alice, each with a local SQLite database named todo.sqlite that contains the following table:
CREATE TABLE todo ( id TEXT PRIMARY KEY NOT NULL, title TEXT, status TEXT );
Initially, both databases are empty, and at some point, Alice adds a "Buy groceries" item to the todo table with status set to "in_progress". In SQL, this translates to:
INSERT INTO todo (id, title, status) VALUES ('ID1', 'Buy groceries', 'in_progress');
At the SQL level, this is just a row insert. But for the sync engine, this single INSERT is not enough information to replicate the change safely across peers. It needs to break the change apart and record causality.
How the sync engine intercepts the change
Most local-first systems hook into SQLite by setting up INSERT/UPDATE/DELETE triggers on the user tables; this is usually how the engine knows that Alice’s device just inserted a new row.
Once the engine is notified, it doesn’t just say “a row was added.” It splits the mutation into per-column events.
Why? Because CRDTs typically track each column independently so that two peers can edit different columns of the same row without overwriting each other.
For our INSERT, the engine conceptually turns it into three “column insertions”:
(row=ID1, column=title, value='Buy groceries')
(row=ID1, column=status, value='in_progress')
(row=ID1, column=id, value='ID1') (implicit for the primary key)
Each column is then annotated with causal metadata.
Metadata captured per column
For every column change, the engine typically stores:
site_id: Unique id of the device/database (e.g. Alice’s phone).
column_name: Which column changed (title, status, etc.).
row_key: Encoded primary key (points back to the row in todo).
column_version: A Lamport clock or hybrid logical clock (HLC) that totally orders changes from the same site.
db_version: The current database-wide Lamport clock, used to detect if a peer has already seen this change.
op_type: Flag describing if this is an insert/update/delete at the column level (some systems call this a “tombstone” or “causal length”).
seq: Order of mutations within the same db_version (to preserve local transaction order).
In many implementations, this is stored in a hidden metadata_table, so Alice’s insert produces one entry per column with her site_id and incremented clocks.
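To make this concrete, here is a minimal sketch of what such a metadata store and the trigger feeding it could look like. The names (crdt_changes, local_site, todo_after_insert) and the exact columns are illustrative assumptions, not the schema of any particular sync engine, and a real implementation would use a proper Lamport/HLC clock rather than a single bumped counter.

-- Hypothetical metadata table: one row per changed column of a user-table row.
CREATE TABLE IF NOT EXISTS crdt_changes (
    site_id        BLOB    NOT NULL,  -- id of the device that made the change
    pk             TEXT    NOT NULL,  -- encoded primary key of the todo row
    column_name    TEXT    NOT NULL,  -- which column changed
    value,                            -- new value for that column
    column_version INTEGER NOT NULL,  -- per-column Lamport/HLC clock
    db_version     INTEGER NOT NULL,  -- database-wide Lamport clock
    op_type        INTEGER NOT NULL,  -- 0 = insert, 1 = update, 2 = delete
    seq            INTEGER NOT NULL   -- order within the same db_version
);

-- Hypothetical single-row table holding this device's identity and clock.
CREATE TABLE IF NOT EXISTS local_site (
    site_id    BLOB    NOT NULL,
    db_version INTEGER NOT NULL DEFAULT 0
);

-- On INSERT, write one metadata row per column and bump the local clock.
CREATE TRIGGER IF NOT EXISTS todo_after_insert
AFTER INSERT ON todo
BEGIN
    INSERT INTO crdt_changes (site_id, pk, column_name, value,
                              column_version, db_version, op_type, seq)
    SELECT site_id, NEW.id, 'title', NEW.title,
           db_version + 1, db_version + 1, 0, 0
    FROM local_site;

    INSERT INTO crdt_changes (site_id, pk, column_name, value,
                              column_version, db_version, op_type, seq)
    SELECT site_id, NEW.id, 'status', NEW.status,
           db_version + 1, db_version + 1, 0, 1
    FROM local_site;

    UPDATE local_site SET db_version = db_version + 1;
END;

With something like this in place, Alice’s single INSERT also appends two causally annotated entries to crdt_changes (a full implementation would record one for the primary key column as well).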
Why we need column-level clocks
CRDT algorithms for documents, sets, and maps use per-field timestamps so that:
If Bob changes status while Alice changes title, they can merge without conflict.
If both change the same column, the highest clock (or other CRDT rule) wins.
This is what lets local-first apps avoid last-writer-wins bugs while still syncing automatically.
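A purely illustrative example of the first case, with made-up values and clocks: Alice and Bob edit the same row concurrently, but their ops name different columns, so both writes survive on every replica.

-- Op from Alice (site A, column_version 7): edits only the title cell.
UPDATE todo SET title = 'Buy groceries and milk' WHERE id = 'ID1';
-- Op from Bob (site B, column_version 5): edits only the status cell.
UPDATE todo SET status = 'done' WHERE id = 'ID1';
-- Whatever order these two ops arrive in, every replica converges to
-- ID1 = ('Buy groceries and milk', 'done'); neither edit overwrites the other.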
Sync: how Bob receives Alice’s change
When Alice later syncs with Bob (via any transport like HTTP, P2P, etc.):
Diff calculation:
P2P mode: Alice asks Bob: “What db_version have you seen from me?” Bob replies with the last db_version he knows for Alice’s site_id. Alice then ships only the ops with db_version > Bob’s reply.
Server-orchestrated mode: Instead of talking directly, Alice contacts a sync service. The server keeps a version map (per-site_id last-seen db_version) for every client. Alice uploads her new ops; the server updates Alice’s entry. When Bob syncs, the server tells him which db_versions from each site he’s missing and streams just those ops. This centralizes fan-out, simplifies conflict-free catch-up, and avoids NAT/P2P issues while preserving CRDT semantics.
Operation shipping:
Alice sends all new rows in metadata_table beyond that version.
Replay:
Bob’s engine replays the ops in causal order:
If the row doesn’t exist, it creates it.
It writes the columns in the same order Alice used (seq).
It updates Bob’s local clock table so he doesn’t reapply the same ops later.
After the replay, Bob’s user-facing todo table looks exactly like Alice’s.
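As a rough sketch of the diff step in SQL, assuming the hypothetical crdt_changes table from earlier plus a small tracking table of what has already been received from each peer (again, the names are illustrative, not a specific engine’s schema):

-- Hypothetical per-peer bookkeeping: highest db_version seen from each site.
CREATE TABLE IF NOT EXISTS tracked_peers (
    site_id   BLOB PRIMARY KEY,
    last_seen INTEGER NOT NULL
);

-- Bob answers "what db_version have you seen from me?":
SELECT COALESCE(
         (SELECT last_seen FROM tracked_peers WHERE site_id = :alice_site_id),
         0);

-- Alice then ships only the ops Bob is missing, in causal order:
SELECT pk, column_name, value, column_version, db_version, op_type, seq
FROM crdt_changes
WHERE site_id = :alice_site_id
  AND db_version > :bobs_last_seen
ORDER BY db_version, seq;

The server-orchestrated mode uses essentially the same query; the version map simply lives on the sync service instead of on each peer.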
UPDATE under the hood
If Alice later runs:
UPDATE todo SET status = 'done' WHERE id = 'ID1';
SQLite writes to the status cell.
The engine captures the event (via trigger or hook).
It generates a new column_version for status.
It inserts a new op record into metadata_table with op_type = UPDATE.
The old value is simply overwritten in todo; the CRDT history doesn’t store the previous value either, just the new value plus its causal version.
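Continuing the hypothetical crdt_changes / local_site sketch from earlier, the corresponding UPDATE trigger records an op only for the column that actually changed:

-- Capture UPDATEs to the status column (a real engine would generate one such
-- trigger per column, or a single generic one).
CREATE TRIGGER IF NOT EXISTS todo_after_update_status
AFTER UPDATE OF status ON todo
WHEN NEW.status IS NOT OLD.status
BEGIN
    INSERT INTO crdt_changes (site_id, pk, column_name, value,
                              column_version, db_version, op_type, seq)
    SELECT site_id, NEW.id, 'status', NEW.status,
           db_version + 1, db_version + 1, 1, 0
    FROM local_site;

    UPDATE local_site SET db_version = db_version + 1;
END;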
When Bob syncs, his engine sees:
“the status column of ID1 is now 'done', with version (Alice, 42)”
If Bob’s local version for status is lower, it overwrites; if higher, it ignores (or merges, depending on CRDT type).
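On the receiving side, the “higher clock wins” rule can be sketched like this, assuming a hypothetical crdt_latest table that remembers the newest known version of each (row, column) cell; :pk, :val, :ver and :site stand for the fields of the incoming op, and real engines also break version ties deterministically (for example by site_id):

CREATE TABLE IF NOT EXISTS crdt_latest (
    pk             TEXT    NOT NULL,
    column_name    TEXT    NOT NULL,
    column_version INTEGER NOT NULL,
    site_id        BLOB    NOT NULL,
    PRIMARY KEY (pk, column_name)
);

-- Apply the incoming status value only if it is newer than what we have.
UPDATE todo
SET status = :val
WHERE id = :pk
  AND :ver > COALESCE(
        (SELECT column_version FROM crdt_latest
          WHERE pk = :pk AND column_name = 'status'), 0);

-- Remember the winning version so older ops arriving later are ignored.
INSERT INTO crdt_latest (pk, column_name, column_version, site_id)
VALUES (:pk, 'status', :ver, :site)
ON CONFLICT (pk, column_name) DO UPDATE
SET column_version = excluded.column_version,
    site_id        = excluded.site_id
WHERE excluded.column_version > crdt_latest.column_version;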
DELETE under the hood
Deletes are usually implemented as a tombstone, not a physical row removal:
DELETE FROM todo WHERE id = 'ID1';
Instead of dropping the row immediately, the engine writes an op with op_type = DELETE (tombstone).
It may also store a delete clock per column or per row.
When Bob syncs, his engine:
Marks the row as deleted if Alice’s delete clock > any of Bob’s column clocks.
Some systems keep the tombstone forever; others GC it once all peers have seen it.
This ensures that if Bob was offline and later tries to update the deleted row, the system can detect and discard that update (since the delete happened causally later).
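A delete can be captured with the same hypothetical machinery; here the whole row is tombstoned with a sentinel column name and op_type = 2 (again, purely a sketch, not any specific engine’s format):

CREATE TRIGGER IF NOT EXISTS todo_after_delete
AFTER DELETE ON todo
BEGIN
    -- One row-level tombstone op; '__row__' is an illustrative sentinel.
    INSERT INTO crdt_changes (site_id, pk, column_name, value,
                              column_version, db_version, op_type, seq)
    SELECT site_id, OLD.id, '__row__', NULL,
           db_version + 1, db_version + 1, 2, 0
    FROM local_site;

    UPDATE local_site SET db_version = db_version + 1;
END;

When this op reaches Bob, his engine can compare the tombstone’s clock against the clocks it holds for ID1’s columns and, if the delete is newer, drop the row and ignore any older pending writes to it.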
Putting it together
So a single user action like:
INSERT INTO todo VALUES ('ID1', 'Buy groceries', 'in_progress');
Triggers:
SQLite row write.
Engine hook intercept.
Mutation split into per-column ops.
Metadata: (site_id, row_key, column_name, column_version, db_version, op_type, seq).
Ops appended to a local CRDT log.
Later: network sync ships the ops.
Remote site replays ops in causal order, resolving conflicts column by column.
This mechanism (column-wise causality, Lamport clocks, tombstones, and metadata_table shipping) is what lets local-first apps built on SQLite remain offline-first, merge safely, and stay reactive.
Working Offline — and Why It Still Works
One of the most powerful aspects of a local-first architecture is that Alice and Bob can be offline for as long as they want and everything will still sync correctly later:
Alice might go for a long train ride, shut down her phone, and come back online hours later.
Bob could fly overseas with his phone in airplane mode for the entire trip.
While they’re disconnected, both can freely INSERT, UPDATE, and DELETE items in their local SQLite database. Each change is recorded in the hidden metadata_table with its site_id, per-column version, and db_version.
When the network becomes available again, the sync layer simply asks:
“What operations have I not yet seen from each site?”
It then ships and merges all missing ops using the causal metadata.
Because the system is based on CRDTs (Conflict-free Replicated Data Types), it provides what is called strong eventual consistency:
Every replica eventually sees the same set of operations.
Conflicting edits to different columns merge automatically.
True conflicts (same column, same row) are resolved deterministically using the clocks.
The result: no matter how long someone stays offline or how messy the network conditions are, the databases will converge without human intervention or manual conflict resolution.
I’m the founder of SQLite AI, and our mission is to make local-first sync with SQLite and AI on the Edge straightforward and reliable. If you’d like to experiment with a production-ready sync engine that uses the same kind of technology discussed in this article, you can explore it at: https://www.sqlite.ai/sqlite-sync