
Distributed DuckDB Instance

Why This Matters

OpenDuck is an open-source platform that brings the cloud architecture pioneered by MotherDuck to DuckDB: differential storage, hybrid execution, and transparent remote databases. Developers and organizations can run, extend, and build on it themselves, which gives them more flexibility and control over how their data is managed. Its open protocol and extensibility are meant to make it easier to build and adopt cloud-native database applications.

Key Takeaways

OpenDuck

An open-source implementation of the ideas pioneered by MotherDuck — differential storage, hybrid (dual) execution, and transparent remote databases for DuckDB — available for anyone to run, extend, and build on.

MotherDuck showed that DuckDB can work beautifully in the cloud: run ATTACH 'md:mydb', and remote tables appear local. Queries split transparently across your laptop and the cloud. Storage is layered and snapshot-based. OpenDuck takes those architectural ideas (differential storage, dual execution, the attach-based UX) and makes them open: open protocol, open backend, open extension.

    import duckdb
    import openduck

    con = duckdb.connect()
    con.execute("LOAD 'openduck';")
    con.execute("ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=xxx' AS cloud;")

    con.sql("SELECT * FROM cloud.users").show()                    # remote, transparent
    con.sql("SELECT * FROM local.t JOIN cloud.t2 ON ...").show()   # hybrid, one query

    # direct connect using the openduck Python library
    con = openduck.connect("od:mydb")
    con = openduck.connect("openduck:mydb")

    # direct connect using duckdb (TODO: needs duckdb to autoload openduck
    # the same way motherduck works today)
    con = duckdb.connect("od:mydb")
    con = duckdb.connect("openduck:mydb")

What OpenDuck does

Differential storage

Append-only layers with PostgreSQL metadata. DuckDB sees a normal file; OpenDuck persists data as immutable sealed layers addressable from object storage. Snapshots give you consistent reads. One serialized write path, many concurrent readers.
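
To make the layering idea concrete, here is a minimal in-memory sketch. It is a toy model, not OpenDuck's storage code: the names LayeredFile, Layer, write, seal, and read are hypothetical, a Python list stands in for both the PostgreSQL metadata and the object store, and a snapshot is assumed to be simply "the first N sealed layers".

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Layer:
    """A sealed layer: a read-only map of page number -> page bytes."""
    layer_id: int
    pages: MappingProxyType


class LayeredFile:
    """Toy logical file stored as a stack of sealed, append-only layers."""

    def __init__(self):
        self._layers = []                     # sealed layers, oldest first
        self._pending = {}                    # writes not yet sealed
        self._write_lock = threading.Lock()   # one serialized write path

    def write(self, page_no, data):
        with self._write_lock:
            self._pending[page_no] = data

    def seal(self):
        """Freeze pending writes into a new layer and return a snapshot id."""
        with self._write_lock:
            pages = MappingProxyType(dict(self._pending))
            self._layers.append(Layer(len(self._layers), pages))
            self._pending = {}
            return len(self._layers)          # snapshot = count of visible layers

    def read(self, snapshot, page_no):
        """Read a page as of a snapshot; the newest visible layer wins."""
        for layer in reversed(self._layers[:snapshot]):
            if page_no in layer.pages:
                return layer.pages[page_no]
        return None                           # page never written in this snapshot


f = LayeredFile()
f.write(0, b"v1")
snap1 = f.seal()
f.write(0, b"v2")
f.write(1, b"new")
snap2 = f.seal()
```

A reader holding snap1 keeps seeing the old contents of page 0 even after the second layer is sealed, because sealed layers are never modified and a snapshot only looks at the layers that existed when it was taken.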

Hybrid (dual) execution

A single query can run partly on your machine and partly on a remote worker. The gateway splits the plan, labels each operator LOCAL or REMOTE, and inserts bridge operators at the boundaries. Only intermediate results cross the wire.

    [LOCAL]  HashJoin(l.id = r.id)
    [LOCAL]    Scan(products)       ← your laptop
    [LOCAL]    Bridge(R→L)
    [REMOTE]     Scan(sales)        ← remote worker
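
The labeling and bridging steps can be sketched on a toy plan tree. This is an illustration under stated assumptions, not the gateway's actual algorithm: Op, REMOTE_TABLES, label, insert_bridges, and render are hypothetical names, scans are assumed to run wherever their table lives, and every other operator is assumed to run locally.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)
    side: str = ""   # "LOCAL" or "REMOTE", filled in by label()


# Assumption for this sketch: which tables live on the remote worker.
REMOTE_TABLES = {"sales"}


def label(op):
    """Label scans by table location; all other operators run locally here."""
    for child in op.children:
        label(child)
    if op.name.startswith("Scan(") and op.name.endswith(")"):
        table = op.name[len("Scan("):-1]
        op.side = "REMOTE" if table in REMOTE_TABLES else "LOCAL"
    else:
        op.side = "LOCAL"


def insert_bridges(op):
    """Wrap each REMOTE child of a LOCAL operator in a local bridge operator."""
    new_children = []
    for child in op.children:
        insert_bridges(child)
        if op.side == "LOCAL" and child.side == "REMOTE":
            child = Op("Bridge(R→L)", [child], "LOCAL")
        new_children.append(child)
    op.children = new_children


def render(op, depth=0):
    """Return the plan as indented lines, one operator per line."""
    lines = [f"[{op.side}] {'  ' * depth}{op.name}"]
    for child in op.children:
        lines.extend(render(child, depth + 1))
    return lines


plan = Op("HashJoin(l.id = r.id)", [Op("Scan(products)"), Op("Scan(sales)")])
label(plan)
insert_bridges(plan)
print("\n".join(render(plan)))
```

In this sketch the bridge is the only place where data changes sides, which mirrors the claim that only intermediate results cross the wire. A real planner would also have to decide where joins and aggregates run, which this toy does not attempt.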
