Tech News

Matadisco – Decentralized Data Discovery

Why This Matters

Matadisco is a decentralized, open network for data discovery. Built on the AT Protocol, it lets producers publish metadata records and consumers find relevant datasets without relying on centralized portals. This addresses the fragmentation of data discovery: researchers, developers, and organizations can build their own tailored portals and make their data more visible.

Key Takeaways

Matadisco: an open, decentralized network for data discovery. Publish metadata about any dataset to AT Protocol. Build community portals. Find what matters.

Open data is only as useful as it is discoverable

Petabytes of satellite imagery, climate models, and genomic sequences sit in public repositories, yet finding the right data means navigating dozens of siloed portals, each with different interfaces, APIs, and blind spots. If you generate a derived dataset or clean up an existing one, there's often no way to make it findable. Government portals decide what gets published. Aggregators are centralized. Community contributions get lost.

How Matadisco works

Matadisco separates data discovery from data storage. Three pieces work together:

AT Protocol: Matadisco is built on AT Protocol, an open social protocol. Every record is cryptographically signed. No single entity controls the network, and all components are open source and can be self-hosted.

Producers: Write Matadisco records to a PDS (Personal Data Server). A record is a lightweight pointer to metadata (a link, an optional preview, and a timestamp), so the schema works with any metadata standard: STAC, DataCite, IIIF, RSS, and more. A producer typically watches an existing catalogue or data source and publishes records automatically.

Consumers: Read records from the network via a PDS or Jetstream, filter for what's relevant, and present them as a web-based portal for users. A satellite imagery portal, a scientific data hub, a cultural heritage archive: each built in about 100 lines of code.
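The consumer side described above can be sketched in a few lines of Python. This is a minimal sketch, not the matadisco-viewer code: it assumes the Jetstream JSON event layout (`kind`, `did`, and a `commit` object carrying `operation`, `collection`, and `record`) and the `wantedCollections` query parameter, and it takes the collection name `cx.vmx.matadisco` and the field names from the schema section. The host `jetstream.example.org` and the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Record type taken from the schema section of this article.
COLLECTION = "cx.vmx.matadisco"


def subscribe_url(host: str) -> str:
    """Build a Jetstream WebSocket URL that asks only for Matadisco records."""
    return f"wss://{host}/subscribe?" + urlencode({"wantedCollections": COLLECTION})


def to_portal_entry(raw_event: str) -> Optional[dict]:
    """Return a display-ready entry for a newly created Matadisco record, else None."""
    event = json.loads(raw_event)
    if event.get("kind") != "commit":
        return None  # identity and account events carry no records
    commit = event.get("commit", {})
    if commit.get("collection") != COLLECTION or commit.get("operation") != "create":
        return None
    record = commit.get("record", {})
    if "resource" not in record or "publishedAt" not in record:
        return None  # skip records missing the two required fields
    preview = record.get("preview") or {}
    return {
        "publisher": event.get("did"),
        "resource": record["resource"],
        "publishedAt": record["publishedAt"],
        "previewUrl": preview.get("url"),  # may be None: the preview is optional
    }
```

A portal would open the URL from `subscribe_url` with any WebSocket client and pass each message through `to_portal_entry`, rendering the entries that are not `None`. Domain-specific filtering (for example, keeping only geospatial resources) would be an extra check on the returned entry.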

The schema

The Matadisco record is defined as an ATProto Lexicon. In MLF syntax:

    cx.vmx.matadisco

    record matadisco {
        publishedAt!: Datetime,
        resource!: Uri,
        preview: {
            mimeType!: string,
            url: Uri,
        },
    }

Only resource and publishedAt are required. The preview is optional: for satellite imagery it's a thumbnail, for articles a summary, for podcasts an audio snippet.
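To make the required and optional fields concrete, here is a small Python check that mirrors the schema as printed above. It is a sketch only, not a Lexicon validator: it checks presence and basic types, does not verify that values are well-formed URIs or datetimes, and the helper names `atproto_timestamp` and `validate_record` are made up for illustration.

```python
from datetime import datetime, timezone
from typing import List


def atproto_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC string for publishedAt."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def validate_record(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the basic shape is fine."""
    errors = []
    # publishedAt and resource are the two fields marked required ("!").
    for field in ("publishedAt", "resource"):
        value = record.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"missing required field: {field}")
    # preview is optional, but if present it must carry a mimeType.
    preview = record.get("preview")
    if preview is not None:
        if not isinstance(preview, dict) or not isinstance(preview.get("mimeType"), str):
            errors.append("preview.mimeType is required when preview is present")
    return errors
```

For example, a record with only `resource` and `publishedAt` passes, while one whose `preview` contains a `url` but no `mimeType` is reported as a problem.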

See it in action

The matadisco-viewer streams new ATProto records in real time and renders them. It currently shows Copernicus Sentinel-2 satellite imagery.

Producers & Consumers

Producers write records into the network; consumers read and display them. The prototype demonstrates both roles:

sentinel-to-atproto (producer): listens to Element 84's Earth Search STAC catalogue for new Sentinel-2 imagery and writes records to a PDS.

gdi-de-csw-to-atproto (producer): imports metadata from the German geodata catalogue (GDI-DE) via CSW and publishes records to ATProto.

matadisco-viewer (consumer): subscribes to a Bluesky Jetstream relay or reads from a PDS, filters for Matadisco records, and displays them as a portal with previews.

matadisco-geo-viewer (consumer): a viewer specialised for geospatial metadata records with support for STAC metadata, rendering spatial previews on a map. Supports consumption from both Jetstream and PDS.

Because records flow through an open network, institutions manage their catalogues independently while participating in shared discovery.
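As a rough illustration of the producer role, the sketch below maps a STAC item to a Matadisco record and prepares the HTTP request that would write it to a PDS. It is not the sentinel-to-atproto implementation. It assumes the item has a `self` link and, optionally, a `thumbnail` asset with `href` and `type` (common in STAC catalogues but not guaranteed), and it uses the standard ATProto `com.atproto.repo.createRecord` endpoint. The PDS URL, DID, token, and function names are placeholders.

```python
import json
from urllib.request import Request

COLLECTION = "cx.vmx.matadisco"  # record type from the schema section


def stac_item_to_record(item: dict, published_at: str) -> dict:
    """Build a Matadisco record pointing at a STAC item, with an optional thumbnail."""
    self_links = [link["href"] for link in item.get("links", []) if link.get("rel") == "self"]
    if not self_links:
        raise ValueError("STAC item has no self link to point at")
    record = {"$type": COLLECTION, "resource": self_links[0], "publishedAt": published_at}
    thumb = item.get("assets", {}).get("thumbnail")
    # Only attach a preview when the required mimeType can be filled in.
    if thumb and thumb.get("type") and thumb.get("href"):
        record["preview"] = {"mimeType": thumb["type"], "url": thumb["href"]}
    return record


def create_record_request(pds_url: str, repo_did: str, access_jwt: str, record: dict) -> Request:
    """Prepare (but do not send) the createRecord call for the given repository."""
    body = json.dumps({"repo": repo_did, "collection": COLLECTION, "record": record})
    return Request(
        f"{pds_url.rstrip('/')}/xrpc/com.atproto.repo.createRecord",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {access_jwt}"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` (after obtaining an access token from the PDS) would publish the record. A real producer would also need to track which catalogue items it has already published so that it does not write duplicates.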

Prior art & influences

FROST by Tom Nicholas: a Federated Registry of Scientific Things. His motivating essay on why science needs a social network for data is an excellent starting point.
