Keyhive Local-first access control 🐝 Keyhive is a project exploring local-first access control. It aims to provide a firm basis for secure collaboration, similar to the guarantees of private chat but for any local-first application. In this lab notebook, we’ll share snippets of our findings as we explore the problem space and prototype potential solutions. The entries start from the beginning, but you can jump to the most recent post: 05 · Syncing Keyhive
00 · Keyhive Background As the local-first ecosystem matures, the contexts that local-first applications fill has also expanded. Local-first emphasizes collaboration, but the constraints on an application are different if you build an application for you and a handful of friends versus delivering a team-oriented product. Your data not being viewable or editable by everyone in the world is a basic requirement of applications ranging from planning a surprise party, corporate meeting notes, book drafts, and legal contracts. Today’s most common access control patterns assume a central server. While cloud auth tools are forever developing, generally speaking existing tools for cloud auth are very mature. Doing access control without a cloud auth server requires rethinking the underlying mechanics of how auth works. Keyhive is an attempt to do secure and efficient local-first auth while retaining the user experience found in familiar applications like Google Docs, Dropbox, GitHub, and Discord. We believe that these are table stakes for the next generation of local-first applications. A GitHub repository permissions page We’ve seen user-agency principals successfully applied to other contexts. Signal popularized end-to-end encrypted chat while retaining much of the convenience of less-secure messaging applications. We find ourselves asking “what would Signal for documents look like?” Least Surprise Unlike a cloud auth system which can depend on the network to keep data hidden behind a web API, local-first runs a complete copy of the application at each replica. What are the correct bounds on access control when everyone has direct access to all of the content? Ultimately access control is about collaboration. Collaboration and access control can be seen as two sides of the same topic: who do you want to collaborate with, in which ways, and for how long? CRDTs try to merge data in the least surprising way possible. For example, concurrent text will merge to produce the same data on all replicas, but the resulting paragraphs may not make sense next to each other. Users then fix these semantic errors manually. We believe that this is a major improvement over the user experience of something like Git, which often gets stuck and demands user intervention. The equivalent situation exists for concurrent access control, but the stakes are higher. Preventing your friend from learning that you’re planning a surprise party, or opposing legal councel from altering your case prep are both important, and it should be clear how they will behave despite any underlying concurrency. The behavior of an access control system should be as clear to the end user as possible. Since there is no single source of truth about who can do what at any given time, the rules themselves need to be straightforward. Ranges of authorization (and revocation) over time. Here 🙋♀️ is added, removed, and re-added later. Some of 🙋♀️’s updates are not materialized based on where they’re ordered in the document history. Out of Obscurity Often local-first applications today depend on “security through obscurity”. For example, by default you can write into any Automerge document that you know the document ID for. This style is sometimes called “Swiss number” or “Rumpelstiltskin” security. It works as long as the document ID is only ever shared with people that you want to collaborate with, your security is all-or-nothing, and you never want to later remove someone from a document. If the document ID leaks (e.g. someone posts it to Bluesky), then the document is world-writable. In lieu of a widely-adopted purely local-first access control system, some teams have tried leveraging existing auth methods by routing updates through a cloud auth server (e.g. OAuth login and auth logic in a server). Others have opted to emphasize decentralized user agency by using a blockchain to store access control policies. Both of these approaches require a network connection in order to check if an update is valid, which is not local-first. Bringing access control features to a local-first context requires rethinking how authority flows between nodes. What we want is a system that retains the best of the above: the self-certification of Rumpelstiltskin, the power of auth servers, and the user agency of decentralized solutions. Following the definition of local-first, applications should accept updates after arbitrarily long periods of disconnection. Extending that requirement to access control means the ability to revoke access or have finer grained control (e.g. read vs write) requires tracking who has authorization to do what, and at which point in the document’s history.
01 · Welcome to the Keyhive Today’s cloud services have very mature access control features. These systems depend on a key architectural detail: they are able to rely on encapsulation by taking advantage of the network boundary. Since data is not available to read or write directly by the client, a privileged guard process is able to apply arbitrary access control rules. This process retrieves and/or mutates data on behalf of clients. This power unfortunately comes at a price: since auth is on the hot path of every request — and generally depends on a central single-source-of-truth auth database — authorization at scale often bottlenecks overall application performance. And yet, an attacker that is able to bypass the auth part of the request lifecycle has unmitigated access to arbitrarily read, change, or delete the application’s data. This is to say nothing of the complexity of building, deploying, and maintaining cloud architectures to get that network boundary in the first place! For local-first software to be successful in many production contexts, it needs to provide similar features without relying on a central authorization server. The local-first setting does not have the luxury of a network boundary: access control must travel with the data itself and work without a central guard. There are also some tricky edge cases due to causal consistency. What should happen to honest operations that causally depend on content that’s later discovered to be malicious? What is the best strategy to handle operations from an agent that was revoked concurrently, especially given that “back-dating” operations is always possible. If a document has exactly two admins (and many non-admin users), what should happen if the admins concurrently revoke each other (for instance, one is malicious)? A solution to backdating is to gain consensus on which operations happened up to a certain point, and drop all concurrent changes that arrive later. This is fairly counter to the local-first ethos, hence us avoiding it here in order to preserve the same consistency level as Automerge. There may be situations where consensus is appropriate, but we currently consider it a last resort. To address the above challenges, we’ve started work on Keyhive: a project focused on local-first access control. Our goal is to design and build a production ready instance of such a system which is general enough for most local-first applications. Audience & Application To date, the local-first ecosystem has primarily used a purely pull-based model where users manually decide which changes to accept. This approach is often sufficient for personal projects: each user can manually decide which peers to connect to and which changes should be applied. On the other hand, many production contexts have lower trust, require higher alignment, and are ideally low touch enough so that it’s not up to each person in a large organization to separately and manually infer who to trust. As a rough north star, we’re keeping the following use cases in mind: Publishing (publicly visible data with restricted edits, like a blog)
Planning a surprise party: small groups, low risk
Meeting notes: small-to-medium groups, low-to-medium risk
Corporate legal documents: medium-to-large groups, medium-to-high risk
Journalists & activists: small-to-medium groups, high risk Cryptography has a reputation for being slow, especially if there’s crypto-heavy code running on low-powered devices. To have a performance margin that can cover a large range of practical use cases, Keyhive aims to run efficiently over at least ten-of-thousands of documents, millions of readers, thousands of writers, and hundreds of admins/superusers. Antigoals Since authorization, authentication, and identity are often conflated, it is worth highlighting that Keyhive deliberately excludes user identity (i.e. the binding of a human identity to an application’s identifier like a public key). In our initial community consultations we found that there are many different identity mechanisms that developers downstream of Keyhive would like to use. As such, we’re designing the system to be decentralized and secure, and leave name registration/discovery and user verification (e.g. email or social) to a future layer above Keyhive. The following are left out of our design goals: Constraining downstream applications to use a small predefined set of policies or roles
Interactive protocols (since local-first must work under network partition)
Reliance on a central authority
Cryptographic agility
... continue reading