What happens if I just point a git server at an object storage bucket?
Back when I was porting agent sandboxes to Go, I built everything on top of billy, a filesystem abstraction for Go. The whole trick of the project was teaching a Tigris bucket to act enough like a filesystem that a shell interpreter and its tools couldn’t tell the difference. Billy was the key layer that made the entire façade fall into place.
After I had gotten things working, I learned that I was using billy way outside its normal use case. It was originally made for go-git, a pure-Go implementation of git’s protocols and data formats that doesn’t rely on the /usr/bin/git binary existing at all. Every method on billy’s filesystem interface exists purely because go-git needs it. This gave me a terrible idea: I already have a bucket that can quack like a filesystem, and go-git’s native language is “filesystem”.
Can this Just Work™? Let's find out.
If you strip away the porcelain, a git repository is four basic things:
Blobs, or compressed objects holding file contents. Most of the objects in any individual repository are blobs.
Trees, or objects that map to other objects. TL;DR: trees are folders.
Commits, or objects that point at one tree and at their parent commits (none for the first commit, more than one for a merge). This lets you pin down which files belong to one logical change set.
Refs (branches and tags), or tiny mutable pointers into the pile of objects.
Note: Until I started working on this, I was under the impression that git stored only the patches applied to an empty folder and reconstructed the history of your repository from them. It does not. It stores entire files, which explains why big binary blobs fudge the tooling so much. The diff mental model works fine for using git day to day; it’s just wrong at the storage layer, which is the layer this post lives in.
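To make the storage layer a bit more concrete, here is a minimal sketch of how git frames, names, and compresses a single loose object, using only the Go standard library. It assumes a SHA-1 repository (the default) and ignores packfiles entirely. The helper names `frame`, `objectID`, `looseObject`, and `inflate` are made up for this illustration; they are not part of go-git or billy.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
)

// frame prepends git's object header, "<kind> <size>\x00", to the raw content.
// kind is "blob", "tree", "commit", or "tag".
func frame(kind string, content []byte) []byte {
	header := fmt.Sprintf("%s %d\x00", kind, len(content))
	return append([]byte(header), content...)
}

// objectID returns the hex SHA-1 of the framed object. This is the name git
// uses for the object (assumption: a SHA-1 repository, not a SHA-256 one).
func objectID(kind string, content []byte) string {
	sum := sha1.Sum(frame(kind, content))
	return hex.EncodeToString(sum[:])
}

// looseObject returns the zlib-compressed bytes that would be written to
// objects/<first two hex chars>/<remaining hex chars>.
func looseObject(kind string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(frame(kind, content)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflate reverses looseObject and returns the framed object bytes.
func inflate(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	content := []byte("hello\n") // the whole file, not a diff
	id := objectID("blob", content)

	packed, err := looseObject("blob", content)
	if err != nil {
		panic(err)
	}

	// Same ID that `echo hello | git hash-object --stdin` reports:
	// ce013625030ba8dba906f756967f9e9ca394464a
	fmt.Printf("objects/%s/%s (%d bytes on disk)\n", id[:2], id[2:], len(packed))
}
```

The thing to notice is that the object's name depends only on its type and its full contents. Nothing in there is a patch, and writing an object is just "put these bytes at this path", which is about the friendliest operation you could ask of something pretending to be a filesystem.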