Garbage Collection Is Useful

A long time ago, I spent a few years working on garbage collection in the J9 Java VM. And even though I’ve since done mostly done higher-level stuff, having a deeper knowledge of GC has continued to come in useful.

Yesterday, it was an insight from one of my favourite GC papers, A Unified Theory of Garbage Collection, which helped me solve a tricky problem.

ohm’s incremental parsing

I’m working with a team that’s using Ohm to parse text documents and render a rich text version in ProseMirror. The goal is bidirectional updates: changes in ProseMirror should propagate to the text version, and vice versa.

Ohm supports incremental parsing, which means that if you parse some text and then make a small edit, it can quickly reparse by reusing portions of the previous result.

It also supports limited form of incremental transforms. You can define an attribute, which is kind of like a memoized visitor, and the attribute value for a given node will only need to be recalculated if the edit affected one of its subtrees. So you can easily implement a form of persistent data structure, where each new value (e.g., an AST) shares a bunch of structure with the previous one.

the problem

Using this machinery, I tried making an pmNodes attribute that produced a ProseMirror document for a given input. When the text document is edited, it produces a new tree which shares a bunch of nodes with the previous one.

The `pmNodes` tree before and after an edit.

Then, my plan was to construct a ProseMirror transaction that would turn the old tree into the new one. To do that, it’s helpful to know which nodes appeared in the old document, but not the new one.

... continue reading