According to a classic joke attributed to Phil Karlton , computer science has only two hard problems: naming things and cache invalidation. They are hard because no algorithm can solve them: good names come from empathy and understanding, and cache invalidation requires system thinking and careful analysis. This article describes another such problem, pervasive and insidious yet rarely noticed: mapping a general graph onto a hierarchical structure. I call it tree mapping.
Our brains evolved to be remarkably good at dealing with physical space. We can intuitively orient ourselves in an unfamiliar city and draw an accurate map of a place we visited decades ago. Naturally, we want to take advantage of this machinery in all aspects of our lives.
One of the defining features of a physical space is its hierarchical, localized structure. We perceive each bit of space as self-contained and interacting only with nearby bits, which enables a hierarchical view of the world in which objects form collections that can be understood as a whole: particles form atoms, atoms form molecules, molecules form bodies, and so forth, up to galaxies and superclusters.
Hierarchies are so natural to us that they have become our dominant organizing tool. We sort things and ideas into labeled boxes and put these into larger boxes. It works for physical objects that can be in only one place at a time. Ideas and information, however, resist taxonomies. They form intricate webs that penetrate rigid boundaries.
Trees formalize hierarchies; they are among the most widely used structures in computer science. Trees are intimately related to spaces as they are universal space organizers: B-trees organize ordered key spaces in databases, k-d trees slice multi-dimensional spaces in graphics, and abstract syntax trees organize linear token sequences in compilers.
⊕ Parsing is a space organization problem as it requires labeling token sequences with their syntactic roles.
But with all their utility, trees can’t model webs directly. Thus, tree mapping is a problem that requires embedding a web into a tree in a way that distorts the web structure.
Let’s look at a few examples.
The digital world thereby allows us to transcend the most fundamental rule of ordering the real world: Instead of everything having its place, it’s better if things can get assigned multiple places simultaneously. David Weinberger, Everything Is Miscellaneous
Imagine receiving a bill from your dentist. How do you file it? In a common archive folder? In a more specific "medical" folder? In the XXXX year taxes project folder for future tax returns? Or copy it and choose all the options at once?
... continue reading