Intuiting Pratt Parsing

Intuiting Pratt parsing

You already know that a + b * c + d is calculated as a + (b * c) + d . But how do you encode that knowledge precisely enough for a machine to act on it?

The most common solution employed by compilers is to make use of a tree known as an abstract syntax tree. In an AST, each operator sits above its operands, and evaluation works bottom-up: resolve the children, then apply the operation.

+ / \ + d / \ a * / \ b c

This tree encodes the desired ordering (a + (b * c)) + d in a format that is very convenient to work with programmatically.

But of course, people (for the most part) don’t write programs as trees. This means we face the problem of deriving this structure from flat text.

This is known as parsing. It has been the focus of decades of computer science research. It has also, in many cases, been wildly overcomplicated.

Simplifying

The difficulty in parsing lies with mixed precedence. To be precise, cases where precedence changes direction. Let’s imagine our users only ever wrote programs of either increasing or decreasing precedence. What would that mean for our tree representation?

In the case of decreasing precedence, we repeatedly evaluate the leftmost operator as it is higher precedence - multiplication before addition, addition before comparison, and so on. The first operator sits deepest in the tree, the last the shallowest. The resulting tree is left-leaning.

... continue reading