Unix "find" expressions compiled to bytecode

December 23, 2025

nullprogram.com/blog/2025/12/23/

In preparation for a future project, I was thinking about the unix find utility. It operates on file system hierarchies, with basic operations selected and filtered using a specialized expression language. Users compose operations using unary and binary operators, grouping with parentheses for precedence. find may apply the expression to a great many files, so compiling it into a bytecode, resolving as much as possible ahead of time, and minimizing the per-element work, seems like a prudent implementation strategy. With some thought, I worked out a technique to do so, which was simpler than I expected, and I’m pleased with the results. I was later surprised that all the real world find implementations I examined use tree-walk interpreters instead. This article describes how my compiler works, with a runnable example, and lists ideas for improvements.

For a quick overview, the syntax looks like this:

$ find [-H|-L] path... [expression...]

Technically at least one path is required, but most implementations imply . when none are provided. If no expression is supplied, the default is -print, i.e. print everything under each listed path. This prints the whole tree, including directories, under the current directory:

$ find .
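As a quick sanity check of the default-expression rule, the shell sketch below builds a small scratch tree and compares the two spellings. The directory layout and variable names are made up for illustration, and mktemp -d is assumed to be available (it is common, but not required by POSIX).

```shell
# Build a tiny throwaway tree so the comparison is reproducible.
# (Hypothetical layout; any tree would do.)
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
touch "$tmp/README" "$tmp/src/main.c"
cd "$tmp" || exit 1

# No expression given: find falls back to -print.
implicit=$(find . | sort)

# The same search with the default spelled out.
explicit=$(find . -print | sort)

# Both listings contain directories as well as files.
[ "$implicit" = "$explicit" ] && echo "same output"
```

Sorting both listings only guards against traversal-order differences between runs; it is not needed for the equivalence itself.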

To only print files, we could use -type f:

$ find . -type f -a -print

Here -a is the binary logical AND operator, and -print always evaluates to true. It’s never necessary to write -a, because adjacent operations are implicitly joined with -a. We can keep chaining them, such as finding all executable files:

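As an aside on the implicit -a rule described earlier, the shell sketch below compares the explicit and implicit forms on a scratch tree. The layout and variable names are hypothetical, and mktemp -d is assumed to be available. It also illustrates the short-circuit behavior of -a: -print is only evaluated for entries where -type f was true, so directories never reach it.

```shell
# Throwaway tree with one directory and two regular files.
# (Hypothetical layout for illustration.)
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
touch "$tmp/README" "$tmp/src/main.c"
cd "$tmp" || exit 1

# Explicit AND between the two operations.
with_a=$(find . -type f -a -print | sort)

# Adjacent operations: the -a is implied.
without_a=$(find . -type f -print | sort)

# Identical results; neither listing contains "." or "./src",
# because -print is skipped whenever -type f is false.
[ "$with_a" = "$without_a" ] && echo "same output"
```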