Revisiting "Let's Build a Compiler"

There's an old compiler-building tutorial that has become part of the field's lore: the Let's Build a Compiler series by Jack Crenshaw (published between 1988 and 1995).

I ran into it in 2003 and was very impressed, but it's now 2025 and this tutorial is still being mentioned quite often in Hacker News threads. Why is that? Why does a tutorial from 35 years ago, built in Pascal and emitting Motorola 68000 assembly - technologies that are virtually unknown for the new generation of programmers - hold sway over compiler enthusiasts? I've decided to find out.

The tutorial is easily available and readable online, but just re-reading it seemed insufficient. So I've decided on meticulously translating the compilers built in it to Python and emit a more modern target - WebAssembly. It was an enjoyable process and I want to share the outcome and some insights gained along the way.

The result is this code repository. Of particular interest is the TUTORIAL.md file, which describes how each part in the original tutorial is mapped to my code. So if you want to read the original tutorial but play with code you can actually easily try on your own, feel free to follow my path.

A sample To get a taste of the input language being compiled and the output my compiler generates, here's a sample program in the KISS language designed by Jack Crenshaw: var X=0 { sum from 0 to n-1 inclusive, and add to result } procedure addseq(n, ref result) var i, sum { 0 initialized } while i < n sum = sum + i i = i + 1 end result = result + sum end program testprog begin addseq(11, X) end . It's from part 13 of the tutorial, so it showcases procedures along with control constructs like the while loop, and passing parameters both by value and by reference. Here's the WASM text generated by my compiler for part 13: ( module ( memory 8 ) ;; Linear stack pointer. Used to pass parameters by ref. ;; Grows downwards (towards lower addresses). ( global $__sp ( mut i32 ) ( i32.const 65536 )) ( global $X ( mut i32 ) ( i32.const 0 )) ( func $ADDSEQ ( param $N i32 ) ( param $RESULT i32 ) ( local $I i32 ) ( local $SUM i32 ) loop $loop1 block $breakloop1 local.get $I local.get $N i32.lt_s i32.eqz br_if $breakloop1 local.get $SUM local.get $I i32.add local.set $SUM local.get $I i32.const 1 i32.add local.set $I br $loop1 end end local.get $RESULT local.get $RESULT i32.load local.get $SUM i32.add i32.store ) ( func $main ( export "main" ) ( result i32 ) i32.const 11 global.get $__sp ;; make space on stack i32.const 4 i32.sub global.set $__sp global.get $__sp global.get $X i32.store global.get $__sp ;; push address as parameter call $ADDSEQ ;; restore parameter X by ref global.get $__sp i32.load offset = 0 global.set $X ;; clean up stack for ref parameters global.get $__sp i32.const 4 i32.add global.set $__sp global.get $X ) ) You'll notice that there is some trickiness in the emitted code w.r.t. handling the by-reference parameter (my previous post deals with this issue in more detail). In general, though, the emitted code is inefficient - there is close to 0 optimization applied. Also, if you're very diligent you'll notice something odd about the global variable X - it seems to be implicitly returned by the generated main function. This is just a testing facility that makes my compiler easy to test. All the compilers are extensively tested - usually by running the generated WASM code and verifying expected results.

Insights - what makes this tutorial so special? While reading the original tutorial again, I had on opportunity to reminisce on what makes it so effective. Other than the very fluent and conversational writing style of Jack Crenshaw, I think it's a combination of two key factors: The tutorial builds a recursive-descent parser step by step, rather than giving a long preface on automata and table-based parser generators. When I first encountered it (in 2003), it was taken for granted that if you want to write a parser then lex + yacc are the way to go . Following the development of a simple and clean hand-written parser was a revelation that wholly changed my approach to the subject; subsequently, hand-written recursive-descent parsers have been my go-to approach for almost 20 years now. Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on. This was also a breath of fresh air for engineers who grew up with more traditional courses where you spend 90% of the time on parsing, type checking and other semantic analysis and often run entirely out of steam by the time code generation is taught. To be honest, I don't think either of these are a big problem with modern resources, but back in the day the tutorial clearly hit the right nerve with many people.

What else does it teach us? Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs. As I said above, this is a fantastic approach for getting started, but in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types, it becomes painfully obvious that it would be very nice if we knew the types of expressions before we generate code for them. I don't know if this is implicated in Jack Crenshaw's abandoning the tutorial at some point after part 14, but it may very well be. He keeps writing how the emitted code is clearly sub-optimal and can be improved, but IMHO it's just not that easy to improve using the syntax-directed translation strategy. With perfect hindsight vision, I would probably use Part 14 (types) as a turning point - emitting some kind of AST from the parser and then doing simple type checking and analysis on that AST prior to generating code from it.