A high-throughput parser for the Zig programming language
Published on: 2025-04-25 13:11:53
Accelerated Zig Parser
A high-throughput tokenizer and parser (soon™️) for the Zig programming language.
The mainline Zig tokenizer uses a deterministic finite state machine. Those are pretty good for some applications, but tokenizing can often use other techniques for added speed.
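For readers unfamiliar with the baseline, here is a minimal sketch of a tokenizer written as a deterministic finite state machine. It is not the mainline Zig tokenizer: the three states, the token kinds, and the name `tokenize_dfa` are assumptions made purely for illustration, and the sketch is in Python for brevity. The point is only that it does one state transition per input character.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    IDENT = auto()
    NUMBER = auto()


def tokenize_dfa(src: str) -> list[tuple[str, str]]:
    """Hypothetical toy tokenizer: examines one character per step."""
    tokens = []
    state = State.START
    start = 0
    text = src + "\0"  # sentinel so the last token is flushed without an EOF special case
    i = 0
    while i < len(text):
        ch = text[i]
        if state is State.START:
            start = i
            if ch.isalpha() or ch == "_":
                state = State.IDENT
            elif ch.isdigit():
                state = State.NUMBER
            elif not (ch.isspace() or ch == "\0"):
                tokens.append(("punct", ch))
            i += 1
        elif state is State.IDENT:
            if ch.isalnum() or ch == "_":
                i += 1  # continuation character: stay in IDENT
            else:
                tokens.append(("ident", text[start:i]))
                state = State.START  # re-examine ch from the start state
        else:  # State.NUMBER
            if ch.isdigit():
                i += 1
            else:
                tokens.append(("number", text[start:i]))
                state = State.START
    return tokens
```

Every continuation character of an identifier or number costs one trip through the loop, which is the per-byte work the chunked approaches try to avoid.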
Two tokenizer implementations are provided.
1. A version that produces a few bitstrings per 64-byte chunk and uses those to skip over continuation-character matching. I gave two talks on this subject. (This code has currently gone poof, but I will resurrect it for comparison's sake within 3 months, when I give my final Utah-Zig talk on the Zig tokenizer in July.)
2. A version that produces bitstrings for EVERYTHING we want to do within a 64-byte chunk and uses vector compression to find the extents of all tokens simultaneously. See this animation. I also gave a talk (really more of a rant) about my grand plans here. Unfortunately it did not turn out how I had hoped because
...
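The bitstring idea behind both implementations can be sketched roughly as follows. This is not the project's code: it uses plain Python integers as 64-bit bitmaps instead of SIMD vectors, and every name in it (`ident_bitmap`, `ident_length`, `token_extents`) is hypothetical. It also only classifies identifier characters and replaces the vector-compression step with an ordinary loop over set bits.

```python
import string

CHUNK = 64
MASK64 = (1 << CHUNK) - 1
# Assumption: identifier characters are ASCII letters, digits, and underscore.
IDENT_BYTES = frozenset((string.ascii_letters + string.digits + "_").encode())


def ident_bitmap(chunk: bytes) -> int:
    """Bit i is set when chunk[i] can continue an identifier."""
    bits = 0
    for i, b in enumerate(chunk[:CHUNK]):
        if b in IDENT_BYTES:
            bits |= 1 << i
    return bits


def trailing_ones(x: int) -> int:
    """Count consecutive 1 bits starting at bit 0 (at most CHUNK)."""
    inverted = ~x & MASK64
    if inverted == 0:
        return CHUNK
    return (inverted & -inverted).bit_length() - 1


def ident_length(bitmap: int, pos: int) -> int:
    """First approach: skip a whole identifier at pos with one bit operation."""
    return trailing_ones(bitmap >> pos)


def bit_positions(x: int) -> list[int]:
    """Indices of set bits, lowest first (stand-in for vector compression)."""
    out = []
    while x:
        low = x & -x
        out.append(low.bit_length() - 1)
        x ^= low
    return out


def token_extents(bitmap: int) -> list[tuple[int, int]]:
    """Second approach: find (start, end) of every run in the chunk at once."""
    starts = bitmap & ~(bitmap << 1)  # set bit whose lower neighbour is clear
    ends = ~bitmap & (bitmap << 1)    # clear bit whose lower neighbour is set
    return list(zip(bit_positions(starts), bit_positions(ends)))
```

For the chunk `b"foo_bar1 = baz;"`, `ident_length(bitmap, 0)` gives 8 and `token_extents(bitmap)` gives `[(0, 8), (11, 14)]`. A real implementation would also have to carry runs across chunk boundaries, which this sketch ignores.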