Making ast.walk 220x Faster

In our AI reflex-app builder we generate massive amounts of Python code. Sometimes, this code generation fails in rather trivial manners; positional parameters after keyword ones, returns with values in async generators, using outdated syntax conventions from previous versions of our framework, etc.

Running reflex compile will eventually find all of those bugs, but it only finds one issue at a time. That means if the AI made multiple mistakes, we are increasing the latency massively for what could be relatively simple fixes.

As such, we decided using a linter would be the best approach to fix this. And since we need to add reflex-specific rules, we couldn't use an existing one, and had to build our own.

Our initial linter looked simple:

Unfortunately, this approach ran into performance problems quickly, as we are processing a lot of generated code.

Notably, the slowest part in the above code is not the isinstance checks, it is ast.walk . Of course, there are many ways of optimizing things in ways that aren't making ast.walk faster, and we implemented those "lower-hanging-fruit" changes first. However, we quickly realized it was hard to make the linter significantly faster without taking on the challenge of making walk itself faster.

However, walking an abstract tree (which for the purposes of this code, is just a regular tree) doesn't have to be slow. Walking the difflib module took ~2ms on my device, for ~7,000 nodes. On its own, this isn't horrible, but it quickly adds up. A rough calculation gives us ~285 nanoseconds per node, on the order of a thousand CPU cycles - far more than such a simple traversal should need. So what on earth is ast.walk doing?

First thing to note here is the use of yield . Generators and yield syntax are a powerful feature of Python, but they come at a sharp cost: suspending the execution of the loop and unsuspending it repeatedly in a hot path where we are consuming the full list anyways. Sure, it saves memory, but that's not our problem at the moment, especially if that list would get cleaned up. If we simply store a list, and keep appending to it, we can minimize this:

But after running it, I realized it only gets us, 5% improvement. Much less than I expected, and it makes me wonder: what is iter_child_nodes doing?

Ah, another generator. If we inline it, we should see some decent performance improvements:

... continue reading