
Python: The Optimization Ladder

Why This Matters

This article highlights the challenges and potential of optimizing Python performance, emphasizing that Python's dynamic design inherently limits how much speed can be gained through traditional optimization techniques. For the tech industry and consumers, understanding these limitations is crucial for making informed decisions about language choice and performance expectations in real-world applications.

Key Takeaways

Every year, someone posts a benchmark showing Python is 100x slower than C. The same argument plays out: one side says "benchmarks don't matter, real apps are I/O bound," the other says "just use a real language." Both are wrong.

I took two of the most-cited Benchmarks Game problems -- n-body and spectral-norm -- reproduced them on my machine, and ran every optimization tool I could find. Then I added a third benchmark -- a JSON event pipeline -- to test something closer to real-world code.
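The exact pipeline lives in the benchmark repository; as a rough illustration of the workload's shape, here is a minimal sketch with hypothetical event fields (the field names, event count, and aggregation are assumptions, not the article's actual code):

```python
import json
import random
import time

# Hypothetical event schema -- a stream of JSON lines to parse,
# filter, and aggregate, mimicking a simple event pipeline.
events = [
    json.dumps({"user": random.randrange(1000),
                "type": random.choice(["click", "view", "purchase"]),
                "value": random.random()})
    for _ in range(100_000)
]

start = time.perf_counter()
totals = {}
for line in events:
    ev = json.loads(line)             # parse
    if ev["type"] != "purchase":      # filter
        continue
    # aggregate per-user purchase value
    totals[ev["user"]] = totals.get(ev["user"], 0.0) + ev["value"]
elapsed = time.perf_counter() - start
print(f"{len(totals)} users aggregated in {elapsed:.3f}s")
```

Unlike n-body or spectral-norm, most of the time here goes to parsing and dict manipulation rather than tight numeric loops, which is why it stresses the optimizers differently.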

Same problems, same Apple M4 Pro, real numbers. This is one developer's journey up the ladder -- not a definitive ranking. A dedicated expert could squeeze more out of any of these tools. The full code is at faster-python-bench.

Here's the starting point -- CPython 3.13 on the official Benchmarks Game run:

Benchmark             C gcc   CPython 3.13   Ratio
n-body (50M)          2.1s    372s           177x
spectral-norm (5500)  0.4s    350s           875x
fannkuch-redux (12)   2.1s    311s           145x
mandelbrot (16000)    1.3s    183s           142x
binary-trees (21)     1.6s    33s            21x

The question isn't whether Python is slow at computation. It is. The question is how much effort each fix costs and how far it gets you. That's the ladder.

Why Python Is Slow

The usual suspects are the GIL, interpretation, and dynamic typing. All three matter, but none of them is the real story. The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize.
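A short sketch (the class names here are invented for illustration) makes the point concrete -- methods and even an instance's class can be swapped out while the program runs:

```python
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()

# Replace a method on the class while an instance already exists:
Greeter.hello = lambda self: "patched"
print(g.hello())  # patched

# Even the instance's class itself can be reassigned at runtime:
class Loud:
    def hello(self):
        return "HELLO"

g.__class__ = Loud
print(g.hello())  # HELLO
```

Because any of this can happen at any point, the interpreter cannot simply freeze a method's address the way a C compiler resolves a function call.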

A C compiler sees a + b between two integers and emits one CPU instruction. The Python VM sees a + b and has to ask: what is a? What is b? Does a.__add__ exist? Has it been replaced since the last call? Is a actually a subclass of int that overrides __add__? Every operation goes through this dispatch because the language guarantees you can change anything at any time.
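You can watch that dispatch change under a running program. A minimal sketch (the Num class is invented for illustration): a + b resolves __add__ on the type at call time, so replacing it affects instances that already exist:

```python
class Num:
    def __init__(self, v):
        self.v = v
    def __add__(self, other):
        return Num(self.v + other.v)

a, b = Num(2), Num(3)
print((a + b).v)  # 5

# Swap out __add__ on the class; the same a + b now multiplies,
# because the VM re-resolves the dispatch on every operation.
Num.__add__ = lambda self, other: Num(self.v * other.v)
print((a + b).v)  # 6
```

A static compiler could never bake the first __add__ into the call site, because the second assignment is legal at any time.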

The object overhead is where this shows up concretely. In C, an integer is 4 bytes on the stack. In Python, even the smallest integer is a full heap-allocated object carrying a reference count and a type pointer.
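The difference is easy to check with sys.getsizeof (the exact byte counts below are typical for 64-bit CPython and vary by build, so treat them as approximate):

```python
import sys

# On 64-bit CPython, a small int is typically 28 bytes, not 4:
print(sys.getsizeof(1))

# Arbitrary-precision ints grow beyond that:
print(sys.getsizeof(2**100))

# Even an empty list carries tens of bytes of object header:
print(sys.getsizeof([]))
```

Multiply that overhead by millions of objects in a numeric loop and the cache behavior alone explains a large share of the gap in the table above.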
