Every year, someone posts a benchmark showing Python is 100x slower than C. The same argument plays out: one side says "benchmarks don't matter, real apps are I/O bound," the other says "just use a real language." Both are wrong.
I took two of the most-cited Benchmarks Game problems -- n-body and spectral-norm -- reproduced them on my machine, and ran every optimization tool I could find. Then I added a third benchmark -- a JSON event pipeline -- to test something closer to real-world code.
Same problems, same Apple M4 Pro, real numbers. This is one developer's journey up the ladder -- not a definitive ranking. A dedicated expert could squeeze more out of any of these tools. The full code is at faster-python-bench.
Here's the starting point -- CPython 3.13 on the official Benchmarks Game run:
| Benchmark | C gcc | CPython 3.13 | Ratio |
|---|---|---|---|
| n-body (50M) | 2.1s | 372s | 177x |
| spectral-norm (5500) | 0.4s | 350s | 875x |
| fannkuch-redux (12) | 2.1s | 311s | 145x |
| mandelbrot (16000) | 1.3s | 183s | 142x |
| binary-trees (21) | 1.6s | 33s | 21x |
The question isn't whether Python is slow at computation. It is. The question is how much effort each fix costs and how far it gets you. That's the ladder.
Why Python Is Slow
The usual suspects are the GIL, interpretation, and dynamic typing. All three matter, but none of them is the real story. The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize.
A C compiler sees `a + b` between two integers and emits one CPU instruction. The Python VM sees `a + b` and has to ask: what is `a`? What is `b`? Does `a.__add__` exist? Has it been replaced since the last call? Is `a` actually a subclass of `int` that overrides `__add__`? Every operation goes through this dispatch because the language guarantees you can change anything at any time.
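That guarantee is easy to demonstrate. Here is a small sketch (the `Celsius` class is my own illustration, not from the benchmarks) showing that the meaning of the same `+` expression can change at runtime, which is exactly why the VM must re-check on every call:

```python
class Celsius(int):
    """An int subclass -- starts out using int's addition."""
    pass

a = Celsius(1)
print(a + 1)  # 2, dispatched to int.__add__

# Replace __add__ on the class at runtime. The very same expression
# `a + 1` now resolves to the new method -- the VM cannot assume
# the answer from the previous call is still valid.
Celsius.__add__ = lambda self, other: "patched"
print(a + 1)  # "patched"
```

A static compiler could not emit a single add instruction here, because nothing in the source text of `a + 1` tells it which of these two behaviors will be in effect when the line runs.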
The object overhead is where this shows up concretely. In C, an integer is 4 bytes on the stack. In Python, even a small integer is a full heap-allocated object carrying a reference count and a type pointer alongside the value itself.
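You can measure this directly with `sys.getsizeof` (exact sizes are a CPython implementation detail and vary by version and platform, so treat the numbers as ballpark):

```python
import sys

# In CPython, a small int is a heap object with a reference count,
# a type pointer, and a variable-length digit array -- not 4 bytes.
print(sys.getsizeof(1))        # ~28 bytes on a typical 64-bit build
print(sys.getsizeof(10**100))  # larger still: size grows with the digits
```

Multiply that per-object overhead across a list of fifty million bodies' coordinates and the gap with a flat C array becomes obvious.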