Skip to content
Tech News
← Back to articles

Reverting the incremental GC in Python 3.14 and 3.15

read original get Python Garbage Collection Book → more articles

Neil Schemenauer: Neil Schemenauer: It’s not real world, it’s synthetic. And it just creates a lot of cycles. Getting better real-world like benchmarks or more reports from people testing it for real apps would be nice.

A seemingly eternal problem. Not optimistic. It’s why I said, in a different topic, that the best we can hope for is to stumble into new heuristics to worm around the real-world pathologies that pop up from time to time. Historically, almost all such cases I dealt with in sorting and pymalloc showed up first on Stack Overflow, with “random” users asking for help with “inexplicable slowdowns”.

When I was writing Python’s current sort, the only reports I got of timing results on various platforms came from people running the brief, synthetic, sortperf.py , which was part of the standard distribution at the time. Only one set of results from a real app running real data. They shared the result (“2x faster!”), but could not share their company’s data.

Quite recently I all but begged users to report just timing results (no data required) for a proposed change to the sorting algorithm. Your reply was the only one I got - and thank you for that .

Similar story for judging the string of collision resolution strategies the dict implementation has tried. Overwhelmingly driven by synthetic inputs, and all but the current strategy (which hasn’t changed in years) were eventually discarded for catastrophic behavior on rare reports from real-life apps. But in that case, it’s provable that “catastrophic” collections of keys always exist - the question is the much subtler one of “but how likely is real-life data to stumble into one?”.

Not unique to Python. Sebastian Wild. an academic who co-created the terrific “powersort” merge-ordering heuristic, has reported the extremely poor response to his persistent pleas for “real world data”. Mostly he just attracts contrived bad cases.

The most interesting case recently was a Sphinx slowdown. That had the two key features: creating a significant number of container objects and having at least some of those contain reference cycles. That slowdown was resolved.

Whereas I early on opened an issue about seemingly quadratic-time behavior on the main branch. I had no idea gc changes were to blame at the time, and the whittled test case created essentially no cycles. It just provoked the heuristics at the time to run parts of gc far more often than reasonable.

Much of sorting has quite predictable worst-case O() behavior, but the gc context is much messier than that.

In the absence of realistic benchmarks and real-world reports, I think an extensive set of synthetic benchmarks would be helpful.

... continue reading