Tachyon: High-frequency statistical sampling profiler

Source code: Lib/profiling/sampling/

The profiling.sampling module, named Tachyon, provides statistical profiling of Python programs through periodic stack sampling. Tachyon can run scripts directly or attach to any running Python process without requiring code changes or restarts. Because sampling occurs externally to the target process, overhead is virtually zero, making Tachyon suitable for both development and production environments.

Statistical profiling excels at answering the question, “Where is my program spending time?” It reveals hotspots and bottlenecks in production code where deterministic profiling overhead would be unacceptable. For exact call counts and complete call graphs, use profiling.tracing instead.

The key difference from profiling.tracing is how measurement happens. A tracing profiler instruments your code, recording every function call and return. This provides exact call counts and precise timing but adds overhead to every function call. A sampling profiler, by contrast, observes the program from outside at fixed intervals without modifying its execution. Think of the difference like this: tracing is like having someone follow you and write down every step you take, while sampling is like taking photographs every second and inferring your path from those snapshots.

This external observation model is what makes sampling profiling practical for production use. The profiled program runs at full speed because there is no instrumentation code running inside it, and the target process is never stopped or paused during sampling. Instead, Tachyon reads the call stack directly from the process’s memory while it continues to run. You can attach to a live server, collect data, and detach without the application ever knowing it was observed. The trade-off is that very short-lived functions may be missed if they happen to complete between samples.
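The contrast between tracing and sampling can be illustrated with a small in-process toy. This is only a sketch and not how Tachyon works: Tachyon observes the target from outside the process, whereas this example samples from a second thread inside the same interpreter using the CPython-specific sys._current_frames(). The functions busy, quick, slow, workload, trace_call_counts, and sample_stack_shares are hypothetical names introduced for illustration.

```python
import collections
import sys
import threading
import time


def busy(seconds):
    """Spin for roughly `seconds` of wall-clock time."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def quick():
    busy(0.02)


def slow():
    busy(0.10)


def workload():
    for _ in range(3):
        quick()
        slow()


def trace_call_counts(func):
    """Tracing: a hook runs on every Python-level call, so counts are exact."""
    counts = collections.Counter()

    def hook(frame, event, arg):
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        func()
    finally:
        sys.setprofile(None)
    return counts


def sample_stack_shares(func, interval=0.002):
    """Sampling: snapshot the worker's stack at intervals and estimate shares.

    Returns, for each function name, the fraction of snapshots in which
    that function appeared anywhere on the worker thread's stack.
    """
    seen = collections.Counter()
    total = 0
    worker = threading.Thread(target=func)
    worker.start()
    while worker.is_alive():
        frame = sys._current_frames().get(worker.ident)
        names = set()
        while frame is not None:
            names.add(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            seen.update(names)  # each function counted once per snapshot
            total += 1
        time.sleep(interval)
    worker.join()
    return {name: n / total for name, n in seen.items()} if total else {}


if __name__ == "__main__":
    print("exact call counts:", dict(trace_call_counts(workload)))
    print("estimated shares: ", sample_stack_shares(workload))
```

The tracing half reports exactly three calls each to quick and slow, at the cost of running a hook on every call. The sampling half knows nothing about call counts; it only estimates what fraction of snapshots each function appeared in, and those fractions shift a little from run to run.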

Statistical sampling is not ideal for every situation.

For very short scripts that complete in under one second, the profiler may not collect enough samples for reliable results. Use profiling.tracing instead, or run the script in a loop to extend profiling time.

When you need exact call counts, sampling cannot provide them. Sampling estimates frequency from snapshots, so if you need to know precisely how many times a function was called, use profiling.tracing.

When comparing two implementations where the difference might be only 1-2%, sampling noise can obscure real differences. Use timeit for micro-benchmarks or profiling.tracing for precise measurements.
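For the micro-benchmark case, where two implementations differ by only a percent or two, a timeit comparison might look like the following sketch. The two functions being compared (join_generator and join_list) and the helper best_time are hypothetical examples, not part of any profiling module; only timeit.repeat is a real standard library call.

```python
import timeit


def join_generator(n=1000):
    """Build a string by joining a generator expression."""
    return "".join(str(i) for i in range(n))


def join_list(n=1000):
    """Build the same string by joining a list comprehension."""
    return "".join([str(i) for i in range(n)])


def best_time(func, repeat=5, number=200):
    """Return the best observed per-call time in seconds.

    Taking the minimum over several repeats reduces the influence of
    unrelated system activity on the measurement.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for func in (join_generator, join_list):
        print(f"{func.__name__}: {best_time(func) * 1e6:.1f} us per call")
```

Because each candidate is run many times in isolation, small differences that would disappear into sampling noise have a better chance of showing up here.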

Because sampling is statistical, results will vary slightly between runs. A function showing 12% in one run might show 11% or 13% in the next. This is normal and expected. Focus on the overall pattern rather than exact percentages, and don’t worry about small variations between runs.
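The size of this run-to-run variation can be approximated with a simple model. The sketch below assumes each sample independently lands in a given function with probability equal to that function's true share of run time. That independence is a simplifying assumption made here for illustration, not a documented property of Tachyon, and the function names standard_error and simulate_run are hypothetical.

```python
import math
import random


def standard_error(true_share, num_samples):
    """Standard error of an observed share under independent sampling."""
    return math.sqrt(true_share * (1.0 - true_share) / num_samples)


def simulate_run(true_share, num_samples, rng):
    """Simulate one profiling run and return the observed share."""
    hits = sum(rng.random() < true_share for _ in range(num_samples))
    return hits / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    for run in range(5):
        observed = simulate_run(0.12, 1000, rng)
        print(f"run {run}: {observed:.1%}")
    print(f"standard error at 1000 samples: {standard_error(0.12, 1000):.2%}")
```

Under this model, a function with a true share of 12% measured over 1000 samples has a standard error of about one percentage point, which is consistent with seeing 11% in one run and 13% in the next. Quadrupling the number of samples halves that error.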
