Python @ HRT
At Hudson River Trading (HRT), we’ve found that centralizing our codebase facilitates cross-team collaboration and rapid deployment of new projects. Therefore, the majority of our software development takes place in a monorepo, and our Python ecosystem is set up such that internal modules are importable everywhere. Unfortunately, the convenience of this arrangement has led to a conundrum: a vast proliferation of imports.
In Python, imports occur at runtime. For each imported name, the interpreter must find, load, and evaluate the contents of a corresponding module. This process gets dramatically slower for large modules, modules on distributed file systems, modules with slow side effects (code that runs during evaluation), modules with many transitive imports, and C/C++ extension modules with many library dependencies.
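To make the cost of an import-time side effect concrete, here is a minimal sketch. It assumes a hypothetical module named slow_module, written to a temporary directory, whose top-level code sleeps for 0.2 seconds as a stand-in for expensive work. The module name, the sleep duration, and the timed_import helper are all illustrative and not part of HRT's codebase.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# Hypothetical module whose top-level code has a slow side effect.
MODULE_SOURCE = """
import time
time.sleep(0.2)  # stands in for expensive work done at import time
VALUE = 42
"""


def timed_import(module_name: str) -> float:
    """Import a module by name and return how long the import took."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "slow_module.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # First import: the interpreter finds, loads, and executes the module.
        first = timed_import("slow_module")
        # Second import: the module is already cached in sys.modules.
        second = timed_import("slow_module")
    finally:
        sys.path.remove(tmp)

print(f"first import:  {first:.3f}s")
print(f"second import: {second:.3f}s")
```

The first import pays for the sleep because the module body runs as part of the import statement. The second is nearly free because the interpreter reuses the entry in sys.modules. A process that never touches VALUE still pays the full first-import cost.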
Most of our internal modules fall into one or more of these categories. Thus, as the sheer number of imports has increased, so too has their cumulative runtime overhead. For users, this has surfaced as scripts starting tens of seconds later, notebooks taking minutes to load, and even the simplest distributed jobs spending a substantial portion of their runtime on imports.
And yet, most of our scripts and modules only refer to a few of the names defined by the modules they import. Applied recursively, this suggests that only a fraction of our imports are actually used at runtime. Is there some way we can avoid paying the cost of everything else?
Lazy Imports
Lazy imports are a feature we borrowed from Cinder, Meta’s performance-oriented fork of CPython. The idea is to defer the resolution and evaluation of imported modules until they’re referenced, entirely bypassing imports that are never actually used at runtime. Cinder’s implementation of lazy imports relies on two core modifications:
1. Instead of immediately executing import statements, the interpreter assigns each imported name to a placeholder LazyImport object. This object stores everything needed to resolve and evaluate the import later on, i.e. the requisite module name and, for statements of the form from module import attribute, the name of the attribute to retrieve.
2. When a LazyImport is retrieved from a dict (e.g. when a name is referenced, incurring a globals() lookup), the interpreter completes the resolution and evaluation, transparently returning whatever was originally imported.
For example:
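The two modifications can be mimicked in pure Python as a rough illustration. This is only a sketch of the idea, not Cinder's implementation, which lives inside the interpreter and applies to ordinary import statements. The LazyImport class here is a stand-in with the same name as the placeholder described above. LazyNamespace, the resolve method, and the deliberately nonexistent module name are assumptions made for this example.

```python
import importlib


class LazyImport:
    """Placeholder that records what to import without importing it yet."""

    def __init__(self, module_name, attribute=None):
        self.module_name = module_name
        self.attribute = attribute  # set for "from module import attribute"

    def resolve(self):
        # The real import happens here, on first use.
        # Simplification: assumes the attribute is not a submodule.
        module = importlib.import_module(self.module_name)
        if self.attribute is None:
            return module
        return getattr(module, self.attribute)


class LazyNamespace(dict):
    """Dict that resolves LazyImport placeholders when they are looked up."""

    def __getitem__(self, name):
        value = super().__getitem__(name)
        if isinstance(value, LazyImport):
            value = value.resolve()
            self[name] = value  # cache so later lookups skip resolution
        return value


namespace = LazyNamespace()
namespace["json"] = LazyImport("json")          # like: import json
namespace["sqrt"] = LazyImport("math", "sqrt")  # like: from math import sqrt
# Never looked up, so the missing module never causes an error.
namespace["unused"] = LazyImport("module_that_does_not_exist")

root = namespace["sqrt"](16.0)
encoded = namespace["json"].dumps({"root": root})
print(encoded)  # prints {"root": 4.0}
```

Only the names that are actually looked up trigger an import. The unused entry stays a placeholder for the life of the program, which is the behavior that lets unused imports cost nothing.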