Coverage.py Regex Pragmas

Coverage.py uses regexes to define pragma syntax. This is surprisingly powerful.

Coverage.py lets you mark code to exclude from measurement by adding comments to your Python files. But coverage implements these comments differently than similar tools do. Rather than having a fixed syntax, the comments are defined by regexes that you can change or add to. This has been surprisingly powerful.

The basic behavior: coverage finds lines in your source files that match the regexes. These lines are excluded from measurement, that is, it’s OK if they aren’t executed. If a matched line is part of a multi-line statement, the whole multi-line statement is excluded. If a matched line introduces a block of code, the entire block is excluded.

At first, these regexes were just a way to make it easier to implement the basic “here’s the comment you use” behavior for pragma comments. But they also enabled pragma-less exclusions. You could decide (for example) that you didn’t care to test any __repr__ methods. By adding def __repr__ as an exclusion regex, all of those methods were automatically excluded from coverage measurement without having to add a comment to each one. Very nice.
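To make the line-matching idea concrete, here is a minimal sketch in Python. It is not coverage.py's actual code: the function name `matching_lines`, the pattern list, and the sample source are all made up for illustration. It only finds the matched lines; extending a match to the whole statement or block, as described above, is left out. In a real project the regexes would go in coverage's configuration (for example, the `exclude_also` setting) rather than in your own code.

```python
import re

# Illustrative exclusion regexes (assumptions for this sketch):
# a pragma comment, plus the pragma-less "def __repr__" example.
EXCLUDE_PATTERNS = [r"#\s*pragma: no cover", r"def __repr__"]


def matching_lines(source: str, patterns: list[str]) -> set[int]:
    """Return the 1-based numbers of lines that match any of the patterns."""
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return {
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if combined.search(line)
    }


SAMPLE = '''\
class Point:
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f"Point({self.x})"

def debug_dump(p):  # pragma: no cover
    print(p)
'''

print(sorted(matching_lines(SAMPLE, EXCLUDE_PATTERNS)))
```

In this sample the `def __repr__` line is picked up without any comment on it, while `debug_dump` is picked up because of its pragma comment.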

Not only did this let people add custom exclusions in their projects, but it enabled third-party plugins that could configure regexes in other interesting ways:

covdefaults adds a bunch of default exclusions, and also platform- and version-specific comment syntaxes.

coverage-conditional-plugin gives you a way to create comment syntaxes for entire files, for whether other packages are installed, and so on.

Then about a year ago, Daniel Diniz contributed a change that amped up the power: regexes could match multi-line patterns. This doesn’t sound like a large change, but it enabled much more powerful exclusions. As a sign of that, it made it possible to support four different feature requests.

To make it work, Daniel changed the matching code. Originally, it was a loop over the lines in the source file, checking each line for a match against the regexes. The new code uses the entire source file as the target string, and loops over the matches against that text. Each match is converted into a set of line numbers and added to the results.
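The whole-file approach can be sketched roughly like this. This is an illustration of the technique, not the code Daniel wrote: the function name `lines_matching_text`, the use of `re.MULTILINE`, and the sample pattern are assumptions made for the example.

```python
import re


def lines_matching_text(source: str, pattern: str) -> set[int]:
    """Match against the whole source text and map each match to line numbers."""
    matched: set[int] = set()
    # Assumption: MULTILINE so that ^ and $ still anchor at line boundaries.
    for m in re.finditer(pattern, source, flags=re.MULTILINE):
        # Line of the first matched character (1-based).
        first = source.count("\n", 0, m.start()) + 1
        # Line of the last matched character, so that a match ending in a
        # newline does not spill onto the following line.
        last_char = max(m.start(), m.end() - 1)
        last = source.count("\n", 0, last_char) + 1
        matched.update(range(first, last + 1))
    return matched


SAMPLE = '''\
try:
    value = int(text)
except ValueError:
    raise AssertionError("unreachable")
print(value)
'''

# A two-line pattern: an except clause followed by a specific body.
PATTERN = r"except ValueError:\n\s*raise AssertionError"

print(sorted(lines_matching_text(SAMPLE, PATTERN)))
```

A line-by-line loop could never match `PATTERN`, because no single line contains both the `except` clause and the `raise`. Matching against the full text makes the pair of lines one unit.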

The power comes from being able to use one pattern to match many lines. For example, one of the four feature requests was how to exclude an entire file. With configurable multi-line regex patterns, you can do this yourself:
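As a rough sketch of how such a pattern could work, assume a marker comment `# pragma: exclude file` (a name chosen here for illustration). The regex below anchors at the start and end of the text and lets `.` cross newlines, so a single match covers every line of a file that contains the marker. The helper `file_is_excluded` is hypothetical and only demonstrates the regex; it is not part of coverage.py.

```python
import re

# \A and \Z anchor to the start and end of the whole text, and (?s:...)
# lets "." match newlines, so one match spans the entire file.
# The marker comment text is an assumption for this sketch.
EXCLUDE_FILE = r"\A(?s:.*# pragma: exclude file.*)\Z"


def file_is_excluded(source: str) -> bool:
    """Return True if the whole-file pattern matches the source text."""
    return re.search(EXCLUDE_FILE, source) is not None


MARKED = "import os\n# pragma: exclude file\nprint(os.name)\n"
UNMARKED = "import os\nprint(os.name)\n"

print(file_is_excluded(MARKED), file_is_excluded(UNMARKED))
```

Because the match runs from the first character to the last, converting it to line numbers yields every line in the file, which is what excludes the whole file.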
