
X-ray: a Python library for finding bad redactions in PDF documents


x-ray is a Python library for finding bad redactions in PDF documents.

At Free Law Project, we collect millions of PDFs. An ongoing problem is that people fail to properly redact things. Instead of doing it the right way, they just draw a black rectangle or a black highlight on top of black text and call it a day. Well, when that happens you just select the text under the rectangle, and you can read it again. Not great.
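To make the failure mode concrete, here is a minimal sketch of the underlying idea: a redaction is "bad" when a drawn rectangle overlaps text that still exists in the page's text layer. This is an illustration only, not x-ray's actual implementation. The `Box` class, the `find_exposed_text` function, and the sample coordinates are all hypothetical, and a real tool would first have to extract the rectangles and text spans from the PDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page coordinates: (x0, y0) and (x1, y1) are opposite corners."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap only if they overlap on both the x and y axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def find_exposed_text(rectangles, text_spans):
    """Return (rectangle, text) pairs where a drawn rectangle covers selectable text.

    rectangles: list of Box (hypothetical: filled shapes drawn on the page)
    text_spans: list of (Box, str) (hypothetical: words from the text layer)
    """
    hits = []
    for rect in rectangles:
        for span_box, text in text_spans:
            # Ignore whitespace-only spans; they reveal nothing.
            if text.strip() and rect.intersects(span_box):
                hits.append((rect, text))
    return hits


# Example page: one black bar drawn over the second word only.
bars = [Box(100, 50, 180, 65)]
words = [(Box(20, 52, 90, 63), "Plaintiff"), (Box(105, 52, 175, 63), "Jane Doe")]
print(find_exposed_text(bars, words))
```

The point of the sketch is that the rectangle and the text are separate objects on the page, so covering one does nothing to remove the other.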

After witnessing this problem for years, we decided it would be good to figure out how common it is, so, with some help, we built this simple tool. You give the tool the path to a PDF. It tells you if it has worthless redactions in it.
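As a rough sketch of what that workflow might look like from Python, the snippet below assumes the library exposes an `xray.inspect(path)` function that returns a dictionary mapping page numbers to lists of findings, each with `bbox` and `text` keys. That shape is our understanding of the project's README, not something stated in this article, so check the project's documentation before relying on it. The `summarize` and `check_pdf` helpers and the sample data are hypothetical.

```python
def summarize(findings):
    """Turn {page_number: [{"bbox": ..., "text": ...}, ...]} into report lines."""
    lines = []
    for page in sorted(findings):
        for item in findings[page]:
            lines.append(f"page {page}: recoverable text {item['text']!r}")
    return lines


def check_pdf(path):
    """Inspect one PDF and return a line per bad redaction (assumed API)."""
    # Imported lazily so this module loads even where x-ray is not installed.
    import xray

    return summarize(xray.inspect(path))


# Hand-written sample in the assumed result shape (not real output).
sample = {2: [{"bbox": (10.0, 20.0, 110.0, 32.0), "text": "Jane Doe"}], 1: []}
print(summarize(sample))
```

An empty result from `summarize` would mean no bad redactions were reported, which is not the same as proof that the document is safely redacted.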

What next?

Right now, x-ray works pretty well and we are using it to analyze documents in our collections. It could be better, though. Bad redactions take many forms. See the issues tab for other examples we don't yet support. We'd love your help solving some of the tougher cases.

Installation

With uv, do:

uv add x-ray

With pip, that'd be:

pip install x-ray
