Photo by Egor Komarov on Unsplash
A local-first, reversible PII scrubber for AI workflows using ONNX and Regex
Tom Jordi Ruesch · 4 min read · 4 hours ago
The Privacy-Translation Paradox
Every engineering team eventually faces the same dilemma: You need to translate user content (support tickets, documents, chat logs) using high-quality engines like DeepL or LLMs like GPT-4, but you strictly cannot send Personally Identifiable Information (PII) to third-party APIs (yes, I’m European).
The solution is seemingly simple: Redact the data. The problem? Redaction destroys translation quality.
If you scrub “John bought a generic gift for Mary” into “PERSON bought a generic gift for PERSON,” the translation engine loses the context needed for grammatical gender agreement, case endings, and prepositions in target languages like French or German. Furthermore, most open-source PII scrubbers are “one-way” — they clean data for analytics, not for a round-trip translation workflow.
At ELAN Languages, I built a solution for this. Today, we are open-sourcing Bridge Anonymization: a TypeScript library for reversible, context-aware PII masking designed specifically for translation pipelines.
How bridge-anonymization Works
Unlike general-purpose scrubbers, Bridge is designed around a lifecycle:
Detect -> Mask -> Translate -> Rehydrate
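To make the lifecycle concrete, here is a minimal sketch of the round trip. It is not the bridge-anonymization API: the names `mask`, `rehydrate`, and `roundTrip`, the `[[TYPE_n]]` placeholder format, and the two regex patterns are all assumptions made for illustration. It covers only regex-detectable PII (emails and phone numbers) and leaves out the ONNX-based detection entirely.

```typescript
// Hypothetical sketch of Detect -> Mask -> Translate -> Rehydrate.
// None of these names or formats are taken from the bridge-anonymization library.

type EntityType = "EMAIL" | "PHONE";

interface MaskResult {
  masked: string;
  // placeholder -> original value; stays local and is never sent to the API
  mapping: Map<string, string>;
}

// Detect: deliberately simple patterns, assumed for this example only.
const PATTERNS: Array<[EntityType, RegExp]> = [
  ["EMAIL", /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g],
  ["PHONE", /\+?\d[\d\s-]{7,}\d/g],
];

// Mask: swap each match for a numbered placeholder and remember the original.
function mask(text: string): MaskResult {
  const mapping = new Map<string, string>();
  const seen = new Map<string, string>(); // original -> placeholder, so repeats share one token
  const counters: Record<EntityType, number> = { EMAIL: 0, PHONE: 0 };
  let masked = text;

  for (const [type, pattern] of PATTERNS) {
    masked = masked.replace(pattern, (match: string): string => {
      let placeholder = seen.get(match);
      if (placeholder === undefined) {
        counters[type] += 1;
        placeholder = `[[${type}_${counters[type]}]]`;
        seen.set(match, placeholder);
        mapping.set(placeholder, match);
      }
      return placeholder;
    });
  }
  return { masked, mapping };
}

// Rehydrate: put the originals back wherever the placeholders ended up.
// Unknown placeholders are left untouched rather than guessed at.
function rehydrate(translated: string, mapping: Map<string, string>): string {
  return translated.replace(
    /\[\[(?:EMAIL|PHONE)_\d+\]\]/g,
    (token: string): string => mapping.get(token) ?? token,
  );
}

// Translate sits in the middle: the caller supplies the engine (DeepL, an LLM, ...),
// and only the masked text is ever passed to it.
async function roundTrip(
  text: string,
  translate: (maskedText: string) => Promise<string>,
): Promise<string> {
  const { masked, mapping } = mask(text);
  const translated = await translate(masked);
  return rehydrate(translated, mapping);
}
```

With this sketch, `mask("Mail john.doe@example.com or call +49 170 1234567.")` yields `Mail [[EMAIL_1]] or call [[PHONE_1]].`, and because rehydration looks placeholders up by token rather than by position, the originals are restored even if the translation reorders them. A real pipeline would also have to handle placeholders that the translation engine alters or drops, which this example does not attempt.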