Tech News

Show HN: Pseudonymizing sensitive data for LLMs without losing context

Why This Matters

This work demonstrates a practical approach to privacy-preserving AI: sensitive user data is pseudonymized before it ever reaches a powerful language model, so the model's reasoning can be used without exposing that data. It addresses the industry's growing need for secure, compliant AI integrations, especially in sensitive fields like cybersecurity and enterprise management.

Key Takeaways

We have been building a Ghost Analyst on top of Anthropic’s Claude to triage Microsoft Sentinel and Defender incidents. The flow is straightforward:

An alert fires.

The agent pulls the relevant Entra ID logs.

The agent writes the KQL queries it needs.

An analyst gets a clean triage report on the other side.
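The flow above could be sketched roughly as follows. Everything here is a hypothetical stand-in (the function names, the log shape, and the KQL string are illustrative); in the real system the Claude-based agent pulls the logs and writes the KQL itself:

```python
# Hypothetical sketch of the triage loop; function names, the log
# shape, and the KQL string are illustrative stand-ins, not the
# actual agent's API.

def fetch_entra_logs(alert: dict) -> list[dict]:
    # Stand-in for pulling the relevant Entra ID sign-in logs.
    return [{"user": "alice", "ip": "203.0.113.7", "outcome": "failure"}]

def write_kql(alert: dict, logs: list[dict]) -> str:
    # Stand-in for the KQL the agent would write for this alert.
    users = ", ".join(sorted({log["user"] for log in logs}))
    return f"SigninLogs | where UserPrincipalName in ({users})"

def triage(alert: dict) -> dict:
    # Alert in, clean triage report out.
    logs = fetch_entra_logs(alert)
    kql = write_kql(alert, logs)
    return {
        "alert_id": alert["id"],
        "kql": kql,
        "events_reviewed": len(logs),
        "summary": f"{len(logs)} suspicious sign-in event(s) for alert {alert['id']}",
    }

report = triage({"id": "INC-42"})
```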

The catch is that triage data contains client IPs, usernames, internal hostnames, and corporate domains. Sending all of that to a cloud model is not something we want to do without a filter in front of it. Running a local model would solve the privacy problem, but no open-source model we tested came close to Claude Opus on this kind of reasoning. We needed a middle ground: keep using a frontier model while keeping client data out of it.

So we built a Data Loss Prevention layer.

Our approach

The proxy sits between the agent and the Anthropic API:

It pseudonymizes sensitive data on the way out.
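A minimal sketch of what "pseudonymizing on the way out" can mean (this is an illustration, not the actual proxy code: it only detects IPv4 addresses, while real triage data also needs detectors for usernames, hostnames, and domains). The key property is that the mapping is consistent and reversible, so the same value always gets the same placeholder and the model can still correlate entities:

```python
import re

class Pseudonymizer:
    """Consistent, reversible pseudonymization sketch (IPv4 only)."""

    IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

    def __init__(self):
        self.forward = {}   # real value -> placeholder
        self.reverse = {}   # placeholder -> real value

    def _token(self, value: str, kind: str) -> str:
        # Reuse the same placeholder for repeated values so the model
        # can still reason about "the same IP showed up twice".
        if value not in self.forward:
            placeholder = f"<{kind}_{len(self.forward) + 1}>"
            self.forward[value] = placeholder
            self.reverse[placeholder] = value
        return self.forward[value]

    def pseudonymize(self, text: str) -> str:
        # Applied to requests on the way out to the model.
        return self.IPV4.sub(lambda m: self._token(m.group(), "IP"), text)

    def restore(self, text: str) -> str:
        # Applied to responses on the way back, so the analyst's
        # report contains the real values again.
        for placeholder, value in self.reverse.items():
            text = text.replace(placeholder, value)
        return text

p = Pseudonymizer()
out = p.pseudonymize("Login from 203.0.113.7, then 203.0.113.7 again")
# -> "Login from <IP_1>, then <IP_1> again"
restored = p.restore(out)
```

Because the mapping lives only in the proxy, the cloud model sees placeholders end to end while the restored report reads normally.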
