Show HN: Pseudonymizing sensitive data for LLMs without losing context

https://atticsecurity.com/en/blog/why-llms-hate-fake-data-token-proxy/

4•n00pn00p•5d ago

Comments

_zer0c00l_•5d ago

I have at least one fundamental concern about the approach. Let's say I'm building an anti-fraud system that uses AI (through an API), and I'm asking the AI whether my user totally+fraud@gmail.com is a potential fraudster. By masking this email address I'm sabotaging my own AI prompt: the AI can no longer reason based on the facts that 1) the email is a free public email and 2) the email says 'fraud' right in your face.

n00pn00p•5d ago

Valid point. The proxy has an option to always allow domain names through. I fear you will always lose some context. It should be used sparingly, when you need a frontier model but also want to send sensitive data.
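A minimal sketch of the kind of reversible token proxy discussed here, assuming Python and a deliberately simple regex for email addresses. The class name `TokenProxy`, the `keep_domains` flag and the `userN@domain` token format are hypothetical, not the article's implementation. Each distinct address gets a stable token, the domain can optionally pass through, and `restore` maps tokens in the model's reply back to the originals.

```python
import re

# Deliberately simple email pattern (an assumption); real-world address matching is messier.
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


class TokenProxy:
    """Hypothetical reversible pseudonymizer for emails, not the article's code."""

    def __init__(self, keep_domains: bool = True) -> None:
        self.keep_domains = keep_domains
        self.forward: dict[str, str] = {}  # original email -> token
        self.reverse: dict[str, str] = {}  # token -> original email

    def _token_for(self, email: str, domain: str) -> str:
        # Reuse the same token for repeated values so the model can still
        # tell that two mentions refer to the same entity.
        if email not in self.forward:
            n = len(self.forward) + 1
            token = f"user{n}@{domain}" if self.keep_domains else f"EMAIL_{n}"
            self.forward[email] = token
            self.reverse[token] = email
        return self.forward[email]

    def pseudonymize(self, text: str) -> str:
        # group(0) is the full address, group(2) the domain.
        return EMAIL_RE.sub(lambda m: self._token_for(m.group(0), m.group(2)), text)

    def restore(self, text: str) -> str:
        # Replace longer tokens first so EMAIL_1 never clobbers part of EMAIL_10.
        # Note: a real address that already looks like a token would be ambiguous.
        for token in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(token, self.reverse[token])
        return text


proxy = TokenProxy()
print(proxy.pseudonymize("Is totally+fraud@gmail.com a fraudster?"))
# Is user1@gmail.com a fraudster?
```

With `keep_domains=True` the model still sees that the address is on gmail.com, but the 'fraud' in the local part is gone, which is exactly the trade-off raised at the top of the thread.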

stuaxo•5d ago

You can do those as a separate prompt.

dwa3592•5d ago

Ooh, nice. I built something very similar last year.

- https://github.com/deepanwadhwa/semi_private_chat

- https://github.com/deepanwadhwa/zink

n00pn00p•4d ago

Oh sweet, that would have saved a lot of time!

bennettdixon•4d ago

Nice write-up. One thing that stood out is the V2 to V3 jump. One of my clients is integrating personal wellness and AI, and we took a slightly different route. The health data and personal data live in separate DBs with an encrypted mapping layer between them. This way the model only sees health context attached to a unique pseudo-user-level session. Your problem almost seems harder, because the PII is the signal/context. One challenge we are facing is re-identification, e.g. rich health profiles being identifiable in themselves.
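One way to sketch the split-database design described above, assuming Python with plain dicts standing in for the two databases. Note that it swaps the encrypted mapping layer for a keyed hash (HMAC-SHA256) plus an in-memory session table; `MappingLayer`, `store_health` and `build_prompt` are hypothetical names. The prompt builder only ever reads the health store by pseudonym, so names and emails never reach the model. It does nothing about re-identification from a rich health profile itself, which is the open problem raised here.

```python
import hashlib
import hmac
import secrets


class MappingLayer:
    """Hypothetical mapping layer; it is the only component that holds the key."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._sessions: dict[str, str] = {}  # session token -> pseudo-user ID

    def pseudo_id(self, user_id: str) -> str:
        # Keyed hash: stable per user, not recomputable without the key.
        return hmac.new(self._key, user_id.encode(), hashlib.sha256).hexdigest()[:16]

    def open_session(self, user_id: str) -> str:
        # Fresh random token per session, so the model never sees a
        # stable identifier across sessions.
        token = secrets.token_hex(8)
        self._sessions[token] = self.pseudo_id(user_id)
        return token

    def resolve(self, token: str) -> str:
        return self._sessions[token]


# Plain dicts stand in for two physically separate databases.
identity_db = {"u42": {"name": "Jane Doe", "email": "jane@example.org"}}
health_db: dict[str, dict[str, float]] = {}


def store_health(layer: MappingLayer, user_id: str, record: dict[str, float]) -> None:
    health_db[layer.pseudo_id(user_id)] = record


def build_prompt(layer: MappingLayer, token: str) -> str:
    # Only health context keyed by the pseudonym is read; identity_db is never touched.
    ctx = health_db[layer.resolve(token)]
    fields = ", ".join(f"{k}={v}" for k, v in sorted(ctx.items()))
    return f"[session {token}] {fields}"


layer = MappingLayer(secrets.token_bytes(32))
store_health(layer, "u42", {"sleep_hours": 6.5, "resting_hr": 58})
print(build_prompt(layer, layer.open_session("u42")))
```

In a real system the session table and key would live in the separate, access-controlled mapping service rather than in the same process as the prompt builder.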

Curious if you have thought about that side of things with your V3 implementation?

n00pn00p•4d ago

That's a great point. Because my tool is designed for security operations and triage, the context (like knowing an IP is from Hetzner, or a domain is a known burner) is actually the signal the LLM needs to do its job. I made a conscious trade-off to allow some contextual metadata to pass through to preserve utility.
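A sketch of letting contextual metadata through while still masking the identifier itself, assuming Python's standard `ipaddress` module. The `PROVIDERS` table is hypothetical and uses RFC 5737 documentation ranges; a real deployment would look the provider up in an ASN or IP-to-provider database instead. Each address becomes a stable token such as `IP_1[provider=example-hosting]`, so the model can still reason about where traffic comes from without seeing the address.

```python
import ipaddress
import re

# Hypothetical provider table over RFC 5737 documentation ranges; a real
# deployment would consult an ASN or IP-to-provider database.
PROVIDERS = {
    ipaddress.ip_network("203.0.113.0/24"): "example-hosting",
    ipaddress.ip_network("198.51.100.0/24"): "example-isp",
}

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def provider_for(ip: ipaddress.IPv4Address) -> str:
    for net, name in PROVIDERS.items():
        if ip in net:
            return name
    return "unknown"


def pseudonymize_ips(text: str, mapping: dict[str, str]) -> str:
    """Replace each IPv4 address with a stable token that keeps provider metadata."""

    def repl(m: re.Match) -> str:
        raw = m.group(0)
        try:
            ip = ipaddress.IPv4Address(raw)
        except ipaddress.AddressValueError:
            return raw  # not a valid address (e.g. 999.1.1.1): leave it alone
        if raw not in mapping:
            mapping[raw] = f"IP_{len(mapping) + 1}[provider={provider_for(ip)}]"
        return mapping[raw]

    return IPV4_RE.sub(repl, text)


seen: dict[str, str] = {}
print(pseudonymize_ips("Login from 203.0.113.7", seen))
# Login from IP_1[provider=example-hosting]
```

The `mapping` dict doubles as the lookup table for restoring addresses in the model's answer, in the same way as the email tokens.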

Since I'm based in the Netherlands, I look at this strictly through the lens of Dutch privacy law (the AVG, as the GDPR is called in Dutch). Under the AVG, there's a hard line between anonymized data and pseudonymized data. Because of the exact 'mosaic effect' you mentioned, pseudonymized data is legally still treated as personal data. So the re-identification risk is an accepted reality.

Essentially, I treat the tool as an extra effort to reduce PII leaks. But it's not foolproof against context clues.

glitchnsec•3d ago

This is really cool. I'm still in V2, with NER for redacting PII before sending to the model, BUT that was just for simple email analysis. I bet most teams building for security with AI haven't addressed this! Thanks for sharing!
