Show HN: I built a proxy that keeps RAG working while hiding PII

3•rohansx•2h ago
Hey HN,

When you send real documents or customer data to LLMs, you face a painful tradeoff:

- Send raw text → privacy disaster
- Redact with [REDACTED] → embeddings break, RAG retrieval fails, multi-turn chats become useless, and the model often refuses to answer questions about the redacted entities.

The practical solution is consistent pseudonymization: the same real entity always maps to the same token (e.g. “Tata Motors” → ORG_7 everywhere). This preserves semantic meaning for vector search and reasoning, and the provider only ever sees tokens, never actual names, numbers or addresses. You then rehydrate the response on the way back.
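
To make the mapping concrete, here is a minimal Python sketch of consistent pseudonymization and rehydration. It is illustrative only and is not Cloakpipe's code (which is Rust): the `Pseudonymizer` class, the `LABEL_N` token format, and the hand-supplied entity dictionary are all assumptions. A real system would detect entities itself and keep the mapping in an encrypted vault.

```python
import re


class Pseudonymizer:
    """Toy consistent pseudonymizer: the same entity always gets the same token."""

    def __init__(self):
        self.forward = {}   # (label, entity) -> token
        self.reverse = {}   # token -> entity
        self.counters = {}  # label -> last index issued

    def token_for(self, label, entity):
        key = (label, entity)
        if key not in self.forward:
            n = self.counters.get(label, 0) + 1
            self.counters[label] = n
            token = f"{label}_{n}"  # hypothetical token format
            self.forward[key] = token
            self.reverse[token] = entity
        return self.forward[key]

    def pseudonymize(self, text, entities):
        # `entities` maps entity string -> label; in this sketch it is supplied by hand.
        # Longest entities are replaced first so a shorter name cannot split a longer one.
        for entity in sorted(entities, key=len, reverse=True):
            text = text.replace(entity, self.token_for(entities[entity], entity))
        return text

    def rehydrate(self, text):
        # Swap known tokens back to the original entities; unknown tokens are left as-is.
        pattern = re.compile(r"\b[A-Z]+_\d+\b")
        return pattern.sub(lambda m: self.reverse.get(m.group(0), m.group(0)), text)


p = Pseudonymizer()
known = {"Tata Motors": "ORG", "Pune": "LOC"}
doc = p.pseudonymize("Tata Motors opened a plant in Pune.", known)
query = p.pseudonymize("What did Tata Motors report?", known)
answer = p.rehydrate("ORG_1 reported growth in LOC_1.")
```

Because the document and the query both carry the same token for the same entity, retrieval can still match them, which a blanket [REDACTED] cannot do.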

I got fed up fighting this with Presidio + custom glue (truncated RAG chunks, declension in Indian languages, fuzzy merging for typos/siblings, LLM confusion, percentages breaking math). So I built Cloakpipe as a tiny single-binary Rust proxy.

It does:
• Multi-layer detection (regex + financial rules + optional GLiNER2 ONNX NER + custom TOML)
• Consistent reversible mapping in an AES-256-GCM encrypted vault (memory zeroized)
• Smart rehydration that survives truncated chunks like [[ADDRESS:A00
• Built-in fuzzy resolution for typos and similar names
• Numeric reasoning mode so percentages still work for calculations
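
As a rough illustration of the fuzzy-resolution idea (mapping a typo back to an entity that is already known so it reuses the same token), here is a small Python sketch built on the standard library's `difflib`. The `resolve` function and the 0.85 threshold are assumptions made for this example and say nothing about how Cloakpipe actually scores matches.

```python
from difflib import SequenceMatcher


def resolve(name, known, threshold=0.85):
    """Return the known entity most similar to `name`, or None if none is close enough."""
    best, best_score = None, 0.0
    for candidate in known:
        # Case-insensitive similarity ratio in [0, 1].
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


known_entities = ["Tata Motors", "Tata Steel"]
typo_match = resolve("Tata Motrs", known_entities)  # typo of a known entity
no_match = resolve("Infosys", known_entities)       # unrelated name, gets its own token
```

The threshold is the design tradeoff here: set it too low and sibling entities (two distinct companies with similar names) collapse into one token, set it too high and typos get a fresh token and break consistency.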

Fully open source (MIT), zero Python dependencies, <5 ms overhead.

Repo: https://github.com/rohansx/cloakpipe
Demo & quick start: https://app.cloakpipe.co/demo

Would love feedback from anyone who has audited their RAG data flow or is struggling with the redaction-vs-semantics problem — especially in legal, fintech, or non-English workflows.

What approaches have you landed on?

Comments

ozgurozkan•1h ago
Cloakpipe solves a real tension cleanly — pseudonymization that preserves semantic meaning for embeddings is genuinely hard, and the AES-256-GCM encrypted vault with memory zeroing shows thoughtful security design.

One dimension worth pressure-testing: the rehydration step. The proxy receives the LLM response and substitutes real entities back in. That rehydration layer is a potential exfiltration vector if the LLM can be made to include token patterns in its response that survive the substitution. We've run adversarial tests where an AI agent was instructed (via injected context) to embed entity tokens in its output in ways that leak the mapping.

We do this kind of adversarial testing at audn.ai (https://audn.ai) — specifically data leak and PII exfiltration scenarios against RAG and agentic pipelines. Sensitive data leak and re-identification are two of the risk categories we cover explicitly.

For fintech/legal use cases especially, would be worth running a red team pass on the rehydration and vault lookup logic. Happy to connect if that'd be useful.
