When you send real documents or customer data to LLMs, you face a painful tradeoff:
- Send raw text → privacy disaster.
- Redact with [REDACTED] → embeddings break, RAG retrieval fails, multi-turn chats become useless, and the model often refuses to answer questions about the redacted entities.
The practical solution is consistent pseudonymization: the same real entity always maps to the same token (e.g. “Tata Motors” → ORG_7 everywhere). This preserves semantic meaning for vector search and reasoning; the proxy then rehydrates the response on the way back, so the provider never sees actual names, numbers, or addresses.
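The core idea fits in a few lines. Here's a minimal sketch (hypothetical names, not Cloakpipe's actual API): a vault hands out one token per entity, reuses it on repeat sightings, and keeps the reverse map for rehydration.

```python
# Sketch of consistent pseudonymization: the same entity always gets the
# same token, and the reverse mapping enables rehydration later.
class PseudonymVault:
    def __init__(self):
        self.forward = {}   # real entity -> token
        self.reverse = {}   # token -> real entity
        self.counters = {}  # per-type counter: ORG_1, ORG_2, ...

    def tokenize(self, entity: str, entity_type: str) -> str:
        if entity not in self.forward:
            n = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = n
            token = f"{entity_type}_{n}"
            self.forward[entity] = token
            self.reverse[token] = entity
        return self.forward[entity]

    def rehydrate(self, text: str) -> str:
        # Replace longest tokens first so ORG_12 isn't clobbered by ORG_1.
        for token in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(token, self.reverse[token])
        return text

vault = PseudonymVault()
masked = "Revenue at " + vault.tokenize("Tata Motors", "ORG") + " rose 12%."
print(masked)                   # Revenue at ORG_1 rose 12%.
print(vault.rehydrate(masked))  # Revenue at Tata Motors rose 12%.
```

Because “Tata Motors” maps to ORG_1 in every chunk and every turn, embeddings of masked text stay mutually consistent, which is what keeps RAG retrieval working.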
I got fed up fighting this with Presidio + custom glue (truncated RAG chunks, declension in Indian languages, fuzzy merging for typos/siblings, LLM confusion, percentages breaking math). So I built Cloakpipe as a tiny single-binary Rust proxy.
It does:
- Multi-layer detection (regex + financial rules + optional GLiNER2 ONNX NER + custom TOML)
- Consistent reversible mapping in an AES-256-GCM encrypted vault (memory zeroized)
- Smart rehydration that survives truncated chunks like [[ADDRESS:A00
- Built-in fuzzy resolution for typos and similar names
- Numeric reasoning mode so percentages still work for calculations
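The truncated-chunk case is the one that bites people in practice: a RAG chunk boundary can cut a token in half, and naive string replacement then silently leaves garbage behind. A rough sketch of one way to handle it (the token format and names here are assumptions for illustration, not Cloakpipe's actual wire format): match token-shaped fragments, and fall back to a unique-prefix lookup when the closing delimiter is missing.

```python
import re

# Hypothetical vault; token format "[[TYPE:ID]]" is assumed for illustration.
VAULT = {"[[ADDRESS:A001]]": "12 MG Road, Bengaluru",
         "[[ORG:O007]]": "Tata Motors"}

# Matches both complete tokens and fragments cut off before the closing "]]".
TOKEN_RE = re.compile(r"\[\[[A-Z]+:[A-Z0-9]*(?:\]\])?")

def rehydrate(text: str) -> str:
    def sub(m):
        frag = m.group(0)
        if frag in VAULT:                 # exact, well-formed token
            return VAULT[frag]
        # Truncated fragment: unique-prefix match against the vault keys.
        hits = [v for k, v in VAULT.items() if k.startswith(frag)]
        return hits[0] if len(hits) == 1 else frag  # ambiguous -> leave as-is
    return TOKEN_RE.sub(sub, text)

print(rehydrate("Ship to [[ADDRESS:A00"))  # Ship to 12 MG Road, Bengaluru
```

Leaving ambiguous fragments untouched (rather than guessing) seems like the right default; a wrong substitution is worse than a visible token stub.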
Fully open source (MIT), zero Python dependencies, <5 ms overhead.
Repo: https://github.com/rohansx/cloakpipe
Demo & quick start: https://app.cloakpipe.co/demo
Would love feedback from anyone who has audited their RAG data flow or is struggling with the redaction-vs-semantics problem — especially in legal, fintech, or non-English workflows.
What approaches have you landed on?
ozgurozkan•2h ago
One dimension worth pressure-testing: the rehydration step. The proxy receives the LLM response and substitutes real entities back in. That rehydration layer is a potential exfiltration vector if the LLM can be made to include token patterns in its response that survive the substitution. We've run adversarial tests where an AI agent was instructed (via injected context) to embed entity tokens in its output in ways that leak the mapping.
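One mitigation for this class of attack (a hedged sketch, not what Cloakpipe or audn.ai actually does): restrict rehydration to tokens the proxy itself issued for the current request, and flag any token-shaped strings it never issued as possible injection attempts, rather than looking them up in the shared vault.

```python
import re

# Illustrative token shapes; real formats would differ.
TOKEN_RE = re.compile(r"(?:ORG|PERSON|ADDRESS)_\d+")

def safe_rehydrate(response: str, issued: dict) -> tuple:
    """Rehydrate only tokens issued for this request; collect the rest."""
    suspicious = []
    def sub(m):
        tok = m.group(0)
        if tok in issued:
            return issued[tok]
        suspicious.append(tok)  # token-shaped but never issued by us
        return tok              # leave it; never touch the global vault
    return TOKEN_RE.sub(sub, response), suspicious

issued = {"ORG_1": "Tata Motors"}
text, flags = safe_rehydrate("ORG_1 beat ORG_99 on margin.", issued)
print(text)   # Tata Motors beat ORG_99 on margin.
print(flags)  # ['ORG_99']
```

This doesn't cover obfuscated variants (tokens spelled out, split across words, or encoded), which is exactly where adversarial testing earns its keep.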
We do this kind of adversarial testing at audn.ai (https://audn.ai) — specifically data leak and PII exfiltration scenarios against RAG and agentic pipelines. Sensitive data leak and re-identification are two of the risk categories we cover explicitly.
For fintech/legal use cases especially, it would be worth running a red-team pass on the rehydration and vault-lookup logic. Happy to connect if that'd be useful.