Show HN: I built a proxy that keeps RAG working while hiding PII

3•rohansx•3h ago
Hey HN,

When you send real documents or customer data to LLMs, you face a painful tradeoff:

- Send raw text → privacy disaster
- Redact with [REDACTED] → embeddings break, RAG retrieval fails, multi-turn chats become useless, and the model often refuses to answer questions about the redacted entities.

The practical solution is consistent pseudonymization: the same real entity always maps to the same token (e.g. “Tata Motors” → ORG_7 everywhere). This preserves semantic meaning for vector search and reasoning, and the provider only ever sees tokens, never actual names, numbers or addresses. You then rehydrate the response on the way back, swapping the tokens for the real values.
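To make the idea concrete, here is a minimal sketch of consistent pseudonymization and rehydration. It is not Cloakpipe's code: the `Vault` type, its methods, and the `ORG_1` numbering are hypothetical names for illustration. Entity detection is assumed to have already happened, and the vault is a plain in-memory map with no encryption.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory vault (illustration only, no encryption).
#[derive(Default)]
struct Vault {
    forward: HashMap<String, String>, // real entity -> token
    reverse: HashMap<String, String>, // token -> real entity
    counters: HashMap<String, usize>, // per-category counter, e.g. "ORG" -> 2
}

impl Vault {
    /// Return the existing token for `entity`, or mint a new one like `ORG_1`.
    fn token_for(&mut self, category: &str, entity: &str) -> String {
        if let Some(tok) = self.forward.get(entity) {
            return tok.clone();
        }
        let n = self.counters.entry(category.to_string()).or_insert(0);
        *n += 1;
        let tok = format!("{}_{}", category, *n);
        self.forward.insert(entity.to_string(), tok.clone());
        self.reverse.insert(tok.clone(), entity.to_string());
        tok
    }

    /// Replace each detected `(category, entity)` pair in `text` with its token.
    fn pseudonymize(&mut self, text: &str, entities: &[(&str, &str)]) -> String {
        let mut out = text.to_string();
        for &(category, entity) in entities {
            let tok = self.token_for(category, entity);
            out = out.replace(entity, &tok);
        }
        out
    }

    /// Swap tokens in the model's response back to the real entities.
    fn rehydrate(&self, text: &str) -> String {
        let mut out = text.to_string();
        // Longest tokens first so `ORG_10` is not clobbered by `ORG_1`.
        let mut tokens: Vec<&String> = self.reverse.keys().collect();
        tokens.sort_by(|a, b| b.len().cmp(&a.len()));
        for tok in tokens {
            out = out.replace(tok.as_str(), &self.reverse[tok]);
        }
        out
    }
}
```

Because `token_for` checks the forward map before minting, the same entity gets the same token across documents and chat turns, which is what keeps embeddings and multi-turn references stable. A real implementation would also need word-boundary-aware matching rather than plain substring replacement.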

I got fed up fighting this with Presidio + custom glue (truncated RAG chunks, declension in Indian languages, fuzzy merging for typos/siblings, LLM confusion, percentages breaking math). So I built Cloakpipe as a tiny single-binary Rust proxy.

It does:

- Multi-layer detection (regex + financial rules + optional GLiNER2 ONNX NER + custom TOML)
- Consistent reversible mapping in an AES-256-GCM encrypted vault (memory zeroized)
- Smart rehydration that survives truncated chunks like [[ADDRESS:A00
- Built-in fuzzy resolution for typos and similar names
- Numeric reasoning mode so percentages still work for calculations
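As an illustration of the truncated-chunk case, one possible approach is to treat a trailing, unclosed token as a prefix and rehydrate it only when it matches exactly one vault entry. This is a sketch under assumptions, not Cloakpipe's actual logic: the `[[KIND:ID]]` token shape is inferred from the `[[ADDRESS:A00` example, and `rehydrate_truncated` and the plain `HashMap` vault are hypothetical.

```rust
use std::collections::HashMap;

/// Rehydrate complete `[[KIND:ID]]` tokens, plus a trailing token cut off by
/// chunk truncation when its visible prefix matches exactly one vault entry.
/// `vault` maps full tokens to real values (assumed format, illustration only).
fn rehydrate_truncated(text: &str, vault: &HashMap<String, String>) -> String {
    let mut out = text.to_string();

    // 1. Complete tokens: plain substitution.
    for (token, real) in vault {
        out = out.replace(token.as_str(), real);
    }

    // 2. Trailing partial token: the last "[[" with no closing "]]" after it.
    if let Some(start) = out.rfind("[[") {
        let partial = &out[start..];
        if !partial.contains("]]") {
            let matches: Vec<&String> = vault
                .keys()
                .filter(|t| t.starts_with(partial))
                .collect();
            // Only resolve an unambiguous prefix; otherwise leave the text as-is.
            if matches.len() == 1 {
                let real = vault[matches[0]].clone();
                out.truncate(start);
                out.push_str(&real);
            }
        }
    }
    out
}
```

Leaving ambiguous prefixes untouched is a deliberately conservative choice: guessing between `[[ADDRESS:A001]]` and `[[ADDRESS:A002]]` would put the wrong real value in front of the user.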

Fully open source (MIT), zero Python dependencies, <5 ms overhead.

Repo: https://github.com/rohansx/cloakpipe
Demo & quick start: https://app.cloakpipe.co/demo

Would love feedback from anyone who has audited their RAG data flow or is struggling with the redaction-vs-semantics problem — especially in legal, fintech, or non-English workflows.

What approaches have you landed on?

Comments

ozgurozkan•2h ago
Cloakpipe solves a real tension cleanly — pseudonymization that preserves semantic meaning for embeddings is genuinely hard, and the AES-256-GCM encrypted vault with memory zeroing shows thoughtful security design.

One dimension worth pressure-testing: the rehydration step. The proxy receives the LLM response and substitutes real entities back in. That rehydration layer is a potential exfiltration vector if the LLM can be made to include token patterns in its response that survive the substitution. We've run adversarial tests where an AI agent was instructed (via injected context) to embed entity tokens in its output in ways that leak the mapping.

We do this kind of adversarial testing at audn.ai (https://audn.ai) — specifically data leak and PII exfiltration scenarios against RAG and agentic pipelines. Sensitive data leak and re-identification are two of the risk categories we cover explicitly.

For fintech/legal use cases especially, would be worth running a red team pass on the rehydration and vault lookup logic. Happy to connect if that'd be useful.
