Show HN: PII-hound – A fast, dependency-free PII scanner in Go

https://github.com/saddledata/pii-hound
2•dbuckman•2h ago

Comments

dbuckman•2h ago
Hi HN,

I’ve spent a lot of time working on data pipelines, and one of the most frustrating problems is accidentally syncing PII or developer secrets (like AWS keys or SSNs) into a data warehouse or downstream system.

Most of the enterprise tools that solve this are massive Java applications, require complex Python environments, or cost $50k/year. I just wanted a fast, single binary I could drop into a CI/CD pipeline (--fail-on-pii) or run locally against a Postgres DB to see my exposure. So, I built pii-hound.

A few technical details on how it works under the hood:

Memory Efficiency: Scanning a 50GB CSV file shouldn't cause an OOM error. pii-hound uses a concurrent, streaming architecture and implements reservoir sampling, so it can draw a uniform random sample from a huge dataset in a single sequential pass while keeping a tiny memory footprint.
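
For readers unfamiliar with reservoir sampling, here is a minimal sketch of the classic single-pass variant (Algorithm R). It is an illustration only, not pii-hound's actual code: the function names `reservoirSample` and `sampleString`, the line-based input, and the fixed seed are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"math/rand"
	"strings"
)

// reservoirSample reads lines from sc one at a time and returns a uniform
// random sample of at most k of them (Algorithm R). Memory use is O(k),
// no matter how many lines the input contains.
func reservoirSample(sc *bufio.Scanner, k int, rng *rand.Rand) []string {
	sample := make([]string, 0, k)
	seen := 0
	for sc.Scan() {
		seen++
		if len(sample) < k {
			// Fill the reservoir with the first k lines.
			sample = append(sample, sc.Text())
			continue
		}
		// Keep the new line with probability k/seen by overwriting
		// a uniformly chosen slot.
		if j := rng.Intn(seen); j < k {
			sample[j] = sc.Text()
		}
	}
	return sample
}

// sampleString is a hypothetical convenience wrapper for in-memory text.
func sampleString(text string, k int, seed int64) []string {
	rng := rand.New(rand.NewSource(seed))
	return reservoirSample(bufio.NewScanner(strings.NewReader(text)), k, rng)
}

func main() {
	rows := "r1\nr2\nr3\nr4\nr5\nr6\nr7\nr8"
	fmt.Println(sampleString(rows, 3, 1))
}
```

The same loop works on a file or a database cursor, since it only ever holds k rows at a time.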

Speed: For the keyword and column-name heuristics, I implemented Aho-Corasick string matching, which is significantly faster than running dozens of individual regexes against every header.
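
To show why this helps, here is a compact Aho-Corasick sketch that finds every keyword in a header in one pass over the text. It is a teaching example under stated assumptions, not the project's implementation: the `matcher` type, the byte-level trie, and the keyword list are all made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one state of the automaton.
type node struct {
	next map[byte]int // trie (goto) transitions
	fail int          // state for the longest proper suffix that is also a trie prefix
	out  []string     // keywords ending here, including those reached via fail links
}

type matcher struct{ nodes []node }

// step follows fail links until a transition on c exists, or the root is reached.
func (m *matcher) step(state int, c byte) int {
	for {
		if t, ok := m.nodes[state].next[c]; ok {
			return t
		}
		if state == 0 {
			return 0
		}
		state = m.nodes[state].fail
	}
}

func newMatcher(keywords []string) *matcher {
	m := &matcher{nodes: []node{{next: map[byte]int{}}}}
	// 1. Insert every keyword into the trie.
	for _, kw := range keywords {
		cur := 0
		for i := 0; i < len(kw); i++ {
			nxt, ok := m.nodes[cur].next[kw[i]]
			if !ok {
				nxt = len(m.nodes)
				m.nodes = append(m.nodes, node{next: map[byte]int{}})
				m.nodes[cur].next[kw[i]] = nxt
			}
			cur = nxt
		}
		m.nodes[cur].out = append(m.nodes[cur].out, kw)
	}
	// 2. Breadth-first pass to compute fail links; depth-1 states fail to the root.
	var queue []int
	for _, child := range m.nodes[0].next {
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range m.nodes[cur].next {
			f := m.step(m.nodes[cur].fail, c)
			m.nodes[child].fail = f
			m.nodes[child].out = append(m.nodes[child].out, m.nodes[f].out...)
			queue = append(queue, child)
		}
	}
	return m
}

// find returns every keyword occurrence in text, ordered by end position.
func (m *matcher) find(text string) []string {
	var hits []string
	state := 0
	for i := 0; i < len(text); i++ {
		state = m.step(state, text[i])
		hits = append(hits, m.nodes[state].out...)
	}
	return hits
}

func main() {
	// Hypothetical keyword list; real rule sets will differ.
	m := newMatcher([]string{"email", "ssn", "phone", "first_name"})
	fmt.Println(m.find(strings.ToLower("Customer_Email_Address")))
}
```

The automaton is built once per rule set, after which each header costs a single scan regardless of how many keywords are loaded.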

Accuracy: To cut down on false positives, things like credit card numbers aren't matched by regex alone; candidates are also run through a Luhn checksum validation step.
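
A rough sketch of that two-stage check is below: a regex proposes candidates and the Luhn checksum filters them. The pattern, the 13 to 19 digit length bounds, and the names `cardCandidate`, `luhnValid`, and `findCards` are assumptions for the example and are not taken from pii-hound's rule set.

```go
package main

import (
	"fmt"
	"regexp"
)

// cardCandidate matches 13 to 19 digits, optionally separated by single
// spaces or hyphens. Illustrative pattern only.
var cardCandidate = regexp.MustCompile(`\b\d(?:[ -]?\d){12,18}\b`)

// luhnValid reports whether the digits in s pass the Luhn checksum.
// Spaces and hyphens are skipped; any other non-digit rejects the input.
func luhnValid(s string) bool {
	sum, digits := 0, 0
	double := false // the rightmost digit is not doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as summing the two digits of d
			}
		}
		sum += d
		double = !double
		digits++
	}
	return digits >= 13 && digits <= 19 && sum%10 == 0
}

// findCards returns regex candidates in text that also pass the Luhn check.
func findCards(text string) []string {
	var cards []string
	for _, c := range cardCandidate.FindAllString(text, -1) {
		if luhnValid(c) {
			cards = append(cards, c)
		}
	}
	return cards
}

func main() {
	// 4111 1111 1111 1111 is a widely used test number; the second value
	// has the right shape but fails the checksum.
	fmt.Println(findCards("paid with 4111 1111 1111 1111, ref 1234567890123456"))
}
```

The second number in `main` looks like a card to the regex but is rejected by the checksum, which is the kind of false positive this step removes.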

Full transparency: I originally wrote the core of this scanning engine for a larger data management platform I’m building called Saddle Data. But I realized the scanner itself is incredibly useful as a standalone utility, so I extracted it, polished the CLI, and open-sourced it under the MIT license.

It currently supports Postgres, MySQL, Snowflake, BigQuery, SQLite, S3, GCS, and local files (CSV/JSON/Parquet).

I'd love for you to point it at a local database or a messy CSV and let me know how it performs. Happy to answer any questions about the Go implementation, and PRs for new regex rules or source connectors are very welcome!

Finnoid•1h ago
Interesting! I notice you mention phone numbers but not names. Can PII-hound also detect things like first and last names in the data? I know that might not be the use case you’re primarily solving for, but I’m finding that as organizations use AI to process data, it’s becoming more important to be able to scrub it of any PII that might involve user or customer names. I’d love a lightweight CLI tool to do that for me.
dbuckman•57m ago
That is a good question. No, we don't do anything with names at the moment. Names are hard because they don't follow a pattern. The next version will flag columns named first_name, last_name, fullname, or customer_name. That should be published later today.

Beyond that, pii-hound supports custom rules. A user could create some rules to match known names if they wanted.

I am open to ideas of other ways to close that gap.

Finnoid•40m ago
I don’t know if this is viable, but I wonder if you could package a small open source LLM and feed the data through it in chunks to scrub names. I’m sure it would add to the processing time and cause a bunch of other issues. But just a thought.

AI-Driven Demand for Gas Turbines Risks a New Energy Crunch

https://www.bloomberg.com/features/2025-bottlenecks-gas-turbines/
1•sethbannon•22s ago•0 comments

Built a tool that simulates company-specific interviewers

https://portlumeai.com
1•portlumeai•45s ago•0 comments

A Learning a Day: Daily Posts Since May 2008

https://alearningaday.blog/archives/
1•Olshansky•51s ago•0 comments

Show HN: OpenMix, open-source computational framework for formulation science

https://github.com/vijayvkrishnan/openmix
1•vijayvkrishnan•1m ago•0 comments

With Cox V. Sony The Supreme Court Provides Another Internet-Protecting Decision

https://www.techdirt.com/2026/04/07/with-cox-v-sony-the-supreme-court-provides-yet-another-intern...
2•hn_acker•2m ago•1 comments

What Is Ghost Murmur? Secretive CIA Tool Linked to Iran Airman Rescue

https://www.newsweek.com/ghost-murmur-secretive-cia-tool-iran-airman-rescue-11797688
2•petethomas•2m ago•0 comments

Upgrading MacBook Neo to 1 TB using iPhone parts [video]

https://www.youtube.com/watch?v=bIeEGeTd5DE
1•burnt-resistor•2m ago•0 comments

Why LLMs Can't Play Chess

https://www.nicowesterdale.com/blog/why-llms-cant-play-chess
1•osrec•2m ago•0 comments

Understanding the Kalman Filter with a Simple Radar Example

https://kalmanfilter.net
2•alex_be•2m ago•0 comments

Tesla can play music from a floppy drive

https://twitter.com/olegkutkov/status/2041925827416277460
1•stefan_•2m ago•0 comments

RenderDraw Lens – Give AI coding tools visual context from the browser

https://renderdraw.com/tools/lens
1•eshivers•3m ago•0 comments

Do DMCA Takedown Notices Need to Expressly Refer to the Lack of Fair Use?

https://blog.ericgoldman.org/archives/2026/03/do-dmca-takedown-notices-need-to-expressly-refer-to...
2•hn_acker•4m ago•1 comments

Show HN: Canvora – describe what you want, get a branded visual in any language

https://canvora.ai
1•vivekalogics•5m ago•0 comments

Greece to ban social media for under-15s from next year

https://www.bbc.com/news/articles/ckgx1x742x5o
3•Brajeshwar•7m ago•0 comments

The reason your Fort Lauderdale competitor is ranking above you

https://fortauderdaleseo.substack.com/p/the-reason-your-fort-lauderdale-competitor
1•auditnews•7m ago•1 comments

How Pakistan managed to get the US and Iran to a ceasefire

https://www.aljazeera.com/features/2026/4/8/how-pakistan-managed-to-get-the-us-and-iran-to-a-ceas...
1•rkp8000•7m ago•0 comments

Scaling Managed Agents: Decoupling the brain from the hands

https://www.anthropic.com/engineering/managed-agents
1•meetpateltech•8m ago•0 comments

Claude Managed Agents

https://claude.com/blog/claude-managed-agents
4•adocomplete•9m ago•1 comments

The Download: water threats in Iran and AI's impact on what entrepreneurs make

https://www.technologyreview.com/2026/04/08/1135405/the-download-water-threats-iran-ais-impact-on...
1•joozio•9m ago•0 comments

Rust for CPython Progress Update April 2026

https://blog.python.org/2026/04/rust-for-cpython-2026-04/
1•rented_mule•10m ago•0 comments

Mustafa Suleyman: AI development won't hit a wall anytime soon–here's why

https://www.technologyreview.com/2026/04/08/1135398/mustafa-suleyman-ai-future/
2•joozio•11m ago•0 comments

Coding Agents for Old People

https://blog.tasuki.org/coding-agents/
1•speckx•11m ago•0 comments

A friend got fired from Coinbase for his side project he worked on for 5 years

https://nexustrade.io/blog/i-was-fired-from-coinbase-for-building-an-ai-trading-platform-20260408
5•eranation•11m ago•2 comments

Worldwide Semiconductor Revenue to Exceed $1.3T in 2026

https://www.gartner.com/en/newsroom/press-releases/2026-04-08-gartner-forecasts-worldwide-semicon...
1•layer8•12m ago•1 comments

We built a VS Code extension for reproducible SQL workflows

https://marketplace.visualstudio.com/items?itemName=Exasol.exasol-vscode
1•one-random-geek•13m ago•1 comments

Show HN: Prompt injection detector beats ProtectAI by 19% accuracy, 8.9x smaller

https://huggingface.co/hlyn/prompt-injection-judge-deberta-70m
1•Karan047•15m ago•0 comments

The Downfall and Enshittification of Microsoft in 2026

https://caio.ca/blog/the-downfall-and-enshittification-of-microsoft.html
3•birdculture•16m ago•1 comments

Nuclear Energy Heresy with Daniel Chen [video]

https://www.youtube.com/watch?v=HZq_V-UKLj0
1•leonidasrup•17m ago•1 comments

Show HN: Access OpenClaw's workspace files from anywhere and any device

https://github.com/RageDotNet/openclaw-webdav
1•gregatragenet3•17m ago•0 comments

Show HN: MCP-fence – MCP firewall I built and tried to break (6 audit rounds)

https://www.npmjs.com/package/mcp-fence
1•yjcho9317•19m ago•0 comments