Show HN: PII-hound – A fast, dependency-free PII scanner in Go

https://github.com/saddledata/pii-hound
2•dbuckman•2h ago

Comments

dbuckman•2h ago
Hi HN,

I’ve spent a lot of time working on data pipelines, and one of the most frustrating problems is accidentally syncing PII or developer secrets (like AWS keys or SSNs) into a data warehouse or downstream system.

Most of the enterprise tools that solve this are either massive Java applications, require complex Python environments, or cost $50k/year. I just wanted a lightning-fast, single binary I could drop into a CI/CD pipeline (--fail-on-pii) or run locally against a Postgres DB to see my exposure. So, I built pii-hound.

A few technical details on how it works under the hood:

Memory Efficiency: Scanning a 50GB CSV file shouldn't cause an OOM error. It uses a concurrent, streaming architecture and implements Reservoir Sampling so it can sample huge datasets sequentially while maintaining randomness and a tiny memory footprint.
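To make the sampling idea concrete, here is a minimal sketch of textbook reservoir sampling (Algorithm R) in Go. It is not pii-hound's actual code: the `reservoir` type, its methods, and the fixed seed are hypothetical names chosen for illustration. It shows how a uniform sample of k rows can be kept while only ever holding k rows in memory.

```go
package main

import (
	"bufio"
	"fmt"
	"math/rand"
	"strings"
)

// reservoir keeps a uniform random sample of at most k rows from a stream
// of unknown length. Hypothetical type, for illustration only.
type reservoir struct {
	k      int
	seen   int
	sample []string
	rng    *rand.Rand
}

func newReservoir(k int, seed int64) *reservoir {
	if k < 0 {
		k = 0
	}
	return &reservoir{
		k:      k,
		sample: make([]string, 0, k),
		rng:    rand.New(rand.NewSource(seed)),
	}
}

// add offers one row to the reservoir. The first k rows are always kept;
// row number n (n > k) replaces a random slot with probability k/n.
func (r *reservoir) add(row string) {
	r.seen++
	if len(r.sample) < r.k {
		r.sample = append(r.sample, row)
		return
	}
	if j := r.rng.Intn(r.seen); j < r.k {
		r.sample[j] = row
	}
}

func main() {
	// Stand-in for a large CSV: rows are read one at a time and never
	// accumulated beyond the k sampled rows.
	data := strings.NewReader("alice@example.com\nbob@example.com\ncarol@example.com\ndave@example.com\n")
	res := newReservoir(2, 1)
	sc := bufio.NewScanner(data)
	for sc.Scan() {
		res.add(sc.Text())
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	fmt.Println(res.seen, "rows seen, sampled:", res.sample)
}
```

Memory use depends on k rather than on the file size, which is why a 50GB input would not need to fit in RAM under this approach.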

Speed: For the keyword and column-name heuristics, I implemented Aho-Corasick string matching, which is significantly faster than running dozens of individual regexes against every header.
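For readers unfamiliar with Aho-Corasick, a compact sketch in Go follows. It is a generic version of the algorithm, not the code in the repo: `matcher`, `newMatcher`, `find`, and the keyword list are all made up for this example. The point is that every keyword is checked in a single pass over each header.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one automaton state: goto transitions, a failure link, and the
// keywords that end at this state.
type node struct {
	next map[byte]int
	fail int
	out  []string
}

type matcher struct{ nodes []node }

// newMatcher builds the automaton: a trie of the keywords first, then
// failure links assigned in breadth-first order.
func newMatcher(keywords []string) *matcher {
	m := &matcher{nodes: []node{{next: map[byte]int{}}}}
	for _, kw := range keywords {
		if kw == "" {
			continue
		}
		cur := 0
		for i := 0; i < len(kw); i++ {
			nxt, ok := m.nodes[cur].next[kw[i]]
			if !ok {
				nxt = len(m.nodes)
				m.nodes = append(m.nodes, node{next: map[byte]int{}})
				m.nodes[cur].next[kw[i]] = nxt
			}
			cur = nxt
		}
		m.nodes[cur].out = append(m.nodes[cur].out, kw)
	}
	var queue []int
	for _, child := range m.nodes[0].next {
		queue = append(queue, child) // depth-1 states fail back to the root
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range m.nodes[cur].next {
			// Follow failure links until a state with a transition on c.
			f := m.nodes[cur].fail
			for f != 0 {
				if _, ok := m.nodes[f].next[c]; ok {
					break
				}
				f = m.nodes[f].fail
			}
			if t, ok := m.nodes[f].next[c]; ok {
				f = t
			}
			m.nodes[child].fail = f
			// Inherit keywords that end at the failure target.
			m.nodes[child].out = append(m.nodes[child].out, m.nodes[f].out...)
			queue = append(queue, child)
		}
	}
	return m
}

// find returns the distinct keywords occurring in text, scanning it once.
func (m *matcher) find(text string) []string {
	seen := map[string]bool{}
	var hits []string
	state := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		for state != 0 {
			if _, ok := m.nodes[state].next[c]; ok {
				break
			}
			state = m.nodes[state].fail
		}
		if t, ok := m.nodes[state].next[c]; ok {
			state = t
		}
		for _, kw := range m.nodes[state].out {
			if !seen[kw] {
				seen[kw] = true
				hits = append(hits, kw)
			}
		}
	}
	return hits
}

func main() {
	m := newMatcher([]string{"ssn", "email", "phone", "card"})
	for _, header := range []string{"customer_email", "SSN_last4", "order_total"} {
		fmt.Println(header, "->", m.find(strings.ToLower(header)))
	}
}
```

The automaton is built once per rule set, so the per-header cost grows with the header length and the number of matches rather than with the number of keywords.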

Accuracy: To cut down on false positives, things like Credit Card numbers don't just use regex; they are piped through a Luhn algorithm validation step.
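As an illustration of that validation step, a small Luhn check in Go might look like the sketch below. `luhnValid` is a name invented for this example, and the 13 to 19 digit length bound is an assumption about typical card numbers, not something taken from the project.

```go
package main

import "fmt"

// luhnValid reports whether s passes the Luhn checksum. Spaces and hyphens
// are skipped; any other non-digit character fails the check. The 13-19
// digit bound is an assumed card-number length range.
func luhnValid(s string) bool {
	sum, digits := 0, 0
	double := false // every second digit, counting from the right, is doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as summing the two digits of the product
			}
		}
		sum += d
		double = !double
		digits++
	}
	return digits >= 13 && digits <= 19 && sum%10 == 0
}

func main() {
	// Candidates that a regex for 16-digit numbers would both accept.
	for _, candidate := range []string{"4111 1111 1111 1111", "1234-5678-9012-3456"} {
		fmt.Println(candidate, luhnValid(candidate))
	}
}
```

A regex alone would flag both candidates in `main`; the checksum rejects the second, which is the false-positive reduction described above.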

Full transparency: I originally wrote the core of this scanning engine for a larger data management platform I’m building called Saddle Data. But I realized the scanner itself is incredibly useful as a standalone utility, so I extracted it, polished the CLI, and open-sourced it under the MIT license.

It currently supports Postgres, MySQL, Snowflake, BigQuery, SQLite, S3, GCS, and local files (CSV/JSON/Parquet).

I'd love for you to point it at a local database or a messy CSV and let me know how it performs. Happy to answer any questions about the Go implementation, and PRs for new regex rules or source connectors are very welcome!

Finnoid•1h ago
Interesting! I notice you mention phone numbers but not names. Can PII-hound also detect things like first and last names in the data? I know that might not be the use case you’re primarily solving for, but I’m finding that as organizations use AI to process data, it’s becoming more important to be able to scrub out any PII that might involve user or customer names. I’d love a lightweight CLI tool to do that for me.

dbuckman•1h ago
That is a good question. No, we don't do anything with names at the moment. Names are hard because they don't follow a pattern. The next version will flag columns named first_name, last_name, fullname, or customer_name. That should be published later today.

Beyond that, pii-hound supports custom rules. A user could create some rules to match known names if they wanted.

I am open to ideas of other ways to close that gap.

Finnoid•1h ago
I don’t know if this is viable, but I wonder if you could package a small open source LLM and feed the data through it in chunks to scrub names. I’m sure it would add to the processing time and cause a bunch of other issues. But just a thought.
