I'm one of the creators of HoundDog.ai (https://github.com/hounddogai/hounddog). We currently handle privacy scanning for Replit's 45M+ creators.
We built HoundDog because privacy compliance is usually a choice between manual spreadsheets and reactive runtime scanning. While runtime tools are useful for monitoring, they only catch leaks after the code is live and the data has already moved. They can also miss code paths that aren't actively triggered in production.
HoundDog traces sensitive data in code during development and helps catch risky flows (e.g., PII leaking into logs or unapproved third-party SDKs) before the code is shipped.
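For example, here's the kind of flow it flags (a made-up Python snippet; the field and function names are illustrative):

    import logging

    logger = logging.getLogger(__name__)

    def register(user):
        email = user["email"]  # sensitive data element (PII)
        logger.info(f"registered user {email}")  # PII flows into a log sink -> flagged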
The core scanner is a standalone Rust binary. It doesn't use LLMs, so it's local, deterministic, cheap, and fast. It can scan 1M+ lines of code in seconds on a standard laptop, and it supports 80+ sensitive data types (PII, PHI, CHD) and hundreds of data sinks (logs, SDKs, APIs, ORMs, etc.) out of the box.
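Trying it on a repo is a single command (see the README for install steps and the exact option list):

    hounddog scan /path/to/your/repo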
We use AI internally to expand and scale our rules, identifying new data sources and sinks, but the execution is pure static analysis.
The scanner is free to use (no signups), so please try it out and send us feedback.
A bit more on how it works under the hood: when the scanner finds a match, it traces that data through the codebase across different paths and transformations, including reassignments, helper functions, and nested calls. It then identifies where the data ultimately ends up, such as third-party SDKs (e.g. Stripe, Datadog, OpenAI), exposures in API protocols like REST, GraphQL, or gRPC, and functions that write to logs or local storage. Here's a list of all supported data sinks: https://github.com/hounddogai/hounddog/blob/main/data-sinks....
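As a toy example (names are made up; the Stripe call is standard Stripe Python SDK usage), the scanner follows the email from the source match, through the helper's return value and the reassignment, to the third-party sink:

    import stripe

    def build_contact(user):
        contact = user["email"]  # source: match on a sensitive field
        return contact           # the value flows out through the return

    def sync_customer(user):
        addr = build_contact(user)          # reassignment across a helper call
        stripe.Customer.create(email=addr)  # sink: third-party SDK -> flagged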
Most privacy frameworks, including GDPR and US privacy laws, require these flows to be documented, so we use your source code as the source of truth to keep privacy notices accurate and aligned with what the software is actually doing. I'll be around to answer any questions!