I recently needed to debug an issue that required access to a client’s Postgres database containing sensitive data. Dumping production data wasn’t an option, and the tools I found were either non-deterministic, manual, or too intrusive.
I built pg-obfuscate to solve this specific problem.
It’s a CLI tool that:
- Connects directly to Postgres
- Obfuscates selected tables/columns based on a YAML config
- Uses deterministic rules so relationships and shapes are preserved
- Supports dry-run vs execute modes
- Is designed for safely sharing production-like datasets across environments
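To illustrate what "deterministic rules" means in the list above, here is a minimal Python sketch. It is not pg-obfuscate's actual implementation; the keyed-HMAC approach, the `SECRET_KEY` value, and the `pseudonymize` and `obfuscate_email` helpers are all assumptions made for the example. The point is that the same input always maps to the same masked value, so rows that referenced each other before obfuscation still match afterwards.

```python
import hashlib
import hmac

# Hypothetical secret for the example; a real setup would load it from the environment.
SECRET_KEY = b"example-secret-not-for-production"


def pseudonymize(value: str, namespace: str, length: int = 12) -> str:
    """Return a stable token: the same (key, namespace, value) always gives the same result."""
    message = f"{namespace}:{value}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:length]


def obfuscate_email(email: str) -> str:
    """Hide the real address while keeping the general shape of an email."""
    return f"user_{pseudonymize(email, 'email')}@example.com"


# Two illustrative tables that refer to the same person by email.
users = [
    {"id": 1, "email": "alice@corp.test"},
    {"id": 2, "email": "bob@corp.test"},
]
orders = [{"id": 10, "customer_email": "alice@corp.test"}]

masked_users = [{**u, "email": obfuscate_email(u["email"])} for u in users]
masked_orders = [
    {**o, "customer_email": obfuscate_email(o["customer_email"])} for o in orders
]

# The masked order still joins to the masked user, because the mapping is deterministic.
print(masked_users[0]["email"] == masked_orders[0]["customer_email"])
```

Using a keyed hash rather than a plain hash means the masked values cannot be reproduced by someone who does not hold the key. Truncating the digest keeps values short at the cost of a small collision risk.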
Example use cases:
- Share a realistic dataset with contractors
- Reproduce bugs locally without leaking real data
- Sanitize a database before exporting it
It’s Postgres-only for now and intentionally narrow in scope.
The project is open source under AGPLv3+, with a commercial license available for companies that can’t use AGPL.
ofsen•1h ago
Repo: https://github.com/Ofsen/pg-obfuscate
I’m mainly looking for feedback on:
- Safety assumptions
- Edge cases I might be missing
- Whether this overlaps with existing tools I overlooked
Thank you