I’ve recently open-sourced model2data, a small tool that grew out of a recurring problem in my day-to-day work: needing good synthetic data for testing, demos, and development.
Random generators and faker-style approaches often produce data that looks plausible at first glance but breaks down quickly once you care about relationships, constraints, or model semantics. model2data tries to stay closer to the data model itself, so the generated data remains structurally meaningful.
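To make the point about relationships and constraints concrete, here is a minimal, generic Python sketch. It is not model2data's API: the `Customer`/`Order` model and the `naive_orders`/`model_aware` functions are hypothetical. It contrasts drawing every field independently, which can leave foreign keys pointing at rows that don't exist, with generating parent rows first and drawing child keys only from them.

```python
import random
from dataclasses import dataclass

# Hypothetical toy model (NOT model2data's API): two entities linked by a foreign key.

@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # foreign key -> Customer.customer_id
    amount: float     # constraint: must be > 0

def naive_orders(n, rng):
    # Faker-style: each field is drawn independently, so the foreign key
    # can point at a customer that was never generated, and amount can be negative.
    return [
        Order(i, rng.randint(1, 10_000), round(rng.uniform(-50, 500), 2))
        for i in range(1, n + 1)
    ]

def model_aware(n_customers, n_orders, rng):
    # Model-driven: generate parents first, then derive children from them.
    customers = [Customer(i, f"customer_{i}") for i in range(1, n_customers + 1)]
    orders = [
        Order(
            order_id=i,
            customer_id=rng.choice(customers).customer_id,  # FK drawn from existing parents
            amount=round(rng.uniform(0.01, 500), 2),         # respects amount > 0
        )
        for i in range(1, n_orders + 1)
    ]
    return customers, orders

rng = random.Random(42)
customers, orders = model_aware(5, 20, rng)
valid_ids = {c.customer_id for c in customers}
print(all(o.customer_id in valid_ids for o in orders))  # True
print(all(o.amount > 0 for o in orders))                # True

# The naive version gives no such guarantee:
dangling = sum(o.customer_id not in valid_ids for o in naive_orders(20, rng))
print(f"naive orders with dangling customer_id: {dangling}")
```

Generating in dependency order is just one simple way to keep referential integrity; this sketch ignores things like cardinalities, uniqueness, and cycles between tables.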
There’s no big roadmap yet; it’s something I’ve found useful myself and wanted to share.
Repo: https://github.com/JB-Analytica/model2data
I’d be very interested to hear how others approach synthetic data, or where this kind of model-driven generation might fall short.