frontpage.

Made with ♥ by @iamnishanth


Replica_db – Synthetic data generator using Rust and Gaussian copulas

https://github.com/Pragadeesh-19/replica_db
1•pragadeesh21•8h ago

Comments

pragadeesh21•8h ago
Hey HN,

I built this because I kept running into the same bottleneck on data projects: staging environments are always either empty or dangerous. Using production dumps always puts you at risk of PII leaks, but generating meaningful test data with Python tools (like Faker or SDV) often hit OOM errors or took hours once I tried to simulate anything complex.

I spent the last week writing replica_db to solve this. It's a CLI tool written in Rust that reverse-engineers your existing Postgres schema and foreign key topology, then builds a "statistical genome" of your data using reservoir sampling.
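
Reservoir sampling is what lets a profiler keep a fixed-size, uniformly random sample of a table without knowing its row count up front, in memory proportional to the sample size. Below is a minimal Python sketch of the classic Algorithm R. It is not replica_db's Rust code: the function name `reservoir_sample` and the sample sizes in the usage lines are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Algorithm R: keep a uniform random sample of k items from a stream
    of unknown length, using O(k) memory."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng if rng is not None else random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # randint is inclusive, so j is uniform over 0..i and the
            # (i+1)-th item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical usage: sample 10k of 564k "rows" from a lazy iterator,
    # so the full table is never held in memory.
    rows = ((row_id, row_id * 0.5) for row_id in range(564_000))
    sample = reservoir_sample(rows, 10_000, random.Random(42))
    print(len(sample))
```

Because every row ends up in the sample with the same probability k/n, statistics computed on the reservoir (quantiles, correlations) are unbiased estimates of the full table's, which is the property a profiling pass like the one described here would rely on.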

The cool part (for me) was implementing Gaussian copulas to handle correlations. Most generators treat columns independently, which throws away the relationships between them and produces implausible rows (like a 5-year-old user earning $200k). I used nalgebra to compute the covariance matrix of the numeric columns, so the engine actually learns the shape of the data.
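
To make the copula idea concrete, here is a minimal, stdlib-only Python sketch of a Gaussian copula with empirical marginals. It illustrates the general technique under stated assumptions and is not replica_db's implementation: each column is mapped to normal scores through its ranks, the correlation matrix of those scores is factored with a Cholesky decomposition, and new rows are drawn by correlating independent normals and mapping them back through each column's empirical quantiles. The post says replica_db computes a covariance matrix with nalgebra; using the correlation of rank-based normal scores here is an assumption (one common way to fit a Gaussian copula), and all names below are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence

_STD_NORMAL = NormalDist()  # mean 0, sigma 1


def normal_scores(col: Sequence[float]) -> List[float]:
    """Map each value to Phi^-1(rank / (n + 1)), giving ~N(0, 1) marginals.
    Ties are broken by position (stable sort), a simplification."""
    n = len(col)
    order = sorted(range(n), key=lambda i: col[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return z


def corr_matrix(cols: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix of equally long, non-constant columns."""
    d, n = len(cols), len(cols[0])
    means = [sum(c) / n for c in cols]
    cov = [[sum((cols[a][t] - means[a]) * (cols[b][t] - means[b])
                for t in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(d)]
            for a in range(d)]


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L.T == m (m symmetric positive definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                # Clamp guards against tiny negative values from rounding.
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


class GaussianCopula:
    def __init__(self, columns: Sequence[Sequence[float]]):
        # Empirical marginals: keep each column sorted for quantile lookup.
        self.sorted_cols = [sorted(c) for c in columns]
        # Dependence structure: correlation of the normal scores.
        self.chol = cholesky(corr_matrix([normal_scores(c) for c in columns]))

    def sample_row(self, rng: random.Random) -> List[float]:
        d = len(self.chol)
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # z = L @ eps carries the fitted correlation structure.
        z = [sum(self.chol[i][k] * eps[k] for k in range(i + 1))
             for i in range(d)]
        row = []
        for zi, col in zip(z, self.sorted_cols):
            u = _STD_NORMAL.cdf(zi)  # back to a uniform in [0, 1]
            idx = min(int(u * len(col)), len(col) - 1)
            row.append(col[idx])  # empirical quantile of the original column
        return row


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy correlated columns (hypothetical data): income rises with age.
    ages = [rng.uniform(18, 70) for _ in range(1000)]
    incomes = [1000 * a + rng.gauss(0, 5000) for a in ages]
    model = GaussianCopula([ages, incomes])
    print([model.sample_row(rng) for _ in range(3)])
```

Splitting the model this way means each column keeps its own real distribution while the Cholesky factor alone carries the dependence, which is why correlated pairs such as latitude and longitude stay coupled in the generated rows.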

I tested this on the Uber NYC trip dataset, and it automatically detected the correlation between latitude and longitude. When I generated 5 million fake trips, they respected the actual geography of NYC instead of landing at random points in the ocean.

Benchmarks on my laptop have been encouraging: scanning 564k real-world rows takes about 2.2 seconds, and generating 10 million synthetic rows takes under 5 seconds (~49k rows/sec) with constant memory usage. The output is streamed in Postgres's standard COPY format directly to stdout, so you can pipe it straight into psql.
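
Streaming COPY output is what keeps memory flat: rows are formatted and written one at a time instead of being collected first. The sketch below writes rows in PostgreSQL's COPY text format (tab-separated fields, newline-terminated rows, `\N` for NULL, and backslash escapes for backslash, tab, newline, and carriage return). It is a Python illustration, not replica_db's output code; `copy_field`, `write_copy_rows`, and the `trips` table in the usage line are hypothetical.

```python
import sys
from typing import Any, Iterable, Optional, Sequence, TextIO

# Characters that must be backslash-escaped in COPY's text format
# (with the default tab delimiter).
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def copy_field(value: Any) -> str:
    """Encode one value for COPY ... FROM STDIN (text format)."""
    if value is None:
        return "\\N"  # default NULL marker
    return "".join(_ESCAPES.get(ch, ch) for ch in str(value))


def write_copy_rows(rows: Iterable[Sequence[Any]],
                    out: Optional[TextIO] = None) -> int:
    """Write rows one at a time so memory stays constant; returns row count."""
    out = sys.stdout if out is None else out
    count = 0
    for row in rows:
        out.write("\t".join(copy_field(v) for v in row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # A lazy generator: nothing is materialized up front.
    fake_trips = ((i, 40.7 + i * 1e-4, -74.0 - i * 1e-4) for i in range(5))
    write_copy_rows(fake_trips)
```

Assuming the script is saved as `gen_copy.py` and a matching table exists, something like `python gen_copy.py | psql -c "COPY trips (id, lat, lon) FROM STDIN"` would load the rows without an intermediate file.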

The repo isn't licensed yet. It's my first project involving this level of systems programming and statistical math in Rust, so I'd appreciate any feedback on the implementation or the math strategy!

https://github.com/Pragadeesh-19/replica_db

Netflix Live Origin

https://netflixtechblog.com/netflix-live-origin-41f1b0ad5371
1•mfrw•2m ago•0 comments

Substack forces users to download app to read content

https://twitter.com/gergelyorosz/status/1999241496005066755
1•lleims•2m ago•0 comments

What is more important than working hard?

https://himanshusinghbisht.substack.com/p/what-is-more-important-than-working
1•gilfoyle_7•3m ago•0 comments

Nvidia acquires SchedMD – developer of Slurm HPC scheduling software

https://www.heise.de/en/news/Nvidia-acquires-open-source-provider-SchedMD-11115881.html
1•samuell•3m ago•0 comments

Chinese Name Generator

https://chinesenamehub.com/
1•zidana•5m ago•0 comments

A Simple Recommendation System

https://angelocortez.com/blog/recsys
2•telecomhacker•8m ago•0 comments

Show HN: Explore your own Spotify history

https://lukasschwab.me/spotify-explore/
1•lukasschwab•9m ago•0 comments

Open Source Security Patch Rewards

https://bughunters.google.com/open-source-security/patch-rewards
1•transpute•10m ago•0 comments

What Are Bent Normals?

https://discourse.threejs.org/t/get-bent-or-what-is-normal-today-anyway/88635
1•iamwil•10m ago•0 comments

Is HTML-like markup a bad idea for programmatic video generation?

https://github.com/xxatsushixx/htmlv
1•tojikomorin•11m ago•1 comment

Writing a blatant Telegram clone using Qt, QML and Rust. And C++

https://kemble.net/blog/provoke/
1•todsacerdoti•14m ago•0 comments

Show HN: A Fizzy to Telegram webhook handler

https://github.com/ronaldlangeveld/telefizz
1•ronaldl93•16m ago•0 comments

Show HN: TextGO – A text selection popup tool (alternative to PopClip/SnipDo)

https://github.com/C5H12O5/TextGO
2•C5H12O5•19m ago•0 comments

Ask HN: Did anyone learn basic arithmetic as "snapshots" instead of procedures?

1•ursAxZA•20m ago•0 comments

Building a Brainfuck DSL in Forth using code generation

https://venko.blog/articles/forth-brainfuck
2•thunderseethe•22m ago•0 comments

Electric Mining Dump Trucks

https://www.komatsu.com.au/equipment/dump-trucks/electric-mining-trucks
1•thelastgallon•29m ago•0 comments

We Lost Communication to Entertainment

https://ploum.net/2025-12-15-communication-entertainment.html
1•HotGarbage•30m ago•0 comments

BHP and Rio Tinto welcome Caterpillar battery-electric haul trucks to Pilbara

https://www.riotinto.com/en/news/releases/2025/bhp-and-rio-tinto-welcome-first-caterpillar-batter...
2•thelastgallon•31m ago•0 comments

Erdős Problem #1026

https://terrytao.wordpress.com/2025/12/08/the-story-of-erdos-problem-126/
4•tzury•38m ago•0 comments

I kept rewriting Markdown docs into Word files, so I automated it

https://yourdomain.bedpage.com/
2•Thomas-Wilson•44m ago•1 comment

Tesla board made $3B via stock awards that dwarfed tech peers

https://www.reuters.com/sustainability/boards-policy-regulation/tesla-board-made-3-billion-via-st...
3•1vuio0pswjnm7•45m ago•0 comments

How Did India Conquer Space?

https://altermag.com/articles/how-did-india-conquer-space
2•occamschainsaw•49m ago•0 comments

Oracle shares slide as earnings fail to ease AI bubble fears

https://www.bbc.com/news/articles/c9qe1e374l1o
1•1vuio0pswjnm7•54m ago•1 comment

Deep Agent Framework, the Pydantic AI Way

https://vstorm-co.github.io/pydantic-deepagents/
3•jonbaer•1h ago•1 comment

Experiments with Memory Integrity Enforcement

https://octet-stream.net/b/scb/2025-12-16-experiments-with-memory-integrity-enforcement.html
2•thombles•1h ago•0 comments

Google is bringing Android to PCs with AluminiumOS

https://www.pocket-lint.com/aluminium-os-android-pc/
3•type0•1h ago•1 comment

Scientists Have Discovered an Organism That Breaks Biology's Golden Rule

https://scitechdaily.com/scientists-have-discovered-an-organism-that-breaks-biologys-golden-rule/
3•thunderbong•1h ago•2 comments

Building Software from Blog Posts

https://build.ms/2025/12/15/building-software-from-blog-posts/
2•mergesort•1h ago•0 comments

Choosing a Web Framework for 2026

https://3d23d65ddc64ce5.substack.com/p/choosing-a-web-framework-for-2026
1•fud101•1h ago•0 comments

Oracle: Let It Fall, Let It Fall, Let It Fall

https://seekingalpha.com/article/4853440-oracle-let-it-fall-let-it-fall-let-it-fall
2•bprasanna•1h ago•0 comments