frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

If you lose your passport in Austria, head for McDonald's Golden Arches

https://www.cbsnews.com/news/us-embassy-mcdonalds-restaurants-austria-hotline-americans-consular-...
1•thunderbong•1m ago•0 comments

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

https://github.com/chenyanchen/mermaid-formatter
1•astm•17m ago•0 comments

RFCs vs. READMEs: The Evolution of Protocols

https://h3manth.com/scribe/rfcs-vs-readmes/
2•init0•23m ago•1 comments

Kanchipuram Saris and Thinking Machines

https://altermag.com/articles/kanchipuram-saris-and-thinking-machines
1•trojanalert•23m ago•0 comments

Chinese chemical supplier causes global baby formula recall

https://www.reuters.com/business/healthcare-pharmaceuticals/nestle-widens-french-infant-formula-r...
1•fkdk•26m ago•0 comments

I've used AI to write 100% of my code for a year as an engineer

https://old.reddit.com/r/ClaudeCode/comments/1qxvobt/ive_used_ai_to_write_100_of_my_code_for_1_ye...
1•ukuina•29m ago•1 comments

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•39m ago•1 comments

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•39m ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
2•endorphine•44m ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•48m ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•49m ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
1•computer23•52m ago•0 comments

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

https://publicdomainreview.org/essay/typing-for-love-or-money/
1•prismatic•52m ago•0 comments

Show HN: A longitudinal health record built from fragmented medical data

https://myaether.live
1•takmak007•55m ago•0 comments

CoreWeave's $30B Bet on GPU Market Infrastructure

https://davefriedman.substack.com/p/coreweaves-30-billion-bet-on-gpu
1•gmays•1h ago•0 comments

Creating and Hosting a Static Website on Cloudflare for Free

https://benjaminsmallwood.com/blog/creating-and-hosting-a-static-website-on-cloudflare-for-free/
1•bensmallwood•1h ago•1 comments

"The Stanford scam proves America is becoming a nation of grifters"

https://www.thetimes.com/us/news-today/article/students-stanford-grifters-ivy-league-w2g5z768z
3•cwwc•1h ago•0 comments

Elon Musk on Space GPUs, AI, Optimus, and His Manufacturing Method

https://cheekypint.substack.com/p/elon-musk-on-space-gpus-ai-optimus
2•simonebrunozzi•1h ago•0 comments

X (Twitter) is back with a new X API Pay-Per-Use model

https://developer.x.com/
3•eeko_systems•1h ago•0 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faste and better

https://github.com/dmtrKovalenko/zlob
3•neogoose•1h ago•1 comments

Show HN: Deterministic signal triangulation using a fixed .72% variance constant

https://github.com/mabrucker85-prog/Project_Lance_Core
2•mav5431•1h ago•1 comments

Scientists Discover Levitating Time Crystals You Can Hold, Defy Newton’s 3rd Law

https://phys.org/news/2026-02-scientists-levitating-crystals.html
3•sizzle•1h ago•0 comments

When Michelangelo Met Titian

https://www.wsj.com/arts-culture/books/michelangelo-titian-review-the-renaissances-odd-couple-e34...
1•keiferski•1h ago•0 comments

Solving NYT Pips with DLX

https://github.com/DonoG/NYTPips4Processing
1•impossiblecode•1h ago•1 comments

Baldur's Gate to be turned into TV series – without the game's developers

https://www.bbc.com/news/articles/c24g457y534o
3•vunderba•1h ago•0 comments

Interview with 'Just use a VPS' bro (OpenClaw version) [video]

https://www.youtube.com/watch?v=40SnEd1RWUU
2•dangtony98•1h ago•0 comments

EchoJEPA: Latent Predictive Foundation Model for Echocardiography

https://github.com/bowang-lab/EchoJEPA
1•euvin•1h ago•0 comments

Disablling Go Telemetry

https://go.dev/doc/telemetry
2•1vuio0pswjnm7•1h ago•0 comments

Effective Nihilism

https://www.effectivenihilism.org/
1•abetusk•1h ago•1 comments

The UK government didn't want you to see this report on ecosystem collapse

https://www.theguardian.com/commentisfree/2026/jan/27/uk-government-report-ecosystem-collapse-foi...
5•pabs3•1h ago•0 comments
Open in hackernews

Replica_db – Synthetic data generator using rust and Gaussian Copulas

https://github.com/Pragadeesh-19/replica_db
2•pragadeesh21•1mo ago

Comments

pragadeesh21•1mo ago
Hey HN,

I built this because i kept running into the same bottleneck on data projects: staging environments are always either empty or dangerous. Using production dumps always puts you at risk of PII leaks, but generating meaningful test data with python tools (like faker or SDV) often hit OOM errors or took hours once I tried to simulate anything complex.

I spent the last week writing replica_db to solve this. its a CLI tool written in rust that reverse engineers your existing Postgres schema and foreign key topology, then creates a "statistical genome" of your data using reservoir sampling.

The cool part (for me) was implementing Gaussian Copulas to handle correlations. Most generators treat columns independently, which creates non correlated data (like a user with age 5 earning $200k). I used nalgebra to compute the covariance matrix of numeric columns, so the engine actually learns the shape of the data.

I tested this on Uber NYC trip dataset, and it automatically detected the correlation between latitude and longitude. When i generated 5 million fake trips they respected the actual geography of NYC instead of placing points randomly in the ocean.

Benchmarks on my laptop have been encouraging. Scanning 564k real world rows takes about 2.2 seconds and generating 10 million synthetic rows takes under 5 seconds (~49k rows/sec) with constant memory usage. The output streams standard COPY format directly to stdout so you can pipe it straight into psql.

The repo isn't licensed yet. Its my first project involving this level of systems programming and statistical math in rust. So i'd appreciate any feedback on the implementation or the math strategy!

https://github.com/Pragadeesh-19/replica_db