When good pseudorandom numbers go bad

https://blog.djnavarro.net/posts/2025-05-18_multivariate-normal-sampling-floating-point/
45•chewxy•3d ago

Comments

AlotOfReading•4h ago
Most people don't really care about numerical stability or correctness. What they usually want is reproducibility, but they go down a rabbit hole with those other topics as a way to get it, at least in part because everyone thinks reproducibility is too slow.

It was 20 years ago, but that's not the case today. The vast majority of hardware today implements IEEE 754 reproducibly if you're willing to stick to a few basic principles:

1. same inputs

2. same operations, in the same order

3. no "special functions", denormals, or NaNs.

If you accept these restrictions (and realistically you weren't handling NaNs or denormals properly anyway), you can get practical reproducibility on modern hardware for minimal (or no) performance cost if your toolchain cooperates. Sadly, toolchains don't prioritize this because it's easy to get wrong across the scope of a modern language and users don't know that it's possible.
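
To illustrate why principle 2 matters, here is a minimal sketch in Python (chosen for illustration only; the thread itself is about R). It assumes IEEE 754 doubles, which is what CPython uses on common platforms. The same three inputs give different results under different groupings, while repeating the same grouping reproduces the same value.

```python
# Floating-point addition is not associative, so the order of operations
# is part of what has to be held fixed for reproducibility.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # adds 0.1 + 0.2 first
right = a + (b + c)  # adds 0.2 + 0.3 first

# Different groupings of the same inputs disagree in the last bit.
print(left == right)
print(left.hex(), right.hex())

# Repeating the same operations in the same order reproduces the same value.
print(left == (a + b) + c)
```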

craigacp•1h ago
The same operations in the same order is a tough constraint in an environment where core count is increasing and clock speeds/IPC are not. It's hard to rewrite some of these algorithms to use a parallel decomposition that's the same as the serial one.

I've done a lot of work on reproducibility in machine learning systems, and it's really, really hard. Even the JVM got me by changing some functions in `java.lang.Math` between versions & platforms (while keeping to their documented 2ulp error bounds).
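
A rough sketch of the decomposition problem, in Python and under the assumption of IEEE 754 doubles. The helper names `serial_sum` and `chunked_sum` are hypothetical, and the "parallel" version just sums chunks one after another to mimic per-worker partial sums. The input is deliberately contrived so that the two decompositions disagree.

```python
import math

def serial_sum(xs):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for x in xs:
        total += x
    return total

def chunked_sum(xs, n_chunks):
    """Sum each chunk separately, then combine the partial sums.

    This mimics a parallel reduction where each worker owns one chunk.
    """
    size = -(-len(xs) // n_chunks)  # ceiling division
    partials = [serial_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return serial_sum(partials)

# Contrived input: the small terms are absorbed or kept depending on grouping.
xs = [1e16, 1.0, -1e16, 1.0]

print(serial_sum(xs))      # serial order
print(chunked_sum(xs, 2))  # two "workers"
print(math.fsum(xs))       # correctly rounded sum, independent of order
```

On this input the serial and two-chunk results differ, and neither matches the correctly rounded sum from `math.fsum`. Fixing the chunk boundaries independently of the core count is one way to keep a parallel reduction reproducible, at the cost of some scheduling flexibility.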

dzaima•51m ago
Even denormals and NaNs should be perfectly consistent, at least on CPUs. (as long as you're not inspecting the bit patterns of the NaNs, at least)

Irrational stdlib functions (trig/log/exp; not sqrt though for whatever reason) should really be basically the only source of non-reproducibility in typical programs (assuming they don't explicitly do different things depending on non-controlled properties; and don't use libraries doing that either, which is also a non-trivial ask; and that there's no overly-aggressive optimizer that does incorrect transformations).

I'd hope that languages/libraries providing seeded random sources with a guarantee of equal behavior across systems would explicitly note which operations aren't reproducible though, otherwise seeding is rather pointless; no clue if R does that.

thaumasiotes•26m ago
I don't actually understand why you'd want reproducibility in a statistical simulation. If you fix the output, what are you learning? The point of the simulation is to produce different random numbers so you can see what the outcomes are like... right?

Let's say I write a paper that says "in this statistical model, with random seed 1495268404, I get Important Result Y", and you criticize me on the grounds that when you run the model with random seed 282086400, Important Result Y does not hold. Doesn't this entire argument fail to be conceptually coherent?

mjcohen•3h ago
I found this an enjoyable read. I also have Wilkinson, both text and Algol book, which I used many years ago to write a fortran eigenvalue/vector routine. Worked very nicely. Done in VAX fortran and showed me that having subscript checking on added 30% to the run time.

coolcase•1h ago
I don't grok this but if you had to describe it in a nutshell, is this because of a race condition? Differences in HW? Floating point ops have some randomness built in?

mattb314•51m ago
Super rough summary of the first half: in order to pick out random vectors with a given shape (where the "shape" is determined by the covariance matrix), MASS::mvrnorm() computes some eigenvectors, and eigenvectors are only well defined up to a sign flip. This means tiny floating-point differences between machines can result in one machine choosing v_1, v_2, v_3,... as eigenvectors, while another machine chooses -v_1, v_2, -v_3,... The result for sampling random numbers is totally different with the sign flips (but still "correct" because we only care about the overall distribution--these are random numbers after all). The section around "Q1 / Q2" is the core of the article.
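
A toy version of the sign-flip effect, sketched in plain Python rather than R. This is not the MASS::mvrnorm() implementation; the 2x2 covariance matrix, its hand-derived eigenvectors, and the helper names `transform` and `covariance` are all assumptions made for illustration.

```python
import math
import random

# Covariance [[2, 1], [1, 2]] has eigenvalues 3 and 1, with unit
# eigenvectors v_1 = (1, 1)/sqrt(2) and v_2 = (1, -1)/sqrt(2).
s = 1 / math.sqrt(2)
eigvals = [3.0, 1.0]
Q1 = [[s, s], [s, -s]]    # columns are  v_1, v_2
Q2 = [[-s, s], [-s, -s]]  # columns are -v_1, v_2 (an equally valid choice)

def transform(Q, z):
    """Map standard normal draws z to x = Q * sqrt(Lambda) * z."""
    scaled = [math.sqrt(lam) * zi for lam, zi in zip(eigvals, z)]
    return [sum(Q[i][j] * scaled[j] for j in range(2)) for i in range(2)]

def covariance(Q):
    """Reconstruct Q * Lambda * Q^T from the eigendecomposition."""
    return [[sum(Q[i][k] * eigvals[k] * Q[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]

# Same seed, hence the same underlying standard normal draws.
rng = random.Random(1)
z = [rng.gauss(0, 1), rng.gauss(0, 1)]

print(transform(Q1, z))  # sample using  v_1
print(transform(Q2, z))  # sample using -v_1: a different vector
print(covariance(Q1))    # both choices describe the same covariance
print(covariance(Q2))
```

Both eigenvector matrices reconstruct the same covariance, so both samplers draw from the same distribution, yet the same seed yields different vectors.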

There's a lot of other stuff here too: mvtnorm::rmvnorm() also can use eigendecomp to generate your numbers, but it does some other stuff to eliminate the effect of the sign flips so you don't see this reproducibility issue. mvtnorm::rmvnorm also supports a second method (Cholesky decomp) that is uniquely defined and avoids eigenvectors entirely, so it's more stable. And there's some stuff on condition numbers not really mattering for this problem--turns out you can't describe all possible floating point problems a matrix could have with a single number.
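
For contrast, a minimal Cholesky-based sketch in plain Python. It is not the mvtnorm code; `cholesky_2x2` and `sample` are hypothetical helpers that only handle a 2x2 positive definite matrix. Requiring a positive diagonal pins the factor down uniquely, so there is no sign choice left for the platform to make.

```python
import math
import random

def cholesky_2x2(a, b, c):
    """Lower-triangular L with positive diagonal and L * L^T = [[a, b], [b, c]].

    Assumes the matrix is symmetric positive definite.
    """
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def sample(L, rng):
    """Draw one vector x = L * z with z standard normal."""
    z = [rng.gauss(0, 1), rng.gauss(0, 1)]
    return [L[0][0] * z[0], L[1][0] * z[0] + L[1][1] * z[1]]

L = cholesky_2x2(2.0, 1.0, 2.0)
print(L)

# Two generators with the same seed give the same sample.
print(sample(L, random.Random(1)))
print(sample(L, random.Random(1)))
```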

Silly job interview questions in Haskell

https://chrispenner.ca/posts/interview
19•behnamoh•1h ago•5 comments

Show HN: Defuddle, an HTML-to-Markdown alternative to Readability

https://github.com/kepano/defuddle
157•kepano•6h ago•36 comments

The Future of Flatpak

https://lwn.net/Articles/1020571/
153•dxs•4h ago•66 comments

Claude 4

https://www.anthropic.com/news/claude-4
1543•meetpateltech•11h ago•872 comments

32 bits that changed microprocessor design

https://spectrum.ieee.org/bellmac-32-ieee-milestone
52•mdp2021•4h ago•5 comments

That fractal that's been up on my wall for years

https://chriskw.xyz/2025/05/21/Fractal/
335•chriskw•12h ago•21 comments

Airport for DuckDB

https://airport.query.farm/
61•jonbaer•3d ago•10 comments

Does Earth have two high-tide bulges on opposite sides? (2014)

http://physics.stackexchange.com/questions/121830/does-earth-really-have-two-high-tide-bulges-on-opposite-sides
143•imurray•8h ago•47 comments

“Secret Mall Apartment,” a Protest for Place

https://modernagejournal.com/secret-mall-apartment-a-protest-for-place/251023/
68•rufus_foreman•5h ago•37 comments

Mozilla to shut down Pocket and Fakespot

https://support.mozilla.org/en-US/kb/future-of-pocket
829•phantomathkg•11h ago•526 comments

CRDTs #2: Turtles All the Way Down

https://jhellerstein.github.io/blog/crdt-turtles/
13•pfarago•1h ago•0 comments

How to cheat at settlers by loading the dice (2017)

https://izbicki.me/blog/how-to-cheat-at-settlers-of-catan-by-loading-the-dice-and-prove-it-with-p-values.html
90•jxmorris12•9h ago•75 comments

Improving performance of rav1d video decoder

https://ohadravid.github.io/posts/2025-05-rav1d-faster/
257•todsacerdoti•15h ago•88 comments

Richard Garwin’s role in designing the hydrogen bomb was obscured

https://www.nytimes.com/2025/05/19/science/richard-garwin-hydrogen-bomb.html
39•LAsteNERD•3d ago•9 comments

Loading Pydantic models from JSON without running out of memory

https://pythonspeed.com/articles/pydantic-json-memory/
84•itamarst•9h ago•29 comments

Sketchy Calendar

https://www.inkandswitch.com/ink/notes/sketchy-calendar/
33•surprisetalk•4h ago•4 comments

Fast Allocations in Ruby 3.5

https://railsatscale.com/2025-05-21-fast-allocations-in-ruby-3-5/
193•tekknolagi•13h ago•42 comments

Ancient law requires a bale of straw to hang from Charing Cross rail bridge

https://www.ianvisits.co.uk/articles/ancient-law-requires-a-bale-of-hay-to-hang-from-charing-cross-rail-bridge-81318/
50•alexbilbie•19h ago•47 comments

I Built My Own Audio Player

https://nexo.sh/posts/why-i-built-a-native-mp3-player-in-swiftui/
187•nexo-v1•13h ago•96 comments

A South Korean grand master on the art of the perfect soy sauce

https://www.theguardian.com/world/2025/may/21/without-time-there-is-no-flavour-a-south-korean-grand-master-on-the-art-of-the-perfect-soy-sauce
137•n1b0m•1d ago•103 comments

We’ll be ending web hosting for your apps on Glitch

https://blog.glitch.com/post/changes-are-coming-to-glitch/
74•js4ever•10h ago•43 comments

Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts

59•digitaltzar•12h ago•51 comments

1,145 pull requests per day

https://saile.it/1145-pull-requests-per-day/
31•sailE•8h ago•21 comments

W.a.s.t.e. Not: John Scanlan looks for the future in the dustbins of history

https://thebaffler.com/latest/w-a-s-t-e-not-adams
3•Thevet•3d ago•0 comments

Management = Bullshit (LLM Edition)

http://funcall.blogspot.com/2025/05/management-bullshit.html
17•dxs•4h ago•12 comments

When a team is too big

https://blog.alexewerlof.com/p/when-a-team-is-too-big
52•gpi•3d ago•54 comments

When good pseudorandom numbers go bad

https://blog.djnavarro.net/posts/2025-05-18_multivariate-normal-sampling-floating-point/
45•chewxy•3d ago•7 comments

Trade Secrecy in Willy Wonka's Chocolate Factory (2009)

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1430463
34•NaOH•7h ago•8 comments

Show HN: SQLite JavaScript - extend your database with JavaScript

https://github.com/sqliteai/sqlite-js
145•marcobambini•14h ago•44 comments

Tab Roving – focus management for element groups

https://nik.digital/posts/tab-roving
4•samwho•3d ago•0 comments