frontpage.

Postgres LISTEN/NOTIFY does not scale

https://www.recall.ai/blog/postgres-listen-notify-does-not-scale
289•davidgu•3d ago•108 comments

Show HN: Pangolin – Open source alternative to Cloudflare Tunnels

https://github.com/fosrl/pangolin
22•miloschwartz•4h ago•2 comments

Show HN: Open source alternative to Perplexity Comet

https://www.browseros.com/
154•felarof•8h ago•52 comments

What is Realtalk’s relationship to AI? (2024)

https://dynamicland.org/2024/FAQ/#What_is_Realtalks_relationship_to_AI
231•prathyvsh•10h ago•78 comments

Graphical Linear Algebra

https://graphicallinearalgebra.net/
177•hyperbrainer•10h ago•12 comments

Batch Mode in the Gemini API: Process More for Less

https://developers.googleblog.com/en/scale-your-ai-workloads-batch-mode-gemini-api/
20•xnx•3d ago•4 comments

FOKS: Federated Open Key Service

https://foks.pub/
175•ubj•13h ago•42 comments

Turkey bans Grok over Erdoğan insults

https://www.politico.eu/article/turkey-ban-elon-musk-grok-recep-tayyip-erdogan-insult/
81•geox•2h ago•55 comments

Flix – A powerful effect-oriented programming language

https://flix.dev/
216•freilanzer•12h ago•88 comments

Measuring the impact of AI on experienced open-source developer productivity

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
513•dheerajvs•9h ago•325 comments

Red Hat Technical Writing Style Guide

https://stylepedia.net/style/
159•jumpocelot•11h ago•70 comments

Yamlfmt: An extensible command line tool or library to format YAML files

https://github.com/google/yamlfmt
24•zdw•3d ago•12 comments

Belkin ending support for older Wemo products

https://www.belkin.com/support-article/?articleNum=335419
52•apparent•7h ago•46 comments

Launch HN: Leaping (YC W25) – Self-Improving Voice AI

49•akyshnik•8h ago•25 comments

How to prove false statements: Practical attacks on Fiat-Shamir

https://www.quantamagazine.org/computer-scientists-figure-out-how-to-prove-lies-20250709/
198•nsoonhui•16h ago•153 comments

Show HN: Cactus – Ollama for Smartphones

107•HenryNdubuaku•7h ago•45 comments

eBPF: Connecting with Container Runtimes

https://h0x0er.github.io/blog/2025/06/29/ebpf-connecting-with-container-runtimes/
33•forxtrot•7h ago•0 comments

Regarding Prollyferation: Followup to "People Keep Inventing Prolly Trees"

https://www.dolthub.com/blog/2025-07-03-regarding-prollyferation/
38•ingve•3d ago•1 comments

Grok 4

https://simonwillison.net/2025/Jul/10/grok-4/
176•coloneltcb•6h ago•145 comments

Analyzing database trends through 1.8M Hacker News headlines

https://camelai.com/blog/hn-database-hype/
115•vercantez•2d ago•61 comments

Not So Fast: AI Coding Tools Can Reduce Productivity

https://secondthoughts.ai/p/ai-coding-slowdown
53•gk1•2h ago•30 comments

Diffsitter – A Tree-sitter based AST difftool to get meaningful semantic diffs

https://github.com/afnanenayet/diffsitter
89•mihau•13h ago•26 comments

The ChompSaw: A Benchtop Power Tool That's Safe for Kids to Use

https://www.core77.com/posts/137602/The-ChompSaw-A-Benchtop-Power-Tool-Thats-Safe-for-Kids-to-Use
80•surprisetalk•3d ago•63 comments

Matt Trout has died

https://www.shadowcat.co.uk/2025/07/09/ripples-they-cause-in-the-world/
139•todsacerdoti•18h ago•41 comments

Is Gemini 2.5 good at bounding boxes?

https://simedw.com/2025/07/10/gemini-bounding-boxes/
258•simedw•13h ago•57 comments

Foundations of Search: A Perspective from Computer Science (2012) [pdf]

https://staffwww.dcs.shef.ac.uk/people/J.Marshall/publications/SFR09_16%20Marshall%20&%20Neumann_PP.pdf
3•mooreds•3d ago•0 comments

Show HN: Typeform was too expensive so I built my own forms

https://www.ikiform.com/
166•preetsuthar17•17h ago•86 comments

Final report on Alaska Airlines Flight 1282 in-flight exit door plug separation

https://www.ntsb.gov:443/investigations/Pages/DCA24MA063.aspx
129•starkparker•5h ago•141 comments

Radiocarbon dating reveals Rapa Nui not as isolated as previously thought

https://phys.org/news/2025-06-radiocarbon-dating-reveals-rapa-nui.html
16•pseudolus•3d ago•5 comments

Optimizing a Math Expression Parser in Rust

https://rpallas.xyz/math-parser/
127•serial_dev•16h ago•55 comments

Perfect Random Floating-Point Numbers

https://specbranch.com/posts/fp-rand/
115•pclmulqdq•2mo ago

Comments

kevmo314•2mo ago
Cool observation! Despite knowing both how floating point works and how random number generation works, I had never thought about this.

I do wish the solution were a bit simpler though. Could this be upstreamed into the language's API? https://cs.opensource.google/go/go/+/refs/tags/go1.24.3:src/...

lutusp•2mo ago
> Could this be upstreamed into the language's API?

If a language is in use, and if people have written code that generates pseudorandom sequences based on an initial seed, then no -- bad idea. Some programs rely on the same pseudorandom sequence for a given seed, and may fail otherwise.

wtallis•2mo ago
That really depends on whether the language's standard library API specifies that implementations will use a particular random number generator, or merely specifies that the RNG will have certain properties. If you're going to depend on reproducible, portable RNG behavior you need to get those random numbers from an API that guarantees a particular algorithm.
FabHK•2mo ago
There's another approach for doing this: generate a random number between 1 and 2 (they all have the same exponent) and then subtract 1 (that's the default implementation in Julia [0]). But I think it also just gives you 2^52 different numbers.

So, between 0.5 and 1 you miss out on every second representable number, between 0.25 and 0.5 you miss out on 3/4 of them, and so on.

I guess for many cases that's good enough, but the article seems like a nice improvement.

ETA: Lemire has some thoughts on this [1] and links to what might be a prior solution [2]. Vigna (of xoroshiro fame) writes about it at the bottom of [3] and also links to [2]. So, presumably the implementation described in the article is faster? ("There have been some past attempts to fix these flaws, but none that avoid a huge performance penalty while doing so.")

EDIT2: BTW, one of the things I love about HN (well, the world, really) is that there are people that care deeply that we can uniformly sample floats between 0 and 1 correctly, and all of them, and do it faster.

[0] see https://github.com/JuliaLang/julia/blob/master/stdlib/Random...

  rand(r::AbstractRNG, ::SamplerTrivial{CloseOpen01_64}) = rand(r, CloseOpen12()) - 1.0
[1] https://lemire.me/blog/2017/02/28/how-many-floating-point-nu...

[2] https://mumble.net/~campbell/2014/04/28/uniform-random-float

https://mumble.net/~campbell/2014/04/28/random_real.c

[3] https://prng.di.unimi.it

badmintonbaseba•2mo ago
That looks kinda the same end result as the fixed point method, but possibly cheaper to execute.
thaumasiotes•2mo ago
Well, it has to be. The number can only have 53 bits of precision.

There are more than 2^53 floating point values between 0 and 1, but if you use them all, you'll fail to have a uniform distribution. Either certain regions of the space will be far more likely than other regions, or certain values will be far more likely than other values. The article takes the view that the second of those options is desirable, but it's not clear why.

And the algorithm the article describes is:

1. Generate a random float the uniform way.

2. If it's close to zero and has extra precision available, generate some more random bits to fill in the extra precision.

It's always going to be faster to skip step 2. What are we gaining by not skipping it?

Sharlin•2mo ago
Floats are typically used to approximate reals, so a uniform distribution over individual float values is usually not what is needed. What is wanted is a uniform distribution over equal-sized intervals.

2^53 equally-spaced values within [0, 1) is plenty enough for many use cases, but it's still fundamentally a set of fixed-point rather than floating-point numbers. For a true random floating-point value, all the available precision should be used such that every single value in the domain has some probability mass (except maybe subnormals, but you can usually forget about subnormals). Especially with 32-bit floats it's not that difficult to run out of precision when you start doing math and only have a fixed-point subset of size 2^23 available.
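
For concreteness, the 2^53-value construction referred to above looks roughly like this (just a sketch; rand_u64 is an assumed source of 64 random bits, not anything from the article):

  #include <stdint.h>

  extern uint64_t rand_u64(void);  /* assumed: returns 64 uniformly random bits */

  /* 53 random bits scaled by 2^-53: every output is a multiple of 2^-53,
     i.e. a fixed-point value, so tiny results carry few significant bits. */
  double uniform53(void) {
    return (double)(rand_u64() >> 11) * 0x1.0p-53;  /* in [0, 1) */
  }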

tialaramex•2mo ago
The floats are a surprisingly bad approximation of the reals. They're barely better than the integers, while their shortcomings are much harder to understand, so I'd say it's like if you want approximately a spaceship and you're choosing between an F-14 Tomcat and a Fiat Punto.

Neither of these things is a spaceship, but, it will be obvious to almost anybody why the Punto isn't a spaceship whereas you are less likely to know enough about the F-14 to see why it's not "good enough".

thaumasiotes•2mo ago
> The floats are a surprisingly bad approximation of the reals. They're barely better than the integers

They are integers. Each float is a pattern of 64 discrete bits. Discreteness means integers.

kbolino•2mo ago
In float32 land, if

    x   = 0.5   (with bit pattern 0x3f000000)
    y   = 0.25  (with bit pattern 0x3e800000)
then

    x+y = 0.75  (with bit pattern 0x3f400000)
But if we just added the bit patterns like integers, we would get 0x7d800000 (with a float value of over 10^37). Just because they are discrete doesn't mean they are integers, only that they can be mapped one-to-one with the integers. The bit pattern is not the semantic meaning of the value, and you cannot perform correct operations if you ignore the semantics.
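
A quick way to check those patterns concretely, as a small standalone C snippet (nothing here is specific to the article):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static uint32_t bits_of(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* reinterpret the float's bit pattern */
    return u;
  }

  int main(void) {
    float x = 0.5f, y = 0.25f;
    printf("x=%#010x y=%#010x x+y=%#010x\n", bits_of(x), bits_of(y), bits_of(x + y));
    uint32_t sum = bits_of(x) + bits_of(y);  /* adding the patterns as integers */
    float wrong;
    memcpy(&wrong, &sum, sizeof wrong);
    printf("pattern sum %#010x decodes to %g\n", sum, wrong);  /* about 2.1e37 */
  }
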
steppi•2mo ago
I think the truly surprising thing is just how well floating point numbers work in many practical applications despite how different they are from the real numbers. One could call it the "unreasonable effectiveness of floating point mathematics".
thaumasiotes•2mo ago
Why is that surprising? No one ever uses real numbers for anything.
steppi•2mo ago
There are many situations where you want to compute something to within a few units in the last place that seem fairly involved, but there are very often clever methods that let you do it without having to go to extended or arbitrary precision. Maybe not that surprising, but it's something I find interesting.
camel-cdr•2mo ago
> but if you use them all, you'll fail to have a uniform distribution

Only if you generate them all with equal probability.

thaumasiotes•2mo ago
Did you read the rest of my comment? You can't have a uniform distribution no matter what.
camel-cdr•2mo ago
Ah, sorry I didn't see the "or certain values will be far more likely than other values". But I don't understand why that wouldn't fall under the definition of a uniform distribution.

The way I would define a uniform distribution is the following:

For any two floating-point numbers r1 and r2, which form the range [r1,r2] over the real numbers, and any second pair of floating-point numbers s1 and s2, which form a range [s1,s2] over the real numbers contained in [r1,r2], the probability of getting a result in [s1,s2] when sampling from [r1,r2] must equal (s2-s1)/(r2-r1) evaluated with infinite precision.

This is obviously possible to achieve.

thaumasiotes•2mo ago
Given a uniform distribution over an interval (α, β) and a tolerance ε, sample the distribution twice. What is the probability that |x_1 - x_2| < ε?
darkmighty•2mo ago
Indeed, there's a simple algorithm using interval arithmetic to do so (within bounded time too!).

Think of it as binary subdivision/search of [0,1], using a stream of random bits. Steps:

(1) Divide the current interval in 2 (using its midpoint); (2) Using one random bit, pick one of the 2 intervals; (3) if the picked interval (new current interval) lies entirely on the domain of a single floating point value[def1], stop and return this value, else go to (1).

[def1] The domain associated to a floating point value is the interval between the midpoint between it and its lower neighbor on the left, and the midpoint between it and its higher neighbor on the right.

I expect the performance is very poor, but it does cover all floating point numbers in [0,1] with exactly correct probabilities (assuming the bit probabilities are exactly correct). That's in part because naively you need higher precision arithmetic to do so, as well as up to 53 or so iterations, on average well over 32 I presume.

(I've left out the proof that probabilities are correct, but I believe it's easy to show)

corsix•2mo ago
You can start with this idea, and then make it _very_ performant by using bit counting instructions. See https://www.corsix.org/content/higher-quality-random-floats for an exposition.
mlyle•2mo ago
> It's always going to be faster to skip step 2. What are we gaining by not skipping it?

Having any number be possible is nice. Having the entire space be reachable when you're sampling a solution space and the solution may be smaller than 1/8. Or having small vector quantities not have their angles ridiculously constrained by quantization.

Most applications don't need it-- indeed, a double has a lot of precision compared to what most people care about, and throwing away some of it at the low end of the randomness range is no big deal. But sometimes, a double is barely (or not even) enough.

thaumasiotes•2mo ago
> Having the entire space be reachable when you're sampling a solution space and the solution may be smaller than 1/8.

You're out of luck there; no matter what happens you can only generate floating point numbers. If you're worried that the solution lies within the space of floating point numbers, but not within the part that's reachable by generating uniform samples with 1/2^53 resolution, you need to be much, much, much more worried that the solution isn't a floating point number at all.

By contrast, if you're unconcerned with the numeric properties of the numbers you're generating, and what you want is to sample "floating point values" to examine what kind of implementation-specific behavior they might have, you would never want this scheme - you'd want to generate random floating-point values, which you can do by just filling in random bits. Those will be concentrated around zero, which doesn't matter because you're not interpreting them as numbers.

mlyle•2mo ago
> You're out of luck there; no matter what happens you can only generate floating point numbers. If you're worried that the solution lies within the space of floating point numbers, but not within the part that's reachable by generating uniform samples with 1/2^53 resolution, you need to be much, much, much more worried that the solution isn't a floating point number at all.

2^53 is 15 sigfigs. My answer might be around 10 pico-something and I might need a few digits of precision. Then I've thrown away 11 of my sigfigs and there may not be a numerically stable solution with what's left.

As you point out, double precision often isn't enough. But it's much more likely to not be enough if we throw large parts of the precision away.

FabHK•2mo ago
> certain values will be far more likely than other values

That's fine, as for the normal understanding of uniformity on the interval, you want the probability that a value is chosen to be proportional to the "space" between its left and right neighbor, and that space is not constant on the interval.

What you try to emulate is picking a real number uniformly between 0 and 1, then returning the closest floating point number (except 1, maybe).

david-gpu•2mo ago
At some point in my career I worked on floating point hardware design, and I agree with your approach.
pclmulqdq•2mo ago
That is essentially just the fixed point method mentioned here, and it is uniformly distributed across the range (and faster because you do less), but it does not fill the low bits of the number correctly (53 bits of precision != 2^53 of them between 0 and 1). If you look at the final distribution of numbers and you use under 1-10 million numbers, that method and the division method will appear to be unbiased unless you have a pathological case.
badmintonbaseba•2mo ago
Generalizing this to arbitrary intervals, not just [0, 1) looks tricky. Just scaling and shifting a perfect uniform result from [0, 1) doesn't result in a perfect random floating-point number.
TimWolla•2mo ago
Indeed. It might even result in out of range values due to implicit rounding (`3.5 + (4.5 - 3.5) * (the largest float smaller than 1)` results in 4.5, which is unexpected).
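
That effect is easy to reproduce; a minimal standalone check in C (doubles, round-to-nearest assumed):

  #include <math.h>
  #include <stdio.h>

  int main(void) {
    double lo = 3.5, hi = 4.5;
    double u = nextafter(1.0, 0.0);   /* largest double strictly below 1 */
    double x = lo + (hi - lo) * u;    /* mathematically 4.5 - 2^-53 ... */
    printf("%.17g\n", x);             /* ...but it rounds to 4.5, the excluded bound */
  }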

I'm a contributor to the PHP programming language and a maintainer of the randomness functionality and did some research into this topic to provide a floating point generation method that is as good as possible and came across the "Drawing Random Floating-point Numbers from an Interval" paper by Frédéric Goualard.

PHP 8.3 ships with a method generating random floats from arbitrary intervals based on the γ-section algorithm described in that paper. The PHP documentation goes into additional detail:

https://www.php.net/manual/en/random-randomizer.getfloat.php

As part of the implementation I also reached out to Prof. Goualard with some questions for clarification regarding the behavior for extreme values, which resulted in a corrigendum being published, since the algorithm was buggy in those cases (https://frederic.goualard.net/publications/corrigendum.pdf). The entire exchange with Prof. Goualard was really pleasant.

camel-cdr•2mo ago
I've written an algorithm that generates a uniform random float in the range [a,b] that can produce every representable floating-point value with a probability proportional to the covered real number range*: https://github.com/camel-cdr/cauldron/blob/main/cauldron/ran...

* Well, almost. I messed up the probability around subnormals slightly and haven't gotten around to fixing it yet.

Still, the algorithm itself should work for all other values, and it was measurably more accurate than the regular algorithm.

In terms of performance it is 10x slower compared to the regular implementation of `(randu32(x)>>8)*(1.0f/(1<<24))` on my Zen1 desktop.

lblume•2mo ago
How relevant are subnormals in this context?

i.e. what percentage of the range do they occupy and how often would one expect to need to sample to obtain one?

kbolino•2mo ago
There are as many subnormals as there are distinct values with the same exponent, so 2^23 for floats and 2^52 for doubles. This includes the number 0, but you may consider it a special case different from the other subnormals; if so, subtract one. How much of the range they occupy also depends on the interval you're requesting. However, for the common interval from 0 to 1, they occupy about 1/127th of the range for floats and 1/1023rd of the range for doubles.

One can reason to this conclusion by noting that the subnormals have the same effective exponent as the smallest normal number (except that they use the 0.xxx representation instead of 1.xxx; the exponent in question is -126 for floats and -1022 for doubles), and then observing that all normal numbers with exponents from that minimum up to and including -1 are also in the range. Since the number 1 is usually not included in the interval, we can ignore it, but even if it was, it would be the only representative from its exponent (0) anyway, so the effect on the size of the range would be minuscule.

Dylan16807•2mo ago
You will never ever ever obtain one.

Even calculating your exponent with a single count_trailing_zeros on a single 64 bit value will give you results indistinguishable from perfect.
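
A sketch of that idea (not the article's algorithm; rand_u64 is an assumed 64-bit source, and __builtin_ctzll is the GCC/Clang intrinsic):

  #include <stdint.h>
  #include <math.h>

  extern uint64_t rand_u64(void);  /* assumed: returns 64 uniformly random bits */

  /* One draw picks the binade via trailing zeros (geometric, P(z=k) ~ 2^-(k+1)),
     a second draw fills the 52 explicit mantissa bits. Values below 2^-65 are
     simply never produced, a total probability of about 2^-64. */
  double random_unit_ctz(void) {
    uint64_t r = rand_u64();
    int z = r ? __builtin_ctzll(r) : 64;
    uint64_t mant = rand_u64() >> 12;               /* 52 mantissa bits */
    /* significand in [1, 2) scaled into the binade [2^-(z+1), 2^-z) */
    return ldexp((double)((1ULL << 52) | mant), -(53 + z));
  }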

hedora•2mo ago
I thought this was going to go further down the path that the Diehard battery of random number tests started:

https://www.jstatsoft.org/index.php/jss/article/view/v007i03...

(Dieharder is probably a better suite to use at this point, but that paper is approachable.)

FabHK•2mo ago
Better test suites are PractRand and TestU01.
pixelpoet•2mo ago
Some good references on this which IMO should have been mentioned in the article:

https://pharr.org/matt/blog/2022/03/05/sampling-fp-unit-inte...

https://marc-b-reynolds.github.io/distribution/2017/01/17/De...

tialaramex•2mo ago
> For unbiased random floating-point numbers, generate floating-point numbers with probabilities given by drawing a real number and then rounding to floating point.

Immediately there are alarms going off in my mind. Your machine definitely can't pick real numbers; Almost All of them are non-computable. So you definitely can't be doing that, which means you should write down what you've decided to actually do, because it's not that.

What you actually want isn't the reals at all; you want a binary fraction (since all your f64s are in fact binary fractions) between 0 and 1. So you just need to take random bits until you find a one bit (the top bit of your fraction), counting all the zeroes to decide the exponent, and then you need 52 more bits for your mantissa.

I'm sure there's a faster way to get the same results, but unlike the article's imagined "drawing a real number" this is actually something we can realize, since it doesn't involve non-computable numbers.

Edited: You need 52 more bits. Earlier this comment said 51, but the one bit you've already seen is implied in the floating-point type, so we need 52 more, any or all of which might be zero.
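
A literal rendering of that construction, as a sketch (random_bit is an assumed source of independent fair bits; rounding of very small results is left to ldexp):

  #include <stdint.h>
  #include <math.h>

  extern int random_bit(void);  /* assumed: returns an independent fair bit, 0 or 1 */

  double random_unit_bits(void) {
    int zeros = 0;                     /* zero bits before the first one bit */
    while (random_bit() == 0) {
      if (++zeros > 1075)              /* below the smallest subnormal */
        return 0.0;
    }
    uint64_t mant = 0;
    for (int i = 0; i < 52; i++)       /* 52 more bits for the mantissa */
      mant = (mant << 1) | (uint64_t)random_bit();
    /* value = 1.mantissa * 2^-(zeros+1) */
    return ldexp((double)((1ULL << 52) | mant), -(53 + zeros));
  }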

yorwba•2mo ago
The part you're quoting is the spec, not the implementation. It implicitly assumes a machine model where sampling a uniformly random real number in an interval and correctly rounding it to a floating-point number are both supported operations. Of course this cannot be a real machine or even a Turing machine, but you can imagine and mathematically model a more powerful machine which can do these things. (Turing machines are imaginary anyways.)

This unimplementable algorithm for an imaginary machine is still useful, because it nails down the probability distribution we expect the output samples to have. And because they're sampled from a finite value space and the probabilities of obtaining each possible output are rational numbers, we have a chance of finding an implementable algorithm for a real machine that produces the same output distribution.

The rest of the article goes into detail on what they decided to actually do. It's similar to your proposal but has more edge-case handling and deals with different rounding modes.

lblume•2mo ago
> Turing machines are imaginary anyways.

Sure, but there is a simple correspondence between modern/typical computers (that does require a potentially unlimited supply of new hard drives, but that is rarely relevant) and Turing machines, while such a correspondence can probably never exist for the "real number" model.

badmintonbaseba•2mo ago
If you can generate a sequence of independent fair coin flips of indefinite length (this is a big if, but hold that thought), then you can effectively generate a true real number uniformly from the unit interval. What you can't do is access all of its digits at once. But to correctly round it to a value representable in a given floating-point representation you don't need all the digits; you only need enough digits to decide whether the number falls into an interval where all reals round to the same value.

As for generating independent fair coin flips of indefinite length, none of the PRNGs qualify.

tialaramex•2mo ago
I don't think you can make the non-computable reals with your "indefinite fair coin flips". All you're making there is an approximation.

You can make all the rationals this way, but that's Almost None of the real numbers.

lutusp•2mo ago
"Perfect Random [sic] Floating-Point Numbers" ???

I had hoped that, somewhere in the article, the author would say, "In this article, the term 'random' is shorthand for 'pseudorandom'." But no.

Programming students might read the article and come away with the idea that a deterministic algorithm can generate random numbers.

This is like the sometimes-heard claim that "Our new error-detecting algorithm discovers whether a submitted program contains errors that might make it stop." Same problem -- wrong as written, but no qualifiers.

tialaramex•2mo ago
But this article is about converting another source of randomness; it doesn't care where the random bits came from.

It will be just as happy with dice rolling, the PRNG from Commodore 64 BASIC, the built-in random numbers from a modern Intel CPU, or "random" bits harvested by sampling keyboard input; it just takes the "random" bits and makes floating-point values.

So there's no reason to say it must be pseudo-random. In most practical applications it would be, but that doesn't seem essential AFAICT

binarybitlist•2mo ago
Something interesting happened while reading...

(Sorry, non-native English speaker; originally in German:) "Wait a moment, are you saying that in all likelihood there are no random numbers a computer could generate at all, since 'bits' can only ever take either the value '0' or the value '1' (and these days across several layers, but that's another story...)?"

let me try to explain it a little...

3 * (A = >0) adding... 1 * (A = 1) (..another topic yes...)

...but what was wanted is "NONE" (of it...) or in other words 'maybe random'??

??

:confused:

mahemm•2mo ago
Why not just read 64 bits off /dev/urandom and be done with it? All this additional complexity doesn't actually buy any "extra" randomness over this approach, and I'm skeptical that it improves speed either.
mlyle•2mo ago
The problem is, there are around 2^62 double-precision numbers between 0 and 1, but they're not uniformly spaced: there are many, many more between 0 and 0.5 (nearly 2^62) than between 0.5 and 1 (around 2^52), for instance.

So, if you want a uniform variate, but you want every number in the range to be possible to generate, it's tricky. Each individual small number needs to be much less likely than each individual large number.

If you just draw from the 2^62 space randomly, you almost certainly get a very small number.
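
As a rough illustration of that last point, decode uniformly chosen bit patterns from that ~2^62-value space and look at the magnitudes (sketch; splitmix64 is used only as a stand-in bit source, and modulo bias is ignored):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static uint64_t state = 42;
  static uint64_t rand_u64(void) {  /* splitmix64, just as a stand-in */
    uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
  }

  int main(void) {
    for (int i = 0; i < 8; i++) {
      /* bit patterns 0 .. 0x3FEFFFFFFFFFFFFF are exactly the doubles in [0, 1) */
      uint64_t bits = rand_u64() % 0x3FF0000000000000ULL;
      double d;
      memcpy(&d, &bits, sizeof d);
      printf("%g\n", d);  /* typically astronomically small, around 1e-154 */
    }
  }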

spyrja•2mo ago
Seems to me that the simplest solution would be to repeatedly divide the range of numbers into two halves, then randomly select either one until it converges onto a single value. In C this might look something like this:

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (1) {
      halfway = low + (high - low) / 2;
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }
That should theoretically produce a uniformly distributed value. (Although perhaps I've missed some finer point?)
seanhunter•2mo ago
So you have two doubles, halfway and previous, and a loop that depends on if (halfway == previous) to terminate, where halfway is the result of a floating-point calculation. Are you sure that's going to work? I suspect it will frequently fail to terminate when you expect it to.

Secondly, why does this generate a uniform random number? It’s not clear to me at all. It seems it would suffer the exact problem GP’s talking about here, that certain ranges of numbers would have a much higher probability than others on a weighted basis.

lblume•2mo ago
> Secondly, why does this generate a uniform random number?

Each interval of equal size occurs with equal likelihood at each step.

Consider that you want to generate a random number between 0 and 1024 (excl.). The midpoint would be 512, thus you choose randomly whether the lower interval [0, 512) or the higher interval [512, 1024) is selected. In each step, the size of the current range is independent of the concrete numbers, i.e. after k steps it is exactly 2^(-k) * (high - low), and in each step each range has equal probability. Thus if the algorithm terminates, the selected number was in fact uniformly sampled.

I would also presume it must terminate. Assume that the two endpoints are one ulp apart. The midpoint is thus either of the two; there is no randomness involved (barring FPU flags, but they don't count). In the next step, the algorithm either terminates or sets the endpoints equal, which also fixes the midpoint. Thus the procedure always returns the desired result.

spyrja•2mo ago
The issues that the GP is grappling with are largely due to the fact that they are trying to "construct" real numbers from a stream of bits. That is always going to lead to bias issues. On the other hand, with this particular algorithm (assuming a truly random source) the resulting number should be more or less completely uniform. It works because we are partitioning the search space itself in such a way that all numbers are as likely as any other. In fact, that the algorithm terminates rather predictably essentially proves just that. After one million invocations, for example, the average number of iterations was something like 57 (with the minimum being 55 and the maximum outlier 74). Which is to say you could pick any number whatsoever and expect to see it no more than once per ~2^57 invocations.
WantonQuantum•2mo ago
I was curious about this. On the one hand, comparing doubles with == is rarely a good idea but, on the other hand, your explanation seems valid.

After some testing I discovered a problem but not with the comparison. The problem is with calculating the halfway value. There are some doubles where their difference cannot be represented as a double:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <float.h>

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (1) {
      halfway = low + (high - low) / 2;
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }


  int main(int argc, char *argv[]) {
    srand(time(NULL));
    for (int i = 0; i < 1000000; i++) {
      double r = random_real(-DBL_MAX, DBL_MAX, rand);
      printf("%f\n", r);
    }
  }
spyrja•2mo ago
Actually, the problem is that the algorithm is having to calculate DBL_MAX+DBL_MAX, which of course is going to exceed the maximum value for a double-precision number (by definition). That isn't a very realistic use-case either, but in any case you could just clamp the inputs like so:

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    const double max = DBL_MAX / 2;
    if (high > max)
      high = max;
    const double min = -max;
    if (low < min)
      low = min;
    double halfway, previous = low;
    while (1) {
      halfway = low + (high - low) / 2;
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }
WantonQuantum•2mo ago
Fixed it!

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <float.h>

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (1) {
      halfway = low + (high/2 - low/2);
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }


  int main(int argc, char *argv[]) {
    srand(time(NULL));
    for (int i = 0; i < 1000000; i++) {
      double r = random_real(-DBL_MAX, DBL_MAX, rand);
      printf("%f\n", r);
    }
  }
spyrja•2mo ago
Yep, that works too. You likely won't ever need a random number that large, of course, but if you did that would be the way to do it.
camel-cdr•2mo ago
That wouldn't produce uniformly distributed random floating-point numbers.
possiblywrong•2mo ago
> Second, the least significant bits that come from this generator are biased.

I don't think this is true; I'm guessing that the author borrowed this observation from one of the various other articles linked in this thread, that address the situation where we draw a 64-bit random unsigned integer and divide by `1<<64`, instead of drawing 53 bits and dividing by `1<<53`. The former does introduce a bias, but the latter doesn't (but does still limit the fraction of representable values).
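
The two constructions being contrasted look roughly like this (a sketch; rand_u64 is an assumed 64-bit source):

  #include <stdint.h>

  extern uint64_t rand_u64(void);  /* assumed: returns 64 uniformly random bits */

  /* Divide a full 64-bit draw by 2^64: the int-to-double conversion has to
     round 64 bits down to 53, which skews the low mantissa bits and can even
     round all the way up to 1.0. */
  double div_by_2_64(void) { return (double)rand_u64() * 0x1.0p-64; }

  /* Draw 53 bits and divide by 2^53: every step is exact, so no such bias,
     but only 2^53 equally spaced values are ever produced. */
  double div_by_2_53(void) { return (double)(rand_u64() >> 11) * 0x1.0p-53; }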

camel-cdr•2mo ago
I'm not sure which PRNG is used here, but some PRNGs have regularities in the lower bits.

This is e.g. the case for classical LCGs and the xoroshiro*+ family of PRNGs:

> If, however, one has to generate only 64-bit floating-point numbers (by extracting the upper 53 bits) xoshiro256+ is a slightly (≈15%) faster generator with analogous statistical properties. For general usage, one has to consider that its lowest bits have low linear complexity and will fail linearity tests (https://prng.di.unimi.it/)

> When m = 2^k there is a further problem. The period of the bth bit of an LCG (where bits are numbered from the right, starting at one) is 2^b, thus although the period is 2^k, only the high bits are good and the lower order bits exhibit a clear repeating pattern. (https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf)

sfpotter•2mo ago
Probably relevant: https://prng.di.unimi.it/

I use the PRNGs here for my own needs and they work great... at least as far as I can tell. :-)

FabHK•2mo ago
There's the xoroshiro family, and there's the PCG family:

https://www.pcg-random.org

sfpotter•2mo ago
I haven't seen these before. Thanks for the reference.
gitroom•2mo ago
Yeah I've messed with this and always end up wishing the simple ways were actually good enough. The bit where floats aren't really evenly spaced just gets annoying. Still, I kinda get the itch to cover all cases. Respect.
FabHK•2mo ago
So, fun fact:

Between 0 and 1, you have about a quarter of all floating point numbers (and then a quarter > 1, a quarter < -1, and a quarter between -1 and 0).

Between 1 and 2, you have about 0.024% of all floating point numbers (for double precision, a factor of around 1023 fewer than between 0 and 1).
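
Those counts are easy to confirm from the bit patterns, since consecutive positive doubles have consecutive patterns:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
  }

  int main(void) {
    /* the difference between two patterns counts the doubles in [a, b) */
    printf("[0,1): %llu values\n", (unsigned long long)(bits_of(1.0) - bits_of(0.0)));
    printf("[1,2): %llu values\n", (unsigned long long)(bits_of(2.0) - bits_of(1.0)));
  }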

djmips•1mo ago
If the author sees this, a trivial question - why do you use em dash for minus sign in your formulas?