Often it doesn't matter, like when you ask for a GEMM and you get whatever order of operations BLAS, AVX-whatever, and OpenMP conspire to give you, so it is more-or-less random.
But if it does matter, the ability to define it is there.
Until it isn't. I used to play the CodeWeavers port of Kohan, and while the game allowed crossplay between Windows and Linux, differences in how the two OSes rounded floats would cause the game to desynchronize after 15-20 minutes of play or so. Some unit's pathfinding algorithm would zig on Windows and zag on Linux, causing the state to diverge and eventually kicking off one of the players.
Or, since it was a port, maybe they were compiled with different optimizations.
There are a lot of things happening under the hood but most of them should be deterministic.
For cross-OS issues, the most likely culprit is that Windows and Linux are using different libm implementations, which means that the results of functions like sin or atan2 are going to be slightly different.
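If you want to see this for yourself, a small check like the following (my own sketch, not anything from the game) prints the exact bit pattern of a libm result. The value is correct to within about a ulp everywhere, but the low bit or two may differ between libm implementations, which is already enough to desync a lockstep simulation:

    #include <cmath>
    #include <cstdint>
    #include <cstring>
    #include <cstdio>

    int main() {
        double r = std::sin(1e16);             // large argument stresses libm's range reduction
        std::uint64_t bits;
        std::memcpy(&bits, &r, sizeof bits);   // reinterpret the double as raw bits
        std::printf("sin(1e16) = %.17g  bits = 0x%016llx\n",
                    r, (unsigned long long)bits);
        return 0;
    }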
My recollection is fuzzy, but IIRC the legacy x87 control word is always set to extended precision on Linux, while it is set to double precision on Windows, and this affects normal float and double computations as well: the conversion to float and double is only done when storing to and from memory, while intermediate in-register operations are always performed at the maximum enabled precision. Changing the precision before each operation is expensive, so it is not done.
This is one of the causes of x87's apparent nondeterminism, as it depends on the compiler unpredictably spilling FP registers [1]: unless you always use the maximum enabled precision, computations might not be reproducible from one build to the other, even in the same environment.
[1] Eventually GCC added compilation modes with deterministic behavior, but that was well after x87 was obsolete. In the meantime people had to make do with -ffloat-store and/or volatile. See https://gcc.gnu.org/wiki/FloatingPointMath.
edit: but you know this as you mentioned it elsethread.
(And I have heard of the 80-bit internal register thing described in https://news.ycombinator.com/item?id=44888692 causing real problems for people before. And -ffast-math is basically spooky action at a distance considering how it bleeds into the entire program; see e.g. https://moyix.blogspot.com/2022/09/someones-been-messing-wit....)
This is especially useful when writing branchless or SIMD code. Adding a branch for checking against zero can have bad performance implications and it isn't even necessary in many cases.
Especially in graphics code I often see a zero check before a division and a fallback for the zero case. This often practically means "wait until numerical precision artifacts arise and then do something else". Often you could just choose the better of the two options you have instead of checking for zero.
Case in point: choosing the axes of your shadow map projection matrix. You have two options (world x axis or z axis), choose the better one (larger angle with viewing direction). Don't wait until the division goes to inf and then fall back to the other.
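A minimal sketch of that idea, with made-up names (Vec3, chooseUpAxis) and assuming a normalized view direction: pick whichever world axis is least aligned with the view direction up front, instead of dividing and falling back when the result blows up:

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Pick the world axis that makes the larger angle with the view direction,
    // i.e. the smaller |dot|, so the basis built from it stays well-conditioned.
    Vec3 chooseUpAxis(Vec3 viewDir) {
        const Vec3 worldX{1.0f, 0.0f, 0.0f};
        const Vec3 worldZ{0.0f, 0.0f, 1.0f};
        return std::fabs(dot(viewDir, worldX)) < std::fabs(dot(viewDir, worldZ))
                   ? worldX : worldZ;
    }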
Operations on those sentinel values are also defined. This can affect when checking needs to be done in optimized code.
Personally, I am lazy, so I don’t check the mxcsr register before I start running my programs. Maybe gcc does something by default, I don’t know. IMO legitimate division by zero is rare but not impossible, so if you do it, the onus is on you to make sure the flags are set up right.
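For what it's worth, you don't have to poke MXCSR directly; here's a sketch using the standard <cfenv> flag functions (my illustration; strictly speaking FENV_ACCESS should be enabled, which not every compiler honors):

    #include <cfenv>
    #include <cstdio>

    int main() {
        std::feclearexcept(FE_ALL_EXCEPT);   // start with clean sticky flags
        volatile double a = 1.0, b = 0.0;    // volatile: keep the division at run time
        volatile double t = a / b;           // sets FE_DIVBYZERO, yields +inf
        if (std::fetestexcept(FE_DIVBYZERO))
            std::printf("division by zero happened, t = %g\n", (double)t);
        return 0;
    }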
It's well-defined is all that really matters AFAIC.
I'll give two examples from a recent project where I very intentionally divided by zero. The first one was about solving for a zero of a derivative and checking whether it falls in the 0..1 range. This exploits the fact that (x < NaN) is always false and comparisons with +/- inf behave as expected.
    float t = a / b; // might divide by zero: NaN if both a and b == 0.0, +/- inf if only b == 0.0
    if (t > 0.0 && t < 1.0) {
        // we don't get here if t is +/- inf or nan
        split_at(t); // do the thing
    }
The second one was similar, but clamping to the 0..1 range using branchless SIMD min/max:

    f32x8 x = a / b;                        // might divide by zero
    return simd_min(simd_max(0.0, x), 1.0); // returns 0 for -inf or nan, 1 for +inf
In both of these cases, explicitly checking for division by zero or isinf/isnan would've been (worse than) useless, because just using the inf/NaN values gave the correct answer for what comes next.

    >>> 1.0/0.0
    ZeroDivisionError
    >>> np.float64(1)/np.float64(0)
    inf

I'm so used to writing such zero divisions in other languages like C/C++ that this Python quirk still trips me up.

It is implementation-dependent. It is not obligatory for an implementation to respect IEEE 754.
However, the story about non-determinism is no myth. The intel processors have a separate math coprocessor that supports 80bit floats (https://en.wikipedia.org/wiki/Extended_precision#x86_extende...). Moving a float from a register in this coprocessor to memory truncates the float. Repeated math can be done inside this coprocessor to achieve higher precision so hot loops generally don't move floats outside of these registers. Non-determinism occurs in programs running on intel with floats when threads are interrupted and the math coprocessor flushed. The non-determinism isn't intrinsic to the floating point arithmetic but to the non-determinism of when this truncation may occur. This is more relevant for fields where chaotic dynamics occur. So the same program with the same inputs can produce different results.
NaN is an error. If you take the square root of a negative number you get a NaN. This is just a type error; use complex numbers to overcome this one. But then you get 0. / 0. and that's a NaN, or Inf - Inf, and a whole slew of other things that produce out-of-bounds results. Whether it is expected or not is another story, but it does mean that you are unable to represent the value with a float, and that is a type error.
That's ridiculous. No OS in its right mind would flush FPU regs to 64 bits only, because that would break many things, most obviously "real" 80-bit FP, which is still a thing and the only reason x87 instructions still work. It would even break plain equality comparisons, making all FP useless.
For 64 bit FP most compilers prefer SSE rather than x87 instructions these days.
Never, for sure.
FTFY. They even changed some of the more obscure handling between the 8087, 80287, and 80387. So much hoop-jumping if you cared about binary reproducibility.
Seems to be largely fixed with targeting SSE even for scalar code now.
> The intel processors have a separate math coprocessor that supports 80bit floats
x86 processors have two FPU units, the x87 unit (which you're describing) and the SSE unit. Anyone compiling for x86-64 uses the SSE unit by default, and most x86-32 compilers default to SSE these days anyway.
> Moving a float from a register in this coprocessor to memory truncates the float.
No it doesn't. The x87 unit has load and store instructions for 32-bit, 64-bit, and 80-bit floats. If you want to spill 80-bit values as 80-bit values, you can do so.
> Repeated math can be done inside this coprocessor to achieve higher precision so hot loops generally don't move floats outside of these registers.
Hot loops these days use the SSE stuff because they're so much faster than x87. Friends don't let friends use long double without good reason!
> Non-determinism occurs in programs running on intel with floats when threads are interrupted and the math coprocessor flushed.
Lol, nope. You'll spill the x87 register stack on thread context switch with FSAVE or FXSAVE or XSAVE, all of which will store the registers as 80-bit values without loss of precision.
That said, there was a problem with programs that use the x87 unit, but it has absolutely nothing to do with what you're describing. The x87 unit doesn't have arithmetic for 32-bit and 64-bit values, only 80-bit values. Many compilers, though, just pretended that the x87 unit supported arithmetic on 32-bit and 64-bit values, so that FADD would simultaneously be a 32-bit addition, a 64-bit addition, and an 80-bit addition. If the compiler needed to spill a floating-point register, they would spill the value as a 32-bit value (if float) or 64-bit value (if double), and register spills are pretty unpredictable for user code. That's the nondeterminism you're referring to, and it's considered a bug in every compiler I'm aware of. (See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p37... for a more thorough description of the problem).
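A minimal sketch of that spill behavior (my own, heavily hedged): built for 32-bit x87 without -ffloat-store or -fexcess-precision=standard, whether this prints 1 can depend on whether the compiler happens to keep the quotient in an 80-bit register or spills it to a 64-bit slot; built for SSE it reliably prints 1:

    #include <cstdio>

    int main() {
        volatile double a = 1.0, b = 3.0;   // volatile: force the division to happen at run time
        double q = a / b;                   // may be rounded to 64 bits on spill, or kept at 80 bits
        std::printf("%d\n", q == a / b);    // 1 if both sides see the same precision, possibly 0 otherwise
        return 0;
    }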
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p37...
Summarized as,
> Most users cannot be expected to know all of the ways that their floating-point code is not reproducible.
Glad to know that the situation is defaulting to SSE2 nowadays though.
I passed through the highly-irreproducible eras described in the section you link, and that you summarize in your last paragraph. There was so much different FP hardware, and so many different niche compilers, that my takeaway became “you can’t rely on reproducibility across any hardware/os version/compiler/library difference”.
There are still issues with libraries and compilers, summarized farther down (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p37...). And these can be observed in the wild when changing compilers on the same underlying hardware.
But your point is that irreproducibility at the level of interrupts or processor scheduling is not a thing on contemporary mainstream hardware. That’s important and I hadn’t realized that.
If NaN is invalid input for the next step, then sure why not treat it as an error? But that's a design decision not an imperative that everybody must follow. (I picture Mel Brooks' 15 commandments, #11-15, that fell and broke. This is not like that.)
The design decision isn't, "Was this a type error?" but, "What do I need to do about this type error?"
THEODOTUS (outraged). How!
CAESAR (recovering his self-possession). Pardon him, Theodotus: he is a barbarian, and thinks that the customs of his tribe and island are the definition of the float type.
The article is right that sometimes you can use simple equality on floating point values. But if you have an (almost) blanket rule to use fuzzy comparison, or (even better) just avoid any sort of equality-like comparison altogether, then your code might be simpler to understand and you might be less likely to make a mistake about when you really can safely do that.
It's still sensible to understand that it's possible though. Just that you should try to avoid writing tricky code that depends on it.
Updating with a concrete example: The Fortran standard defines the real MOD and MODULO intrinsic functions as being equivalent to the straightforward sequence of a division, conversion to integer, multiplication, and subtraction. This formula can round, obviously. But MOD can be (and thus should be) implemented exactly by other means, and most Fortran compilers do so instead. This leaves Fortran implementors in a bit of a pickle -- conform to the standard, or produce good results?
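To make the pickle concrete, here is a rough illustration in C++ rather than Fortran (inputs chosen by me): the naive divide/truncate/multiply/subtract sequence rounds along the way, while fmod computes the remainder without intermediate rounding:

    #include <cmath>
    #include <cstdio>

    int main() {
        double a = 1.0e16, p = 3.0;                  // 1e16 is exactly representable; 1e16 mod 3 == 1
        double naive = a - std::trunc(a / p) * p;    // each step can round; gives 0 here under round-to-nearest
        double exact = std::fmod(a, p);              // no intermediate rounding: gives 1
        std::printf("naive = %.17g, fmod = %.17g\n", naive, exact);
        return 0;
    }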
They are approximations, and even if your current floating point approximation of a value seems to be accurate as an integer value...
There is no guarantee, if you take two of those floating point approximations that appear completely accurate, that the result of an operation between them will also be completely accurate.
Floating point numbers stand in contrast to fixed point numbers, where the point (between the integer part and the fractional part) is fixed; that is why they are called floating. The nature of the real numbers is such that they can in general only be approximated, regardless of whether you use fixed point numbers, floating point numbers, or a fancier computable-number representation.
We can't even represent all the natural numbers: that would require infinite memory. Now you want to represent the real numbers, equinumerous with the power set of natural numbers?
We can represent all real numbers to the same extent that we can represent all natural numbers. They're just infinite strings, with each additional letter in the string mattering less and less as it goes along. The "Type Two Effectivity" model for modelling computations with infinite strings doesn't limit you to using just the computable real numbers. An uncomputable real number can be produced by outputting an infinite string letter by letter with each letter getting picked at random. The TTE model does OTOH limit you to computing only with the computable functions on those real numbers.
The TTE model basically uses (Python-like) generators to output an endless sequence of interval approximations to a real number.
This certainly outputs an uncomputable real number, given that you have true random input and not a PRNG.
But random numbers from a seed and an algorithm are by definition computable.
And if you can't compute the number, you have no basis for the claim that it's the right number.
Also, good luck computing with arbitrarily long, computed-on-demand digit strings. Consider: how do you know how many digits you need from the input in order to be sure of N correct digits in the output? This might be possible in general, but it certainly doesn't sound like my idea of fun.
Representable numbers are countable, they have measure zero in the reals.
Because anyone who's had any experience knows you absolutely can't.
I am explicitly excluding big decimal type functionality here. What we are talking about explicitly are floats and doubles.
https://wingolog.org/archives/2011/05/18/value-representatio...
It's not exactly a myth; as the article mentions, they're only exact for certain ranges of values.
> NaN and INF are indication of an error
This is somewhat semantic, but dividing by zero does raise the divide-by-zero exception flag in hardware. However, the exception is masked by default, so it's all handled behind the scenes and you get "Inf" as the result.
You can make it so dividing by zero is explicitly an error; see the -ftrapping-math flag.
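As far as I know, -ftrapping-math only tells the compiler to assume floating-point operations may trap; the exception itself is unmasked at run time, e.g. with glibc's feenableexcept (a GNU extension, not standard C/C++). A sketch:

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE            // for feenableexcept (glibc extension)
    #endif
    #include <fenv.h>
    #include <cstdio>

    int main() {
        feenableexcept(FE_DIVBYZERO);     // unmask the divide-by-zero exception
        volatile double a = 1.0, b = 0.0;
        double t = a / b;                 // now delivers SIGFPE instead of quietly returning +inf
        std::printf("%g\n", t);           // not reached if the trap fires
        return 0;
    }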
Some operations involve rounding, notably conversion from decimal, but the (rounded) result is an exact number that can be stored and recovered without loss of precision and equality will work as expected. Converting from floating point binary to decimal is exact though, given enough digits (can be >1000 for very small or very large numbers).
They are though. All arithmetic operations involve rounding, so e.g. (7.0 / 1234 + 0.5) * 1234 is not equal to 7.0 + 617 (it differs by 1 ULP). On the other hand, (9.0 / 1234 + 0.5) * 1234 is equal to 9.0 + 617, so the end result is sometimes exact and sometimes not. How can you know beforehand which one applies in your specific case? Generally, you can't: any arithmetic operation can potentially give you 1 ULP of error, and it can (and likely will) slowly accumulate.
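If you want to check the parent's specific numbers yourself, a quick test (expected to print 0 then 1 per the claim above):

    #include <cstdio>

    int main() {
        double x = (7.0 / 1234 + 0.5) * 1234;   // claimed to be off by 1 ULP
        double y = (9.0 / 1234 + 0.5) * 1234;   // claimed to be exact
        std::printf("%d\n", x == 7.0 + 617);    // expected 0
        std::printf("%d\n", y == 9.0 + 617);    // expected 1
        return 0;
    }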
I.e., two floats that are _identical_ to each other (even when it's _the same_ variable, on the same memory address) can be not _equal_ to each other, specifically if it's NaN. This is dictated by IEEE-754, and this is true for all programming languages I know, and to this day, this makes zero sense to me, but apparently this is useful for some reason.
It lets you easily test whether a value is NaN without needing a library function call. (Even if the language wanted to provide a NaN literal, there's more than one NaN value, so that wouldn't actually work!)
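In code, the self-comparison test looks like this (a trivial sketch):

    #include <cmath>
    #include <cstdio>
    #include <limits>

    int main() {
        double x = std::numeric_limits<double>::quiet_NaN();
        bool via_compare = (x != x);   // true only for NaN: no library call needed
        std::printf("%d %d\n", (int)via_compare, (int)std::isnan(x));   // prints 1 1
        return 0;
    }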
Breaking a basic programming expectation for a specific class of values of a specific type is not a good thing, in my opinion.
Amusingly, pocket calculators became affordable while I was in high school, and anybody who was interested in math learned the foibles of floating point almost instantly. Now it's an advanced topic. A difference may have been that our calculators had very few significant digits, so the errors introduced by successive calculations became noticeable more quickly. Also, you had to think about what you were doing because, even if you trusted the calculator, you didn't trust your fingers.
> the largest integer value that can be represented exactly is 2^53
I am confused as to why it is not 2^52, given that there are 52 bits of mantissa, so the relative accuracy is 2^-52, which translates to an absolute accuracy larger than 1 beyond 2^52. Compare this to the table there saying "Next value after 1 = 1 + 2^-52".
Because of the hidden bit, the significand effectively has 53 bits: its largest value is 1{hidden bit} + (1 - 2^-52){mantissa with all ones}. So the relative accuracy, corresponding to the absolute accuracy of a single bit in the significand, is about 2^-53, and integers up to 2^53 are exact. The hidden bit is easy to forget about...
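A quick demonstration of that 2^53 boundary (my own snippet):

    #include <cstdio>

    int main() {
        double big = 9007199254740992.0;        // 2^53
        std::printf("%.0f\n", big - 1.0);       // 9007199254740991: every integer below 2^53 is exact
        std::printf("%d\n", big + 1.0 == big);  // 1: 2^53 + 1 rounds back to 2^53
        std::printf("%.0f\n", big + 2.0);       // 9007199254740994: spacing above 2^53 is 2
        return 0;
    }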
A few more resources on this topic:
https://randomascii.wordpress.com/category/floating-point/ - a series of articles about floating point, with deeper details.
https://float.exposed/ - explorer of floating point values by Bartosz Ciechanowski.
Obvious: An array containing NaNs will almost certainly be sorted incorrectly with std::sort.
Non-obvious: sorting an array containing NaNs with std::sort from libstdc++ (gcc's implementation) could lead to a buffer overflow (see the NaN-aware comparator sketch below).
Non-obvious: using MMX instructions without clearing the FPU state leads to incorrect results of floating-point instructions in other parts of the code base: https://github.com/simdjson/simdjson/issues/169
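A sketch (mine, not from the thread) of one way to sort in the presence of NaNs without breaking std::sort's preconditions: use a comparator that is still a strict weak ordering when NaNs are present, pushing them all to the end:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> v{3.0, NAN, 1.0, 2.0, NAN};
        // Order by (is-NaN, value): a valid strict weak ordering even with NaNs,
        // so std::sort's preconditions hold and NaNs all end up at the back.
        std::sort(v.begin(), v.end(), [](double a, double b) {
            if (std::isnan(a)) return false;   // NaN is never "less than" anything
            if (std::isnan(b)) return true;    // any number is less than NaN
            return a < b;
        });
        for (double x : v) std::printf("%g ", x);   // 1 2 3 nan nan
        std::printf("\n");
        return 0;
    }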
This awesome article drives home the point very succinctly: https://fabiensanglard.net/floating_point_visually_explained...
AshamedCaptain•5mo ago
Also, compiler optimizations should not affect the result, especially without "fast-math/O3" (which arguably many people stupidly use nowadays, and then complain).
zokier•5mo ago
-ffp-contract is an annoying exception to that principle.
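A small illustration of what contraction changes (my own example): uncontracted, a*b + c rounds twice; contracted into an FMA it rounds once, like std::fma does:

    #include <cmath>
    #include <cstdio>

    int main() {
        volatile double eps = 1.0 / (1 << 29);  // 2^-29, exactly representable
        double a = 1.0 + eps, b = 1.0 - eps, c = -1.0;
        double separate = a * b + c;            // with contraction off: a*b rounds to 1.0, so this is 0
        double fused    = std::fma(a, b, c);    // single rounding: -2^-58, about -3.5e-18
        // Whether 'separate' matches 'fused' depends on the -ffp-contract setting.
        std::printf("%g vs %g\n", separate, fused);
        return 0;
    }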
nspattak•5mo ago
A couple of decades ago, when I was still a lowly applied mathematics (and aspiring programmer) student, I proposed to a professor that we change the compiler options from O2 to O3 to gain performance (reducing run time by a significant percentage, like 50% or more...), only to be dismissed with "no, compiler optimizations break numerical accuracy".