These operations are
1. Localized, not a function-wide or program-wide flag.
2. Completely safe. (-ffast-math, by contrast, includes assumptions such as that there are no NaNs, and violating those is undefined behavior.)
So what do these algebraic operations do? Well, one by itself doesn't do much of anything compared to a regular operation. But a sequence of them is allowed to be transformed using optimizations which are algebraically justified, as-if all operations are done using real arithmetic.
This can be expanded in the future as LLVM offers more flags that fall within the scope of algebraically motivated optimizations.
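For a concrete feel of what an "algebraically justified" transformation can change, here is a quick Python sketch (ordinary f64 arithmetic, not the Rust intrinsics themselves, whose names are still being discussed) showing that reassociating a sum is exact over the reals but not over floats:

print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6
# Marking the adds "algebraic" tells the optimizer that either grouping is acceptable.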
('Naming: "algebraic" is not very descriptive of what this does since the operations themselves are algebraic.' :D)
Okay, the floating point operations are literally algebraic (they form an algebra) but they don't follow some common algebraic properties like associativity. The linked tracking issue itself acknowledges that:
> Naming: "algebraic" is not very descriptive of what this does since the operations themselves are algebraic.
Also this comment https://github.com/rust-lang/rust/issues/136469#issuecomment...
> > On that note I added an unresolved question for naming since algebraic isn't the most clear indicator of what is going on.
>
> I think it is fairly clear. The operations allow algebraically justified optimizations, as-if the arithmetic was real arithmetic.
>
> I don't think you're going to find a clearer name, but feel free to provide suggestions. One alternative one might consider is real_add, real_sub, etc.
Then retorted here https://github.com/rust-lang/rust/issues/136469#issuecomment...
> These names suggest that the operations are more accurate than normal, where really they are less accurate. One might misinterpret that these are infinite-precision operations (perhaps with rounding after a whole sequence of operations).
>
> The actual meaning isn't that these are real number operations, it's quite the opposite: they have best-effort precision with no strict guarantees.
>
> I find "algebraic" confusing for the same reason.
>
> How about approximate_add, approximate_sub?
And the next comment
> Saying "approximate" feels imperfect, as while these operations don't promise to produce the exact IEEE result on a per-operation basis, the overall result might well be more accurate algebraically. E.g.:
>
> (...)
So there's a discussion going on about the naming
(Is this going to overload operators or are people going to have to type this… a lot… ?)
I wonder if it is possible to add an additional constraint that guarantees the transformation has equal or fewer numerical rounding errors. E.g. for floating point doubles (0.2 + 0.1) - 0.1 results in 0.20000000000000004, so I would expect that transforming some (A + B) - B to just A would always reduce numerical error. OTOH, it's floating point maths, there's probably some kind of weird gotcha here as well.
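For what it's worth, here's a quick Python check of that (A + B) - B intuition, plus the gotcha: error-compensation tricks depend on exactly that rounding error surviving.

a, b = 0.2, 0.1
print((a + b) - b)    # 0.20000000000000004 -- here, folding it to just a would indeed be more accurate
s = a + b
err = (s - a) - b     # isolates the rounding error of a + b; algebraically this is exactly 0
print(err)            # 2.7755575615628914e-17 -- Kahan-style summation relies on this NOT being folded to 0.0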
2 f 1 f 1 f f+ f- f. 0.000 ok
PFE, I think reusing the GLIBC math library:
2e0 1e0 1e0 f+ f- f. 0.000000 ok
In general, rules that allow fewer transformations are probably easier to understand and use. Trying to optimize everything is where you run into trouble.
The library has some merit, but the goal you've stated here is given to you with 5 compiler flags. The benefit of the library is choosing when these apply.
A similar warning applies to -O3. If an optimization in -O3 were to reliably always give better results, it wouldn't be in -O3; it'd be in -O2. So blindly compiling with -O3 also doesn't seem like a great idea.
-Ofast is the 'dangerous' one. (It includes -ffast-math).
I didn't mean to imply that they result in incorrect results.
> they make a more aggressive space/speed tradeoff...
Right...so "better" becomes subjective, depends on the use case, so it doesn't make sense to choose -O3 blindly unless you understand the trade-offs and want that side of them for the particular builds you're doing. Things that everyone wants would be in -O2. That's all I'm saying.
If you know your exact target and details about your input expectations, of course you can optimize further, which might involve turning off some things in -O3 (or even -O2). On a whole bunch of systems, -Os can be faster than -O3 due to I-cache size limits. But at-large, you can expect -O3 to be faster.
Similar considerations apply for LTO and PGO. LTO is commonly default for release builds these days, it just costs a whole lot of compile time. PGO is done when possible (i.e. known majority inputs).
Previous discussion: Beware of fast-math (Nov 12, 2021, https://news.ycombinator.com/item?id=29201473)
EDIT: I am now reading Goldberg 1991
Double edit: Kahan Summation formula. Goldberg is always worth going back to.
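For reference, Kahan summation in a nutshell (plain Python sketch; note that the compensation line is algebraically zero, which is exactly why reassociating optimizations like -ffast-math can delete it):

def kahan_sum(xs):
    total = 0.0
    c = 0.0                   # running compensation for lost low-order bits
    for x in xs:
        y = x - c             # apply the correction carried from the previous step
        t = total + y         # big + small: the low-order bits of y get lost here...
        c = (t - total) - y   # ...and recovered here (algebraically this is always 0)
        total = t
    return total

print(sum([0.1] * 10))        # 0.9999999999999999
print(kahan_sum([0.1] * 10))  # 1.0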
What's wrong with fun, safe math optimizations?!
(:
I guess that happens when I don’t deal with compiler flags daily.
Not being able to auto-vectorize seems like a pretty critical bug given hardware trends that have been going on for decades now; on the other hand sacrificing platform-independent determinism isn't a trivial cost to pay either.
I'm not familiar with the details of OpenCL and CUDA on this front - do they have some way to guarantee a specific order-of-operations such that code always has a predictable result on all platforms and nevertheless parallelizes well on a GPU?
It would be pretty ironic if at some point fixed point / bignum implementations end up being faster because of this.
That can’t be assumed.
You can easily fall into a situation like:
total = large_float_value
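# e.g. large_float_value = 1e18 (a hypothetical value for illustration); the spacing between adjacent doubles there is 128, so each += .01 rounds straight back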
for _ in range(1_000_000_000):
    total += .01
assert total == large_float_value
Without knowing the specific situation, it’s impossible to say whether that’s a tolerably small difference.

This also adds extra complexity to the CPU: you need special hardware for == rather than just using the perfectly good integer unit, and every FPU operation needs to devote a bunch of transistors to handling this nonsense that buys the user absolutely nothing.
there are definitely things to criticize about the design of Posits, but the thing they 100% get right is having a single NaN and sane ordering semantics
Well, all standards are bad when you really get into them, sure.
But no, the problem here is that floating point code is often sensitive to precision errors. Relying on rigorous adherence to a specification doesn't fix precision errors, but it does guarantee that software behavior in the face of them is deterministic. Which 90%+ of the time is enough to let you ignore the problem as a "tuning" thing.
But no, precision errors are bugs. And the proper treatment for bugs is to fix the bugs and not ignore them via tricks with determinism. But that's hard, as it often involves design decisions and complicated math (consider gimbal lock: "fixing" that requires understanding quaternions or some other orthogonal orientation space, and that's hard!).
So we just deal with it. But IMHO --ffast-math is more good than bad, and projects should absolutely enable it, because the "problems" it discovers are bugs you want to fix anyway.
Or just avoiding gimbal lock by other means. We went to the moon using Euler angles, but I don't suppose there's much of a choice when you're using real mechanical gimbals.
FWIW, my memory is that this was exactly what happened with Apollo 13. It lost its gyro calibration after the accident (it did the thing that was the "just don't do that") and they had to do a bunch of iterative contortions to recover it from things like the sun position (because they couldn't see stars out the iced-over windows).
NASA would have strongly preferred IEEE doubles and quaternions, in hindsight.
Most popular programming languages have the defect that they impose a sequential semantics even where it is not needed. There have been programming languages without this defect, e.g. Occam, but they have not become widespread.
Because nowadays only a relatively small number of users care about computational applications, this defect has not been corrected in any mainline programming language, though for some programming languages there are extensions that can achieve this effect, e.g. OpenMP for C/C++ and Fortran. CUDA is similar to OpenMP, even if it has a very different syntax.
The IEEE standard for floating-point arithmetic has been one of the most useful standards in all history. The reason is that both hardware designers and naive programmers have always had the incentive to cheat in order to obtain better results in speed benchmarks, i.e. to introduce errors in the results with the hope that this will not matter for users, who will be more impressed by the great benchmark results.
There are always users who need correct results more than anything else and it can be even a matter of life and death. For the very limited in scope uses where correctness does not matter, i.e. mainly graphics and ML/AI, it is better to use dedicated accelerators, GPUs and NPUs, which are designed by prioritizing speed over correctness. For general-purpose CPUs, being not fully-compliant with the IEEE standard is a serious mistake, because in most cases the consequences of such a choice are impossible to predict, especially not by the people without experience in floating-point computation who are the most likely to attempt to bypass the standard.
Regarding CUDA, OpenMP and the like, by definition if some operations are parallelizable, then the order of their execution does not matter. If the order matters, then it is impossible to provide guarantees about the results, on any platform. If the order matters, it is the responsibility of the programmer to enforce it, by synchronization of the parallel threads, wherever necessary.
Whoever wants vectorized code should never rely on programming languages like C/C++ and the like, but they should always use one of the programming language extensions that have been developed for this purpose, e.g. OpenMP, CUDA, OpenCL, where vectorization is not left to chance.
Whether it's the standard's fault or the language's fault for following the standard in terms of preventing auto-vectorization is splitting hairs; the whole point of the standard is to have predictable and usually fairly low-error ways of performing these operations, which only works when the order of operations is defined. That very aim is the problem; to the extent the standard is harmless when ordering guarantees don't exist, you're essentially applying some of those tricky -ffast-math suboptimizations.
But to be clear in any case: there are obviously cases where order-of-operations is relevant enough that accuracy-altering reorderings are not valid. It's just that those are rare enough that for many of these features I'd much prefer that to be the opt-in behavior, not opt-out. There's absolutely nothing wrong with having a classic IEEE 754 mode and I expect it's an essential feature in some niche corner cases.
However, given the obviously huge application of massively parallel processors and algorithms that accept rounding errors (or sometimes, conversely, overly precise results!), clearly most software is willing to accept rounding errors in general to be able to run efficiently on modern chips. It just so happens that none of the computer languages that rely on mapping floats to IEEE 754 floats in a straightforward fashion are any good at that, which seems like a bad trade-off.
There could be multiple types of floats instead; or code-local flags that delineate special sections that need precise ordering; or perhaps even expressions that clarify how much error the user is willing to accept and then just let the compiler do some but not all transformations; and perhaps even other solutions.
We have memory ordering functions to let compilers know the atomic operation preference of the programmer… couldn’t we do the same for maths and in general a set of expressions?
This is just a minor change from the syntax of the most popular programming languages, because they typically already specify that the order of evaluation of the expressions used for the arguments of a function, which are separated by commas, can be arbitrary.
Early in its history, the C language has been close to specifying this behavior for its comma operator, but unfortunately its designers have changed their mind and they have made the comma operator behave like a semicolon, in order to be able to use it inside for statement headers, where the semicolons have a different meaning. A much better solution for C, instead of making both comma and semicolon to have the same behavior, would have been to allow a block to appear in any place where an expression is expected, giving it the value of the last expression evaluated in the block.
AFAIK GPU code is basically always written as scalar code acting on each "thing" separately, that's, as a whole, semantically looped over by the hardware, same way as multithreading would (i.e. no order guaranteed at all), so you physically cannot write code that'd need operation reordering to vectorize. You just can't write an equivalent to "for (each element in list) accumulator += element;" (or, well, you can, by writing that and running just one thread of it, but that's gonna be slower than even the non-vectorized CPU equivalent (assuming the driver respects IEEE-754)).
This is slightly obfuscated by not using a keyword like "for" or "do", by the fact that the body of the loop (the "kernel") is written in one place and the header of the loop (which gives the ranges for the loop indices) is written in another place, and by the fact that the loop indices have standard names.
A "parallel for" may have as well a syntax identical with a sequential "for". The difference is that for the "parallel for" the compiler knows that the iterations are independent, so they may be scheduled to be executed concurrently.
NVIDIA has always been greatly annoying in inventing a huge number of new terms that are just new words for old terms that have been used for decades in the computing literature, with no apparent purpose except obfuscating how their GPUs really work. Worse, AMD has imitated NVIDIA by inventing their own terms that correspond to those used by NVIDIA, but are once again different.
I find the discussion of -fassociative-math particularly interesting, because I assume that most writers of code that translates a mathematical formula into a simulation will not know which would be the most accurate order of operations and will simply codify their derivation of the equation being simulated (which could have the operations in any order). So if this switch changes your results, it probably means you should take a long hard look at the equations you're simulating and which ordering will give you the most correct results.
That said I appreciate that the considerations might be quite different for libraries and in particular simulations for mathematics.
Then all other math will be fast-math, except where annotated.
Not sure how that interacts with this fast math thing, I don't use C
Imagine a function like Python’s `sum(list)`. In abstract, Python should be able to add those values in any order it wants. Maybe it could spawn a thread so that one process sums the first half in the list, another sums the second half at the same time, and then you return the sum of those intermediate values. You could imagine a clever `sum()` being many times faster, especially using SIMD instructions or a GPU or something.
But alas, you can’t optimize like that with common IEEE-754 floats and expect to get the same answer out as when using the simple one-at-a-time addition. The result depends on what order you add the numbers together. Order them differently and you very well may get a different answer.
That’s the kind of ordering we’re talking about here.
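A toy version of that in Python, with a split-in-half sum standing in for what a SIMD or multi-threaded reduction would do:

vals = [0.1] * 10

seq = 0.0
for v in vals:          # strict left-to-right order, as plain IEEE-754 loop semantics require
    seq += v

half = len(vals) // 2   # "parallel" style: sum the two halves independently, then combine
par = sum(vals[:half]) + sum(vals[half:])

print(seq, par)         # 0.9999999999999999 1.0 -- same inputs, different grouping, different result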
I find the robotics example quite surprising in particular. I think the precision of most input sensors is less than 16 bits. If your inputs have that much noise on them, why do you need so much precision in your calculations?
ffast-math is sacrificing both the first and the second for performance. Compilers usually sacrifice the first for the second by default with things like automatic FMA contraction. This isn't a necessary trade-off, it's just easier.
There's very few cases where you actually need accuracy down to the ULP though. No robot can do anything meaningful with femtometer+ precision, for example. Instead you choose a development balance between reproducibility (relatively easy) and accuracy (extremely hard). In robotics, that will usually swing a bit towards reproducibility. CAD would swing more towards accuracy.
I work in audio software and we have some comparison tests that compare the audio output of a chain of audio effects with a previous result. If we make some small refactoring of the code and the compiler decides to re-organize the arithmetic operations then we might suddenly get a slightly different output. So of course we disable fast-math.
One thing we do enable though, is flushing denormals to zero. That is predictable behavior and it saves some execution time.
https://www.jviotti.com/2017/12/05/an-introduction-to-adas-s...
http://www.ada-auth.org/standards/22rm/html/RM-3-5-7.html
http://www.ada-auth.org/standards/22rm/html/RM-A-5-3.html
Ada also has fixed point types:
> This is perhaps the single most frequent cause of fast-math-related StackOverflow questions and GitHub bug reports
The second line above should settle the first.
Is there any IEEE standards committee working on an FP alternative, for example Unum and Posit [1][2]?
[1] Unum & Posit:
[2] The End of Error:
https://www.oreilly.com/library/view/the-end-of/978148223986...
You can think of fixed point as equivalent to ieee754 floats with a fixed exponent and a two’s complement mantissa instead of a sign bit.
I always have a wrapper class to put the logic of converting to whole currency units when and if needed, as well as when requirements change and now you need 4 digits past the decimal instead of 2, etc.
Should I be running my accounting system on units of 10 billionths of a dollar?
: gcd begin dup while tuck mod repeat drop ;
: lcm 2dup * abs -rot gcd / ;
: reduce 2dup gcd tuck / >r / r> ;
: q+ rot 2dup * >r rot * -rot * + r> reduce ;
: q- swap negate swap q+ ;
: q* rot * >r * r> reduce ;
: q/ >r * swap r> * swap reduce ;
Example: to compute 70 * 0.25 = 35/2:
70 1 1 4 q* reduce .s 35 2 ok
On stack-managing words like 2dup, rot and such: these are easily grasped via Google/DDG, or in any Forth with the words "see" and/or "help".
As a hint: q- swaps the top two numbers on the stack (which compose a rational), negates the new top, and swaps them back into place. Then it calls q+.
So, 2/5 - 3/2 = 2/5 + -3/2.
But you probably should run your billing in fixed point or floating decimals with a billionth of a dollar precision, yes. Either that or you should consolidate the expenses into larger bunches.
https://ethereum.stackexchange.com/questions/158517/does-sol...
As an added benefit, it makes it much easier to deal with price changes.
Using arbitrary precision doesn't make sense if the data needs to be stored in a database (for most situations at least). Regardless, infinite precision is magical thinking anyway: try adding Pi to your bank account without loss of precision.
Fixed point is a general technique that is commonly done with machine integers when the necessary precision is known at compile time. It is frequently used on embedded devices that don't have a floating point unit to avoid slow software based floating point implementations. Limiting the precision to $0.01 makes sense if you only do addition or subtraction. Precision of $0.001 (Tenths of a cent also called mils) may be necessary when calculating taxes or applying other percentages although this is typically called out in the relevant laws or regulations.
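A minimal sketch of that technique in Python (the names and the choice of mils are mine, purely for illustration; a real system would wrap this in a proper money type and use whatever rounding rule its regulations require):

SCALE = 1000  # store prices as integer mils (thousandths of a dollar)

def to_mils(text):
    # parse "10.01" -> 10010 without ever routing the value through a binary float
    dollars, _, frac = text.partition(".")
    frac = (frac + "000")[:3]
    sign = -1 if dollars.startswith("-") else 1
    return int(dollars) * SCALE + sign * int(frac)

price = to_mils("10.01")
tax = price * 825 // 10000          # 8.25% tax, truncated to the mil (pick your own rounding rule)
total = price + tax
print(total, total / SCALE)         # 10835 10.835 -- the arithmetic itself stayed in integers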
The decimal number 0.1 has an infinitely repeating binary fraction.
Consider how 1/3 in decimal is 0.33333… If you truncate that to some finite prefix, you no longer have 1/3. Now let’s suppose we know, in some context, that we’ll only ever have a finite number of digits — let’s say 5 digits after the decimal point. Then, if someone asks “what fraction is equivalent to 0.33333?”, then it is reasonable to reply with “1/3”. That might sound like we’re lying, but remember that we agreed that, in this context of discussion, we have a finite number of digits — so the value 1/3 outside of this context has no way of being represented faithfully inside this context, so we can only assume that the person is asking about the nearest approximation of “1/3 as it means outside this context”. If the person asking feels lied to, that’s on them for not keeping the base assumptions straight.
So back to floating point, and the case of 0.1 represented as 64 bit floating point number. In base 2, the decimal number 0.1 looks like 0.0001100110011… (the 0011 being repeated infinitely). But we don’t have an infinite number of digits. The finite truncation of that is the closest we can get to the decimal number 0.1, and by the same rationale as earlier (where I said that equating 1/3 with 0.33333 is reasonable), your programming language will likely parse “0.1” as a f64 and print it back out as such. However, if you try something like (a=0.1; a+a+a) you’ll likely be surprised at what you find.
I very much doubt it. My day job is writing symbolic-numeric code. The result of 0.1+0.1+0.1 != 0.3, but for rounding to bring it up to 0.31 (i.e. rounding causing an error of 1 cent), you would need to accumulate at least .005 error, which will not happen unless you lose 13 out of your 16 digits of precision, which will not happen unless you do something incredibly stupid.
Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2**55 which is close to but not exactly equal to the true value of 1/10.
Many users are not aware of the approximation because of the way values are displayed. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display:
>>> 0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead:
>>> 1 / 10
0.1
That being said, double should be fine unless you're aggregating trillions of low cost transactions. (API calls?)

>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
$ python3
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 0.1
>>> a + a + a
0.30000000000000004
By way of explanation, the algorithm used to render a floating point number to text in most languages these days is to find the shortest string representation that will parse back to an identical bit pattern. This has the direct effect of causing a REPL to print what you typed in. (Well, within certain ranges of "reasonable" inputs.) But this doesn't mean that the language stores what you typed in - just an approximation of it.

Edit: Now it does it fine after inputting floats:
puts [ expr { 1.0/7.0 } ]
Eforth on top of Subleq, a very small and dumb virtual machine:
1 f 7 f f/ f.
0.143 ok
Still, using rationals where possible (and mod operations otherwise) gives a great 'precision', except for irrationals.

I interpreted "directly representable" as "uniquely representable": all < 15 digit decimals are uniquely represented in fp64, so it is always safe to roundtrip between those decimals <-> f64, though indeed this guarantee is lost once you perform any math.
To round to the nearest cent, you would need to make cents your units (i.e. the quantity "1 dollar" would be represented as 100 instead of 1.0).
You don't need to have a representation of the exact number 0.1 if you can tolerate errors after the 7th decimal (and it turns out you can). And 0.1+0.1+0.1 does not have to be comparable with 0.3 using operator==. You have an is_close function for that. And accumulation is not an issue because you have rounding and fma for that.
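That is, roughly (Python's math.isclose playing the role of is_close here):

import math

a = 0.1 + 0.1 + 0.1
print(a == 0.3)                            # False
print(math.isclose(a, 0.3, rel_tol=1e-9))  # True: a disagreement in the 17th digit is tolerated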
First of all, a lot of languages don't include arbitrary rounding in their math libraries at all, only having rounding to integers. Second, in the docs of Python, which does have arbitrary rounding, it specifically says:
Note: The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. [...]
Thus I think what I said stands: you cannot round to the nearest cent reliably all the time, assuming cent means 0.01. The only rounding you can sort of trust is display rounding, because it actually happens after converting to base 10. That's why 2.675 will print as 2.675 in Python even though it won't round as you'd expect. But you'd only do that once, at the end of a chain of operations.

In a lot of cases, errors like these don't matter, but the key point is that if the errors don't matter, then they don't need to be "assured" away by dubious rounding either.
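The docs' example, spelled out (Python):

from decimal import Decimal

print(round(2.675, 2))   # 2.67, not 2.68: the stored double is already slightly below 2.675
print(Decimal(2.675))    # 2.674999999999999822... -- the value round() actually saw
print(2.675)             # 2.675 -- display rounding hides the difference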
https://learn.microsoft.com/en-us/office/troubleshoot/excel/...
(It nevertheless happens to work just fine for most of what Excel is used for.)
I've spent most of my career writing trading systems that have executed 100's of billions of dollars worth of trades, and have never had any floating point related bugs.
Using some kind of fixed point math would be entirely inappropriate for most HFT or scientific computing applications.
It's the associativity law that it fails to uphold.
If you need to be extremely fast (like FPGA fast), you don't waste compute transforming a fixed-point representation into floating point.
With fixed point and at least 2 decimal places, 10.01 + 0.01 is always exactly equal to 10.02. But with FP you may end up with something like 10.0199999999, and then you have to be extra careful anywhere you convert that to a string that it doesn't get truncated to 10.01. That could be logging (not great but maybe not the end of the world if that goes wrong), or you could be generating an order message and then it is a real problem. And either way, you have to take care every time you do that, as opposed to solving the problem once at the source, in the way the value is represented.
> Using some kind of fixed point math would be entirely inappropriate for most HFT or scientific computing applications.
In the case of HFT, this would have to depend very greatly on the particulars. I know the systems I write are almost never limited by arithmetical operations, either FP or integer.
The other "metal model" issue is that associative operations in math. Adding a + (b + c) != (a + b) + c due to rounding. This is where fp-precise vs fp-fast comes in. Let's not talk about 80 bit registers (though that used to be another thing to think about).
if (ask - bid > 0.01) {
// etc
}
With floating point, I have to think about the following questions:
* What if the constant 0.01 is actually slightly greater than mathematical 0.01?
* What if the constant 0.01 is actually slightly less than mathematical 0.01?
* What if ask - bid is actually slightly greater than the mathematical result?
* What if ask - bid is actually slightly less than the mathematical result?

With floating point, that seemingly obvious code is anything but. With fixed point, you have none of those problems.
Granted, this only works for things that are priced in specific denominations (typically hundredths, thousandths, or ten thousandths), which is most securities.
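To make the comparison concrete (Python; the quotes are made up, and integer cents stand in for the fixed-point representation):

bid, ask = 0.2, 0.3          # hypothetical float prices
spread = ask - bid
print(spread)                # 0.09999999999999998
print(spread >= 0.01)        # True here, but...
print(spread >= 0.1)         # False -- a "spread is at least one 0.1 tick" check silently fails

bid_c, ask_c = 20, 30        # the same prices held as integer cents
print(ask_c - bid_c >= 10)   # True, exactly, every time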
In this example, I’m talking about securities that are priced in whole cents. If you represent prices as floats, then it’s possible that the spread appears to be less (or greater) than 0.01 when it’s actually not, due to the inability of floats to exactly represent most real numbers.
A different example: let's say that you're trying to buy some security, and you've determined that the maximum price you can pay and still be profitable is 10.01. If you mistakenly use an order price of 10.00, you'll probably get fewer shares than you wanted, possibly none. If you mistakenly use a price of 10.02, you may end up paying too much and then that trade ends up not being profitable. If you use a price of 10.0199999 (assuming it's even possible to represent such a price via whatever protocol you're using), either your broker or the exchange will likely reject the order for having an invalid price.
May I ask why? (generally curious)
I guess I understood GGGGP's comment about using fixed point for interacting with currency to be about accounting. I'd expect floating point to be used for trading algorithms, but that's mostly statistics and I presume you'd switch back to fixed point before making trades etc.
My best guess for the latter proposition is that people are reacting to the default float printing logic of languages like Java, which display a float as the shortest base-10 number that would correctly round to that value, which extremely exaggerates the effect of being off by a few ULPs. By contrast, C-style printf specifies the number of decimal digits to round to, so all the numbers that are off by a few ULPs are still correct.
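The two conventions side by side (Python):

x = 0.1 + 0.2
print(repr(x))       # 0.30000000000000004 -- shortest decimal that parses back to the same bits
print("%.6f" % x)    # 0.300000 -- printf-style fixed precision, so a few ULPs of error never show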
[1] I'm not entirely sure about the COBOL mainframe applications, given that COBOL itself predates binary floating-point. I know that modern COBOL does have some support for IEEE 754, but that tells me very little about what the applications running around in COBOL do with it.
It’s really more of a concern in accounting, when monetary amounts are concrete and represent real money movement between distinct parties. A ton of financial software systems (HFT, trading in general) deal with money in a more abstract way in most of their code, and the particular kinds of imprecision that FP introduces doesn’t result in bad business outcomes that outweigh its convenience and other benefits.
It's a trade-off between precision and predictability. Floating point provides the former. Scaled integers provide the latter.
If you're summing up the cost of items in a webshop, then you're in the domain of accounting. If the result appears to be off by a single cent because of a rounding subtlety, then you're in trouble, because even though no one should care about that single cent, it will give the appearance that you don't know what you're doing. Not to mention the trouble you could get in for computing taxes wrong.
If, on the other hand, you're doing financial forecasting or computing stock price targets, then you're not in the domain of accounting, and using floating point for money is just fine.
I'm guessing from your post that your finance people are more like the latter. I could be wrong though - accountants do tend to use Excel.
feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
will cause your code to get a SIGFPE whenever a NaN crawls out from under a rock. Of course it doesn't work with fast-math enabled, but if you're unknowingly getting NaNs without fast-math enabled, you obviously need to fix those before even trying fast-math, and they can be hard to find, and feenableexcept() makes finding them a lot easier.

Be very careful with it in production code though [1]. If you're in a dll then changing the FPU exception flags is a big no-no (unless you're really really careful to restore them when your code goes out of scope).
[1]: https://randomascii.wordpress.com/2016/09/16/everything-old-...
Could be wrong but that’s my gut feeling.
Make it work. Make it right. Make it fast.
If it’s not always correct, whoever chooses to use it chooses to allow error…
Sounds worse than worthless to me.
Stop trying. Let their story unfold. Let the pain commence.
Wait 30 years and see them being frustrated trying to tell the next generation.
I'm surprised by the take that FTZ is worse than reassociation. FTZ being environmental rather than per instruction is certainly unfortunate, but that's true of rounding modes generally in x86. And I would argue that most programs are unprepared to handle subnormals anyway.
By contrast, reassociation definitely allows more optimization, but it also prohibits you from specifying the order precisely:
> Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result.
I haven't followed standards work in forever, but I imagine that the introduction of std::fma, gets people most of the benefit. That combined with something akin to volatile (if it actually worked) would probably be good enough for most people. Known, numerically sensitive code paths would be carefully written, while the rest of the code base can effectively be "meh, don't care".
“The problem is how FTZ actually implemented on most hardware: it is not set per-instruction, but instead controlled by the floating point environment: more specifically, it is controlled by the floating point control register, which on most systems is set at the thread level: enabling FTZ will affect all other operations in the same thread.
“GCC with -funsafe-math-optimizations enables FTZ (and its close relation, denormals-are-zero, or DAZ), even when building shared libraries. That means simply loading a shared library can change the results in completely unrelated code, which is a fun debugging experience.”
https://www.forth.com/starting-forth/5-fixed-point-arithmeti...
With 32- and 64-bit numbers, you can just scale decimals up. So, Torvalds was right. In dangerous contexts (super-precise medical doses), FP may have good reasons to exist, but I am not completely sure.
Also, both Forth and Lisp internally suggest using rationals before floating point numbers. Even toy lisps from https://t3x.org have rationals too. In Scheme, you have both exact->inexact and inexact->exact, which convert rationals to FP and vice versa.
If you have a Linux/BSD distro, you may already have Guile installed as a dependency.
Hence, run it and then:
Thus, in Forth, I have a good set of q{+,-,*,/} operations for rationals (custom coded, literally four lines) and they work great for a good 99% of the cases.

As for irrational numbers, NASA used 16 decimals, and the old 355/113 can be precise enough for 99.99% of the pieces built on Earth. Maybe not for astronomical distances, but hey...
In Scheme:
In Forth, you would just use a scale with great enough precision for most of the objects being measured against.
Or write your own operations that compute to the precision you want.
s9 Scheme fails on this as it's an irrational number, but the rest of the Schemes, such as STklos, Guile, and MIT Scheme, will do it right.
With Forth (and even EForth if the images it's compiled with FP support), you are on your own to check (or rewrite) an fsqrt function with an arbitrary precision.
Also, on trig, your parent commenter should check what CORDIC was.
https://en.wikipedia.org/wiki/CORDIC
Also, on sqrt functions, even a FP-enabled toy EForth under the Subleq VM (just as a toy, again, but it works) provides some sort of fsqrt functions:
Under PFE Forth, something 'bigger':

EForth's FP precision is tiny but good enough for very small microcontrollers. But it wasn't so far from the exponents the 80's engineers worked with to create properly usable machinery/hardware and even software.