(-0.0) + (-0.0)
Does someone know any other case in IEEE 754? Bonus question: What happens in subtractions? I only know
(-0.0) - (+0.0)
Is there any other case?
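For reference, a minimal C sketch of the zero-result cases under the default rounding mode (round to nearest). The helper names `is_neg_zero` and `check_zero_sums` are made up for illustration; the asserts state what IEEE 754 specifies for the sign of a zero sum or difference.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 if x is negative zero, 0 otherwise. */
static int is_neg_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}

/* Checks the zero-result cases under round-to-nearest (the C default). */
static void check_zero_sums(void)
{
    /* volatile keeps the compiler from folding the arithmetic away */
    volatile double pz = 0.0, nz = -0.0, one = 1.0;

    /* Addition: only (-0.0) + (-0.0) gives -0.0. */
    assert( is_neg_zero(nz + nz));
    assert(!is_neg_zero(nz + pz));
    assert(!is_neg_zero(pz + nz));
    assert(!is_neg_zero(pz + pz));
    assert(!is_neg_zero(one + (-one)));  /* exact cancellation gives +0.0 */

    /* Subtraction: only (-0.0) - (+0.0) gives -0.0. */
    assert( is_neg_zero(nz - pz));
    assert(!is_neg_zero(nz - nz));
    assert(!is_neg_zero(pz - pz));
    assert(!is_neg_zero(pz - nz));
    assert(!is_neg_zero(one - one));
}
```

Calling `check_zero_sums()` from `main` runs the checks; a sum of two nonzero values with a nonzero exact result cannot round to zero, so the zero operands and exact cancellation are the only cases to consider.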
sparkie•6mo ago
Here's an example of -1.0f + 1.0f resulting in -0.0: https://godbolt.org/z/5qvqsdh9P
gus_massa•6mo ago
---
FYI: For more context, I'm trying to send a PR to Chez Scheme (and indirectly to Racket) https://github.com/cisco/ChezScheme/pull/959 to reduce expressions like
where the "fixnums" are small integers and "flonums" are doubles.
It's fine, unless you have the case
because if the length is 0, it gets transformed into 0.0 instead of -0.0.
There are a few corner cases, in particular because it's possible to have
and I really want to avoid the runtime check of (length L) == 0 if possible.
So I took a look, asked there, and now your opinion confirms what I got so far. My C is not very good, so it's nice to have an example of how the rounding directions are used. Luckily Chez Scheme only uses the default rounding and it's probably correct to cut a few corners. I'll keep looking for a few days in case there is some surprise.
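To make the corner case concrete, here is a rough C model of the problem. It rests on two assumptions that are not confirmed above: that generic addition of an exact 0 returns the flonum unchanged, and that the rewrite converts the fixnum to a double before a plain floating-point add. Both function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Models generic addition of an exact integer and a double, assuming
   that adding exact 0 returns the double unchanged, sign of zero included. */
static double add_exact_int(long n, double x)
{
    return n == 0 ? x : (double)n + x;
}

/* Models the rewritten form: convert first, then do a floating-point add. */
static double add_converted(long n, double x)
{
    return (double)n + x;
}

/* The two forms agree everywhere except n == 0 with x == -0.0. */
static void check_mixed_add(void)
{
    assert(signbit(add_exact_int(0, -0.0)));    /* stays -0.0 */
    assert(!signbit(add_converted(0, -0.0)));   /* 0.0 + -0.0 gives +0.0 */
    assert(add_exact_int(0, 2.5) == add_converted(0, 2.5));
    assert(add_exact_int(3, -0.0) == add_converted(3, -0.0));
}
```

Under these assumptions the only input that needs special handling is the pair (0, -0.0), which is why the runtime check on the integer is tempting to avoid.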
sparkie•6mo ago
An AVX-512 extension has a `vfixupimm` instruction[1] which can adjust special floating point values. You could use this to adjust all zeroes to -0 but leave any non-zeroes untouched. It isn't very obvious how to use though.
You want to set the nybble for categorization ZERO (bits 11..8) to 0x7 (-0) in `fixup`. This would mean you want `fixup` to be equal to `0x00000700`. So usage would be:
Which compiles to just 4 instructions, with no branches:
It can be extended to operate on 8 int64->double at a time (__m512d) with little extra cost.
You could maybe use this optimization where the instruction is available and just stick with a branch version otherwise, or figure out some other way to make it branchless - though I can't think of any other way which would be any faster than a branch.
[1]:https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
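As a plain-C illustration of how the `0x00000700` table is read, the sketch below models only the lookup: one 4-bit response per input class, indexed by class number. It does not execute the instruction. The class order follows Intel's description of the instruction; the enum and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Input classes, in the order the instruction indexes the table. */
enum fixup_class {
    CLASS_QNAN = 0, CLASS_SNAN = 1, CLASS_ZERO = 2, CLASS_ONE = 3,
    CLASS_NEG_INF = 4, CLASS_POS_INF = 5,
    CLASS_NEG_VALUE = 6, CLASS_POS_VALUE = 7
};

/* The two response codes used here: 0 keeps the destination value,
   7 writes -0.0. */
enum { RESPONSE_KEEP_DEST = 0, RESPONSE_NEG_ZERO = 7 };

/* Extracts the 4-bit response for class c from the 32-bit table. */
static unsigned fixup_response(uint32_t table, enum fixup_class c)
{
    return (table >> (4u * (unsigned)c)) & 0xFu;
}

/* With 0x00000700 only the ZERO class (bits 11..8) is rewritten. */
static void check_fixup_table(void)
{
    const uint32_t table = 0x00000700u;
    assert(fixup_response(table, CLASS_ZERO) == RESPONSE_NEG_ZERO);
    assert(fixup_response(table, CLASS_ONE) == RESPONSE_KEEP_DEST);
    assert(fixup_response(table, CLASS_NEG_VALUE) == RESPONSE_KEEP_DEST);
    assert(fixup_response(table, CLASS_POS_VALUE) == RESPONSE_KEEP_DEST);
}
```

Every class other than ZERO maps to response 0, so non-zero inputs pass through untouched.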
gus_massa•6mo ago
---
Thanks again, it's very interesting. I used assembler a long time ago, for the Z80 and 80?86, when the coprocessor was like 2 inches away :) . The problem is that Chez Scheme emits its own assembler, and supports many platforms. So after going into the rabbit hole, you get to asm-fpt https://github.com/search?q=repo%3Acisco%2FChezScheme+asm-fp... (expand and look for "define asm-fpt" near line 1300-2000)
This is like 2 or 3 layers below the level I usually modify, so I'm not sure about the details and quirks in that layer. I'll link to this discussion in github in case some of the maintainers want to add something like this. My particular case is a very small corner case and I'm not sure they'd like to add more complexity, but it's nice to have this info in case there are similar cases because once you notice them, they start to appear everywhere.
You can tag yourself in case someone wants to ask more questions or just get updates, but I expect that I'll go in the opposite direction.
sparkie•6mo ago
Yeah, you got it. See test: https://ce.billsun.dev/#g:!((g:!((g:!((h:codeEditor,i:(filen...
If you are going to implement something like this you basically need a fallback for where it is not supported. In C you write:
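A minimal sketch of what such a fallback could look like (not the original snippet). The helper name `int64_to_double_negzero` is hypothetical, and the table value is the `0x00000700` discussed earlier in the thread.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#if defined(__AVX512F__) && defined(__x86_64__)
#include <immintrin.h>
#endif

/* Converts n to double, mapping 0 to -0.0 and leaving other values alone. */
static double int64_to_double_negzero(int64_t n)
{
#if defined(__AVX512F__) && defined(__x86_64__)
    /* Table 0x00000700: the ZERO class gets response 7 (-0.0), every
       other class gets response 0 (keep the destination value). */
    const __m128i table = _mm_cvtsi32_si128(0x00000700);
    __m128d v = _mm_cvtsi64_sd(_mm_setzero_pd(), n);
    v = _mm_fixupimm_sd(v, v, table, 0);
    return _mm_cvtsd_f64(v);
#else
    /* Portable fallback with a branch (or a conditional move). */
    return n == 0 ? -0.0 : (double)n;
#endif
}

/* Both paths must produce the same results. */
static void check_negzero_conversion(void)
{
    assert(int64_to_double_negzero(0) == 0.0);
    assert(signbit(int64_to_double_negzero(0)));
    assert(int64_to_double_negzero(5) == 5.0);
    assert(int64_to_double_negzero(-3) == -3.0);
}
```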
The optimized code will only be emitted if -mavx512f is passed to the compiler. This flag is implied if `-march=native` and the host compiling the code supports it, or if `-march=specificarch` and specificarch supports it. Otherwise the fallback code will be used.
If using the custom assembler you would need to test whether AVX512F is available by using the CPUID instruction.
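For the runtime check, a rough sketch using the GCC/Clang `<cpuid.h>` helpers is below. It assumes an x86 target and a GNU-compatible compiler (other targets just report 0), and `cpu_has_avx512f` and `read_xcr0` are names chosen for this example. Besides the CPUID feature bit, it also checks that the OS saves the AVX-512 register state, since the feature bit alone does not guarantee that.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

/* Reads XCR0, which lists the register states the OS saves and restores. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Returns 1 if AVX512F is usable, 0 otherwise. */
static int cpu_has_avx512f(void)
{
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, 0) < 7)
        return 0;

    /* Leaf 1: ECX bit 27 (OSXSAVE) means XGETBV can be executed. */
    __cpuid(1, eax, ebx, ecx, edx);
    if (!(ecx & (1u << 27)))
        return 0;

    /* XCR0 bits 1-2 (SSE, AVX) and 5-7 (opmask, ZMM state) must be set. */
    if ((read_xcr0() & 0xE6u) != 0xE6u)
        return 0;

    /* Leaf 7, subleaf 0: EBX bit 16 is the AVX512F feature flag. */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 16) & 1u);
}
#else
/* Non-x86 targets or other compilers: report the feature as unavailable. */
static int cpu_has_avx512f(void) { return 0; }
#endif

/* The result is a plain boolean that can select the code path once at startup. */
static void check_detection(void)
{
    int r = cpu_has_avx512f();
    assert(r == 0 || r == 1);
}
```

A custom assembler would do the same thing directly: execute CPUID with EAX=7, ECX=0 and test bit 16 of EBX, after confirming OS support through XGETBV.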