union {
    int i;
    float f;
} *u;
float f = 3.14;
u = &f;
x = u->i;
> In this case the memory pointed to by “u” has the declared effective type of int, and given that “u” is a union that contains int, the access using the “i” member is legal. It’s noteworthy in this that the “f” member of the union is never used, but only there to satisfy the requirement of having a member with a type compatible with the effective type.
Is this a typo? Should it say "declared effective type of float" and "“u” is a union that contains float"?
It's interesting to see type-punning using a union - I've read that it should be avoided and to use `memcpy` instead. Are there any issues with the union approach in C? Or is the advice to prefer `memcpy` specific to C++, where AFAICT the union approach is undefined behaviour?
The other day we had standard committee members confirming union punning is good in C: https://news.ycombinator.com/item?id=43793225
https://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p5
> Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...] Such a representation is called a trap representation.
https://port70.net/~nsz/c/c11/n1570.html#6.5.2.3p3
> A postfix expression followed by the `.` operator and an identifier designates a member of a structure or union object. The value is that of the named member. [Footnote: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ''type punning''). This might be a trap representation.]
I'm fuzzy on exactly what a "trap representation" might be in real life. I have the impression that a signaling NaN isn't. I suspect that a visibly invalid pointer value on a CHERI-like or ARM64e-like platform might be. Anyway, my impression is that sane platforms don't have trap representations, so indeed, you have to go out of your way to contrive a situation where C's paper standard would not define type-punning (whether union-based or pointer-cast-based) to have the "common-sense" physical behavior.
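For comparison, here is a minimal C sketch of the two punning styles mentioned in this thread. It assumes `float` is a 32-bit IEEE 754 type (only the size is checked at compile time), and the names `bits_via_union`, `bits_via_memcpy` and `check_same_bits` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Union-based punning: store one member, read another
   (see the footnote to C11 6.5.2.3p3 quoted above). */
uint32_t bits_via_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* memcpy-based punning: copies the object representation byte by byte. */
uint32_t bits_via_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical self-check: both routes should yield the same bit pattern. */
void check_same_bits(float f)
{
    assert(bits_via_union(f) == bits_via_memcpy(f));
}
```

Since `uint32_t` has no padding bits, reading it here cannot hit a trap representation; punning in the other direction (integer bits to `float`) is where that caveat could matter in theory.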
Again this is different from C++, where both union-based type-punning and pointer-cast-based type-punning have UB, full stop:
https://eel.is/c++draft/expr.prop#basic.lval-11
> An object of dynamic type Tobj is _type-accessible_ through a glvalue of type Tref if Tref is similar to Tobj, a type that is the signed or unsigned type corresponding to Tobj, or a char, unsigned char, or `std::byte` type.
> If a program attempts to access the stored value of an object through a glvalue through which it is not type-accessible, the behavior is undefined.
The point of the type system is to define types. It’s not to make the compiler’s job easier, or to give standards committees clouds to build their castles on. No amount of words will justify this misbegotten misinvention.
As I read it, this means that
struct foo *x = malloc(sizeof(*x))
will have an effective type of "struct foo*", which seems like what you would expect.
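A small sketch of how C11 6.5p6 seems to apply here, as far as I can tell: memory from `malloc` has no declared type, and it acquires an effective type only once a value is stored through a non-character lvalue. On that reading, the object that `x` points to ends up with effective type `struct foo`, while the pointer `x` itself has the declared type `struct foo *`. The members of `struct foo` and the helper name `make_foo` are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical struct; the comment above does not show its members. */
struct foo {
    int id;
    float weight;
};

/* Allocates a struct foo and stores a value into it.
   Returns NULL if the allocation fails. */
struct foo *make_foo(int id, float weight)
{
    struct foo *x = malloc(sizeof(*x));  /* no effective type yet */
    if (x == NULL)
        return NULL;

    /* Storing through an lvalue of type struct foo gives the allocated
       object the effective type struct foo for later reads. */
    *x = (struct foo){ .id = id, .weight = weight };
    return x;
}
```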
cancerhacker•9mo ago
It seems like a lost art to think that way. It’s disturbing to me how many candidates couldn’t write Hello World and compile it from the command line.
Everyone should spend some time with godbolt.org or better, the -save-temps compiler flag, to see how changes affect your generated code. Right now. I’ll wait. (Shakes cane at kids)
anyfoo•9mo ago
But it's rough, and dangerous. Optimizers do a lot these days, and I really mean a lot. Besides completely mangling your program order, which includes shoving entire blocks of code into places that you might not have guessed, they also do such things as leveraging undefined behavior for optimizations (what the article is partly about), or replacing entire bits of code by function calls. (A compiler might make code out of your memcpy(), and vice versa; the latter can be especially surprising.)
If you care about the assembly representation of your C code (which kernel developers often do), you will spend a lot of time with the "volatile" keyword, compiler barriers, and some obscure "__attribute__"s.
But I agree, even with those caveats in mind, it's a very useful skill to imagine your C code as what it translates to (even if that representation is just a simplified model of what the compiler will actually do).
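To make the memcpy() point concrete, here is a sketch of a hand-written copy loop. Optimizing compilers such as GCC and Clang may recognize this pattern and emit a call to memcpy() instead of a loop; whether that happens depends on the compiler, version and flags, so it is worth checking on godbolt.org or with -save-temps. The name `copy_bytes` is made up for this example.

```c
#include <stddef.h>

/* Plain byte-by-byte copy. "restrict" promises that the two buffers do
   not overlap, which is the precondition memcpy() has as well. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```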
fsckboy•9mo ago
that is a poor way to handle UB as it introduces bugs (which are UB themselves). If a compiler detects UB, it should flag an error so the source code gets changed. compilers (or any software really) should never be maliciously compliant.
anyfoo•9mo ago
If compilers did not take advantage of this, then a lot of behavior would not have to be undefined in the first place. Undefined behavior isn't conjured up from a magical place, it was deliberately specified for a reason.
The subject of the linked article, strict aliasing, is a prime example of exactly that: Surprisingly strict rules for aliasing, giving compilers the opportunity to better optimize code that follows these rules, at the risk of breaking code that does not follow the rules in arbitrary and perhaps unintuitive ways.
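A commonly cited sketch of the kind of optimization these rules allow (the function name `store_both` is made up here). Because an `int` object and a `float` object may not alias under the rule, a compiler is allowed to assume that the store through `fp` cannot change `*ip`, and it may return the constant 1 without reloading. Calling the function with both pointers referring to the same memory would be undefined behavior.

```c
/* ip and fp point to incompatible types, so the compiler may assume
   they refer to different objects. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;  /* may be compiled as "return 1" */
}
```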
Now, these particular rules are controversial, and the article acknowledges this:
Nevertheless, there are many other rules that are much more readily accepted where similar things are taking place.
fsckboy•9mo ago
that's pure maliciousness. if the programmer has written code that exhibits undefined behavior, it should be flagged as an error so it can be changed to code that does not exhibit undefined behavior.
programs need to have one unambiguous meaning, and it should be the meaning intended by the programmer. if meanings can be detected as ambiguous or as not what the programmer intended, that should be flagged, not magically swept under the carpet because it's "faster".
unwind•9mo ago
C is specified against an abstract (not virtual) machine, and it matters.
All the talk about how undefined behaviors give the compiler the right to shuffle and/or remove code really breaks the analogy with assembler, where most things become Exactly What You Say.
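One small sketch of where the assembler analogy breaks (the function name is illustrative). In assembly, an add on a two's-complement machine simply wraps; in C, signed overflow is undefined, so a compiler may fold the comparison below to a constant 1 and never emit the addition at all. What a given compiler actually does depends on its version and flags.

```c
/* For every x below INT_MAX this is simply true. For x == INT_MAX the
   addition overflows, which is undefined behavior in C, so an optimizer
   is free to assume it never happens and return 1 unconditionally. */
int next_is_greater(int x)
{
    return x + 1 > x;
}
```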