It would be nice to know what these great improvements actually are.
Improved experimental support for C++23, including:
std and std.compat modules (also supported for C++20).
From https://developers.redhat.com/articles/2025/04/24/new-c-feat...: The next major version of the GNU Compiler Collection (GCC), 15.1, is expected to be released in April or May 2025.
GCC 15 greatly improved the modules code. For instance, module std is now supported (even in C++20 mode).
> C: #embed preprocessing directive support.
> C++: P1967R14, #embed (PR119065)
See also:
https://news.ycombinator.com/item?id=32201951 - Embed is in C23 (2022-07-23)
Calavar•8h ago
This is going to silently break so much existing code, especially union based type punning in C code. {0} used to guarantee full zeroing and {} did not, and step by step we've flipped the situation to the reverse. The only sensible thing, in terms of not breaking old code, would be to have both {0} and {} zero initialize the whole union.
I'm sure this change was discussed in depth on the mailing list, but it's absolutely mind boggling to me
VyseofArcadia•8h ago
I can deal with the footguns if they aren't cheekily mutating over the years. I feel like in C++ especially we barely have the time to come to terms with the unintended consequences of the previous language revision before the next one drops a whole new load of them on us.
ryao•8h ago
fuhsnn•8h ago
https://lore.kernel.org/linux-toolchains/Z0hRrrNU3Q+ro2T7@tu...
matheusmoreira•7h ago
https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
seritools•8h ago
https://en.cppreference.com/w/c/language/union
> When initializing a union, the initializer list must have only one member, which initializes the first member of the union unless a designated initializer is used(since C99).
https://en.cppreference.com/w/c/language/struct_initializati...
→ = {0} initializes the first union variant, and bytes outside of that first variant are unspecified. Seems like GCC 15.1 follows the 26 year old standard correctly. (not sure how much has changed from C89 here)
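To make the distinction concrete, here is a minimal C sketch. The `union packet` type and both function names are hypothetical, made up for illustration. It contrasts `= {0}`, which is only guaranteed to initialize the first member, with `memset`, which clears every byte of the object regardless of which member comes first.

```c
#include <string.h>

/* Hypothetical union whose first member is smaller than the union itself. */
union packet {
    unsigned char tag;      /* first member: 1 byte */
    unsigned char raw[16];  /* larger member */
};

/* `= {0}` explicitly initializes only `tag`. Whether raw[1..15] are also
   zero depends on the compiler and the language version in use. */
unsigned char first_member_after_brace_zero(void)
{
    union packet p = {0};
    return p.tag;           /* always 0 */
}

/* memset clears all bytes of the object, independent of member order. */
void zero_packet(union packet *p)
{
    memset(p, 0, sizeof *p);
}
```

Code that needs every byte zeroed (for hashing, comparing with `memcmp`, or copying out) can call something like `zero_packet` instead of relying on what `{0}` does beyond the first member.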
hulitu•8h ago
The release cycle of a piece of software says a lot about its quality. "Move fast, break things" has become the new development process.
pjmlp•7h ago
Maybe C should have stopped at K&R C from UNIX V6; at least that would have spared the world from having it adopted outside UNIX.
rgoulter•7h ago
ryao•6h ago
pjmlp•5h ago
When faced with writing a distributed systems application at Bell Labs, and having to deal with C, the very first step was to create C with Classes.
Also, had C++ not been invented, or had C become a historical footnote, so what? There would be other programming languages to choose from.
Let's not put programming languages into some kind of worshiping sanctuary.
_joel•7h ago
Ragnarork•7h ago
Thank goodness this is not how the software world works overall. I'm not sure you understand the implications of what you ask for.
> if they aren't cheekily mutating over the years
You're complaining about languages mutating, then mention C++ which has added stuff but maintained backwards compatibility over the course of many standards (aside from a few hiccups like auto_ptr, which was also short lived), with a high aversion to modifying existing stuff.
ryao•8h ago
How much code actually uses unions this way?
> especially union based type punning in C code
I have never done type punning via the GNU C compiler extension in a way that would break because of this. I always assign a value to it and then get out the value from a new type. Do you know of any code that does things differently to be affected by this?
Calavar•8h ago
EDIT: I initially mentioned type punning for arithmetic, but this compiler change wouldn't affect that
ryao•8h ago
Calavar•8h ago
ryao•8h ago
In order for this change to leave something uninitialized, you would need to have a member of the union after the first member that is longer than the first member. Code that does that and relies on {0} to zero the union seems incredibly rare to me.
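One way to check that condition at compile time is sketched below, assuming C11 or later. The macro name and both union types are hypothetical. The check is conservative: it only passes when the first member is as large as the whole union, so a union with trailing padding would be flagged even if its first member is the largest one.

```c
#include <assert.h>  /* static_assert (C11) */

/* True when the first member spans the entire union, so initializing the
   first member touches every byte of the object. */
#define FIRST_MEMBER_COVERS_UNION(type, first) \
    (sizeof(((type *)0)->first) == sizeof(type))

/* Hypothetical unions for illustration. */
union largest_first  { unsigned char raw[8]; unsigned char tag; };
union smallest_first { unsigned char tag;    unsigned char raw[8]; };

/* Compiles: `raw` is the first member and covers all 8 bytes. */
static_assert(FIRST_MEMBER_COVERS_UNION(union largest_first, raw),
              "first member must cover the whole union");
```

Applying the same `static_assert` to `union smallest_first` with `tag` would fail to compile, which is the case described above where `{0}` may leave bytes unzeroed.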
ndiddy•8h ago
I see this change caused Mbed-TLS to start failing its test suite when compiled with GCC 15: https://github.com/Mbed-TLS/mbedtls/issues/9814 (kinda scary since it's a security library). Hopefully other projects with less rigorous test suites aren't using {0} in that way. The Github issue mentions that Clang tried a similar optimization a while ago and backed it out after user complaints, so maybe the same thing will happen with GCC.
ryao•8h ago
ogoffart•8h ago
The code was already broken. It was an undefined behavior.
That's a problem with C and its undefined behavior minefields.
ryao•8h ago
mtklein•8h ago
I am basing this entirely on memory and the wikipedia article on type punning. I welcome extremely pedantic feedback.
ryao•8h ago
trealira•7h ago
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
In past standards, it said "trap representation" rather than "non-value representation," but in none of them did it say that union type punning was undefined behavior. If you have a PDF of any standard or draft standard, just doing a search for "type punning" should direct you to this footnote quickly.
So I'm going to say that if the GCC developer explicitly said that union type punning was undefined behavior in C, then they were wrong, because that's not what the C standard says.
amboar•6h ago
> (11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
So it's a little more constrained in the ramifications, but the outcomes may still be surprising. It's a bit unfortunate that "UB" aliases to both "Undefined behavior" and "Unspecified behavior" given they have subtly different definitions.
From section 4 we have:
> A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.4.
ryao•6h ago
> Type punning via unions is undefined behavior in both c and c++.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
Feel free to start a discussion on the GCC mailing list.
trealira•6h ago
ryao•6h ago
https://news.ycombinator.com/item?id=43794268
Taking snippets of the C standard out of context of the whole seems to result in misunderstandings on this.
trealira•6h ago
Edit: no, it's still in the unspecified behavior annex, that's my mistake. It's still not undefined, though.
ryao•6h ago
That said, I am going to defer to the GCC developers on this since I do not have time to make sense of all versions of the C standard.
trealira•5h ago
jotux•7h ago
ryao•6h ago
> Type punning via unions is undefined behavior in both c and c++.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
jotux•6h ago
I just was citing the source of this for reference.
ryao•6h ago
uecker•6h ago
ryao•6h ago
That said, using “the code compiles in godbolt” as proof that it is not relying on what the standard specifies to be UB is fallacious.
uecker•3h ago
jotux•7h ago
jcranmer•6h ago
In C89, it was implementation-defined. In C99, it was made expressly legal, but it was erroneously included in the list of undefined behavior annex. From C11 on, the annex was fixed.
> but UB in C++
C++11 adopted "unrestricted unions", which added the concept of an active member: accessing any other member is UB unless you first make it active. Except active members rely on constructors and destructors, which primitive types don't have, so the standard isn't particularly clear on what happens here. The current consensus is that it's UB.
C++20 added std::bit_cast which is a much safer interface to type punning than unions.
> punning through incompatible pointer casting was UB in both
There is a general rule that accessing an object through an 'incompatible' lvalue is illegal in both languages. In general, changing the const or volatile qualifier on the object is legal, as is reading via a different signed or unsigned variant, and char pointers can read anything.
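For comparison, a sketch of the approach that sidesteps the aliasing rules altogether: copying the bytes with `memcpy`, which is well-defined in both C and C++. The function name is hypothetical, and the example assumes `float` is 32 bits wide (checked at compile time).

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Copy the object representation of a float into an integer.
   No object is accessed through an incompatible lvalue. */
uint32_t float_bits_via_memcpy(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, `float_bits_via_memcpy(1.0f)` yields `0x3f800000`. Optimizing compilers generally turn a fixed-size `memcpy` like this into a plain register move.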
trealira•6h ago
In C99, union type punning was put under Annex J.1, which is unspecified behavior, not undefined behavior. Unspecified behavior is basically implementation-defined behavior, except that the implementor is not required to document the behavior.
ryao•6h ago
trealira•5h ago
hermitdev•3h ago
You can, but in the context of the standard, you'd be wrong to do so. Undefined behavior and unspecified behavior have specific, different, meanings in context of the C and C++ standards.
Conflate them at your own peril.
ryao•6h ago
> Type punning via unions is undefined behavior in both c and c++.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
mat_epice•7h ago
- - -
Undefined behavior only means that the spec leaves a particular situation undefined and that the compiler implementor can do whatever they want. Every compiler defines undefined behavior, whether it’s documented (or easy to qualify, or deterministic) or not.
It is in poor taste that gcc has had widely used, documented behaviors that are changing, especially in a point release.
fsmv•6h ago
In a lot of cases in optimizing compilers they just assume UB doesn't exist. Yes technically the compiler does do something but there's still a big difference between the two.
mat_epice•6h ago
flohofwoe•6h ago
Union type punning is entirely valid in C, but UB in C++ (one of the surprisingly many subtle but still fundamental differences between C and C++). There's specifically a (somewhat obscure) footnote about this in the C standard, which has also been further clarified in one of the recent C standards.
ryao•6h ago
jcranmer•6h ago
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
(though this footnote has been present as far back as C99, albeit with different numbers as the standard has added more text in the intervening 24 years).
ryao•6h ago
> Type punning via unions is undefined behavior in both c and c++.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#c13
flohofwoe•6h ago
ryao•6h ago
trealira•5h ago
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (106), and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.
106) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This might be a non-value representation.
In that same document, union type punning is explicitly listed under Annex J.1, Unspecified Behavior:
(11) The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).
The standard is extremely clear and explicit that it's not undefined behavior.
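A minimal sketch of what that footnote describes, in C only (this is not valid in C++). The union and function names are hypothetical, and the expected bit pattern assumes `float` is IEEE 754 binary32 with the same size as `uint32_t`.

```c
#include <stdint.h>

/* Assumption: float is IEEE 754 binary32 and the same size as uint32_t. */
union float_bits {
    float    f;
    uint32_t u;
};

uint32_t float_to_bits(float value)
{
    union float_bits pun;
    pun.f = value;  /* store through one member */
    return pun.u;   /* read through another: the bytes are reinterpreted */
}
```

Under those assumptions `float_to_bits(1.0f)` is `0x3f800000`. Both members have the same size here, so no bytes fall outside the member last stored into.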
ryao•5h ago
trealira•5h ago
jcranmer•5h ago
I don't know who Andrew Pinski is, but they're factually incorrect regarding the legality of type punning via unions in C.
uecker•2h ago
grandempire•7h ago
GCC probably has a better justification than “we are allowed to”.
arp242•6h ago
Maybe, but I've seen GCC people justify such changes with little more than "it's UB, we can change it, end of story", so I wouldn't assume it.
mwkaufma•4h ago
mistrial9•8h ago
grandempire•7h ago
And from a runtime perspective it’s going to be a struct with perhaps more padding. You’ll need more details about your specific threat model to explain why that’s bad.
mistrial9•7h ago
grandempire•7h ago
LowLevelMahn•7h ago
grandempire•7h ago
Still waiting to hear the security concerns.
LegionMammal978•6h ago
[0] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3...
[1] https://github.com/llvm/llvm-project/blob/llvmorg-20.1.3/lib...
jlouis•6h ago
soraminazuki•4h ago
mtklein•8h ago
mtklein•8h ago
dzaima•6h ago
myrmidon•7h ago
Zero-initialized-by-default for everything would be an extremely beneficial tradeoff IMO.
Maybe with a __noinit attribute or somesuch for the few cases where you don't need a variable to be initialized AND the compiler is too stupid to optimize the zero-initialization away on its own.
This would not even break existing code, just lead to a few easily fixed performance regressions, but it would make it significantly harder to introduce undefined and difficult to spot behavior by accident (because very often code assumes zero-initialization and gets it purely by chance, and this is also most likely to happen in the edge cases that might not be covered by tests under memory sanitizer if you even have those).
elromulous•6h ago
sidkshatriya•6h ago
You might claim that you can have both, but bugs are more inevitable in the uninitialised-by-default scenario. I doubt that variable initialisation is the thing that would slow down HFT. I would posit it is things like network latency that would dominate.
hermitdev•4h ago
As someone who works in the HFT space: it depends. How frequently and how bad are the bad-trade cases? Some slop happens. We make trade decisions with hardware _without even seeing an entire packet coming in on the network_. Mistakes/bad trades happen. Sometimes it results in trades that don't go our way or missed opportunities.
Just as important as "can we do better?" is "should we do better?". Queue priority at the exchange matters. Shaving nanoseconds is how you get a competitive edge.
> I would posit it is things like network latency that would dominate.
Everything matters. Everything is measured.
edit to add: I'm not saying we write software that either has or relies upon uninitialized values. I'm just saying in such a hypothetical, it's not a cut and dried "do the right thing (correct according to the language spec)" decision.
Imustaskforhelp•1h ago
Wait what????
Can you please educate me on high frequency trading? I don't understand what the point of it is. Let's say one person has created an HFT bot: why would there be a need for another bot, other than for different trading strategies? And I don't think these are profitable; how do they compare in the long run with the Boglehead strategy?
hermitdev•27m ago
HFT firms are (almost) always willing to buy or sell at or near the current market price. HFT firms basically race each other for trade volume from "retail" traders (and sometimes each other). HFTs make money off the spread - the difference between the bid & offer - typically only a cent. You don't make a lot of money on any individual trade (and some trades are losers), but you make money on doing a lot of volume. If done properly, it doesn't matter which direction the market moves for an HFT, they'll make money either way as long as there's sufficient trading volume to be had.
But honestly, if you want to learn about HFT, best do some actual research on it - I'm not a great source as I'm just the guy that keeps the stuff up and running; I'm not too involved in the business side of things. There's a lot of negative press about HFTs, some positive.
myrmidon•6h ago
You probably would not even need it in a lot of instances because the compiler would elide lots of dead stores (zeroing) even without hinting.
pjmlp•6h ago
That is the usual fearmongering when security improvements are done to C and C++.
TuxSH•5h ago
Depends on the boundary. I can give a non-Linux, microkernel example (but one that was/is shipped on tens of millions of devices):
- prior to 11.0, Nintendo 3DS kernel SVC (syscall) implementations did not clear output parameters, leading to extremely trivial leaks. Unprivileged processes could retrieve kernel-mode stack addresses easily, making exploit code much easier to write, example here: https://github.com/TuxSH/universal-otherapp/blob/master/sour...
- Nintendo started clearing all temporary registers on the Switch kernel at some point (iirc x0-x7 and some more); on the 3DS they never did that, and you can leak kernel object addresses quite easily (iirc by reading r2). This made an entire class of use-after-free and arbwrite bugs easier to exploit (call SvcCreateSemaphore 3 times, get sema kernel object address, use one of the now-patched exploits that can cause a double-decref on the KSemaphore, call SvcWaitSynchronization, profit)
more generally:
- uncleared padding in structures + copy to user = infoleak
so one at least ought to be careful when crossing privilege boundaries
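A sketch of one way to avoid the padding leak, with a hypothetical `struct reply` and `pack_reply` helper: build the outgoing bytes in a buffer that is zeroed first, then copy each field to its offset. A plain `memset` of the struct followed by member assignments usually works in practice too, but the C standard lets padding bytes take unspecified values when a member is stored, so the byte-buffer form is the more conservative one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reply structure; on most ABIs there are padding bytes
   between `tag` and `value`. */
struct reply {
    char tag;
    int  value;
};

/* Fill `out` with the reply's bytes, with every padding byte set to zero,
   so nothing stale from the stack crosses the boundary. */
void pack_reply(unsigned char out[sizeof(struct reply)], char tag, int value)
{
    memset(out, 0, sizeof(struct reply));
    memcpy(out + offsetof(struct reply, tag), &tag, sizeof tag);
    memcpy(out + offsetof(struct reply, value), &value, sizeof value);
}
```

The buffer produced this way can be handed to whatever copy-to-user primitive the system provides without exposing leftover data.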
bjourne•6h ago
myrmidon•6h ago
If you have instances of zero-initialized structs where you set individual fields after the initialization, all modern compilers will elide the dead stores in the typical cases anyway, and data of relevant size that is supposed to stay uninitialized for long is rare and a bit of an anti-pattern in my opinion.
modeless•6h ago
bjourne•5h ago
rwmj•6h ago
For malloc, you could use a custom allocator, or replace all the calls with calloc.
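As a small illustration of the `calloc` route (the `struct node` type and `alloc_nodes` helper are hypothetical): `calloc` returns memory with all bytes set to zero, so no separate `memset` is needed.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration. */
struct node {
    int          key;
    struct node *next;
};

/* Allocate `count` nodes with every byte zeroed. Returns NULL on failure.
   Note: all-bits-zero is a null pointer on mainstream platforms, but the
   C standard does not guarantee that for pointer members. */
struct node *alloc_nodes(size_t count)
{
    return calloc(count, sizeof(struct node));
}
```

The caller still owns the memory and releases it with `free`, exactly as with `malloc`.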
myrmidon•5h ago
The only problem with vendor extensions like this is that you can't really rely on them, so you're still kinda forced to keep all the (redundant) zero initialization; solving it at the language level is much nicer. Maybe with C2030...
bluGill•4h ago
mastax•6h ago
I imagine it would be very useful to be able to search through all the C/C++ source files for all the packages in the distro in a semantic manner, so that it understands typedefs and preprocessor macros etc. The search query for this change would be something like "find all union types whose first member is not its largest member, then find all lines of code where that type is initialized with `{0}`".
ryao•6h ago
mastax•6h ago
ryao•5h ago
It is possible in theory to write a compiler plugin to generate an error when code that does this is found and it would make it easy to find all of the instances in all packages by building with `make -k`, provided that the code is not hidden behind an unused package flag.
anon-3988•3h ago
nikic•55m ago
So now you have this matrix of behaviors:
- Old GCC: Initializes whole union.
- New GCC: Initializes first member only.
- Old Clang: Initializes first member only.
- New Clang: Initializes whole union.