> On ARM, such atomic load incurs a memory barrier---a fairly expensive operation.
Not quite, it is just a load-acquire, which is almost as cheap as a normal load. And on x86 there's no difference.
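For anyone who wants to see the difference in source form, here is a minimal sketch of the pattern being discussed. The names (`Config`, `publish`, `try_get`) are made up for illustration and are not from the article. The reader side is a single acquire load, which on AArch64 typically becomes a load-acquire instruction rather than a plain load followed by a separate barrier, and on x86-64 is an ordinary load.

```cpp
#include <atomic>

// Hypothetical type and globals, for illustration only.
struct Config { int value; };

static Config g_storage;
static std::atomic<Config*> g_ptr{nullptr};

// Publisher: the release store makes the earlier write to g_storage
// visible to any thread that later observes the non-null pointer
// through an acquire load.
void publish(int v) {
    g_storage.value = v;
    g_ptr.store(&g_storage, std::memory_order_release);
}

// Reader: one acquire load, no explicit fence in the source.
// Returns nullptr until publish() has run.
const Config* try_get() {
    return g_ptr.load(std::memory_order_acquire);
}
```

Compiling `try_get` for your target and reading the assembly is the quickest way to check what your toolchain actually emits.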
One thing both GCC and Clang seem to be quite bad at is code layout: even in the example in the article, the slow path is largely inlined. It would be much better to have just a load, a compare, and a jump to a slow path placed in a cold section. In my experience, in some rare cases, reimplementing the lazy initialization explicitly (especially when it's possible to use a sentinel value, so that a single load serves as both value and guard) did produce a noticeable win.
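A rough sketch of the sentinel approach, under some loud assumptions: the cached value is never 0 (so 0 can mean "not initialized yet"), the initializer is cheap enough to run more than once if threads race, and the compiler is GCC or Clang (for the `cold`/`noinline` attributes and `__builtin_expect`). All names and the constant 4096 are hypothetical.

```cpp
#include <atomic>

// 0 doubles as the "not initialized yet" sentinel, so one load is both
// the guard check and the value read.
static std::atomic<long> g_cached{0};
static std::atomic<int> g_slow_calls{0};  // demo counter only

// Kept out of line and marked cold (GCC/Clang attributes) so the hot
// path stays a load, a compare, and a jump.
__attribute__((noinline, cold))
static long init_slow() {
    g_slow_calls.fetch_add(1, std::memory_order_relaxed);
    long computed = 4096;  // stand-in for the real computation; must be non-zero
    long expected = 0;
    // If another thread won the race, keep its value and return it.
    if (!g_cached.compare_exchange_strong(expected, computed,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
        return expected;
    }
    return computed;
}

long get_value() {
    long v = g_cached.load(std::memory_order_acquire);
    if (__builtin_expect(v != 0, 1)) return v;  // hot path
    return init_slow();                         // cold path
}
```

Unlike a function-local static, this does not guarantee the initializer runs exactly once; it only guarantees every caller sees the same published value.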
Why not just use constinit (iff applicable), construct_at, or lessen the cost with -fno-threadsafe-statics?
STOP WRITING NON-PORTABLE CODE YOU BASTARDS.
The correct answer is, as always, “stop using mutable global variables you bastard”.
Signed: someone who is endlessly annoyed with people who incorrectly think Unix is the only platform their code will run on. Write standard C/C++ that doesn’t rely on obscure tricks. Your co-workers will hate you less.
So I spin up a Debian VM and POSIX the hell out of it. If they dare to complain, I tell 'em to do their damn jobs and not leave all the hard stuff to the guy that only programs on UNIX.
Note that as I later found out, this doesn't work with Mac OS's linker, so you need to use some separate incantations for Mac OS.
I call them "linker arrays". They are great when you need to globally register a set of things and the order between them isn't significant.
https://github.com/abseil/abseil-cpp/blob/master/absl/base/n...
Which is basically the only usage of std::launder I have seen.
The use of std::launder should be more common than it is. I've seen a few bugs in optimized builds when it was not used, but compilers have been somewhat forgiving about omitting it where it is required, because it hasn't always existed. Rigorous code should use it instead of relying on the leniency of the compiler.
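For readers who haven't run into it, a minimal sketch of the kind of place std::launder (C++17, in `<new>`) is meant for: raw storage that is reused for successive objects, accessed later through a pointer derived from the buffer rather than from the placement-new result. The `Record` type and function names are hypothetical.

```cpp
#include <new>

// Hypothetical record type. The const member is the sort of thing that
// lets an optimizer assume the value never changes if the pointer is
// not laundered after the storage is reused.
struct Record {
    const int id;
    int payload;
};

// Raw storage that is reused for successive Record objects.
alignas(Record) static unsigned char g_buf[sizeof(Record)];

// Construct a new Record in the buffer. Record is trivially
// destructible, so the previous object needs no explicit destructor call.
Record* emplace_record(int id, int payload) {
    return ::new (static_cast<void*>(g_buf)) Record{id, payload};
}

// Re-derive a pointer from the raw buffer. std::launder tells the
// compiler to treat this as a pointer to whatever Record currently
// lives there, not to anything it may have assumed earlier.
Record* current_record() {
    return std::launder(reinterpret_cast<Record*>(g_buf));
}
```

Using the pointer returned by the placement new directly is always fine; the launder is only needed when you go back through the buffer address.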
In database engine code it definitely gets used in the storage layers.
#define FAST_STATIC(T) \
*({ \
    /* statements separated by semicolons */ \
    reinterpret_cast<T *>(ph.buf); /* the value of the macro as a statement */ \
})
The reinterpret_cast<T*>(...) statement is a conventional C++ expression statement, but when enclosed in ({}), GCC considers the whole kit and caboodle a statement expression that propagates a value. There is no C equivalent, but in C++, since C++11, you can achieve the same effect with lambdas:
auto value = [](){ return 12345; }();
As noted in the linked SO discussion, this is analogous to a JS Immediately-Invoked Function Expression (IIFE).
[1] https://stackoverflow.com/questions/76890861/what-is-called-...
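A slightly less trivial sketch of the immediately-invoked lambda, in standard C++11 with no GCC extension: it lets a multi-statement computation initialize a `const` local. The function name `sum_of_squares` is made up for the example.

```cpp
#include <vector>

// Hypothetical helper: builds a const vector with an immediately
// invoked lambda, then sums it.
int sum_of_squares(int n) {
    const std::vector<int> squares = [n] {
        std::vector<int> v;
        for (int i = 1; i <= n; ++i) v.push_back(i * i);
        return v;
    }();  // the trailing () invokes the lambda on the spot

    int total = 0;
    for (int s : squares) total += s;
    return total;
}
```

The difference from the ({ ... }) form is that `return` inside the lambda leaves only the lambda, whereas a `return` inside a statement expression would leave the enclosing function.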
pbsd•6h ago
No. The lock calls are only done during initialization, in case two threads run the initialization concurrently while the guard variable is 0. Once the variable is initialized, this will always be skipped by "je .L3".
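To spell that out, here is a hand-written approximation of the guard logic. This is not what the compiler emits (real code goes through the ABI's `__cxa_guard_acquire` / `__cxa_guard_release`), and every name in it is hypothetical. The counter is there only to show that the lock is taken during initialization and never again.

```cpp
#include <atomic>
#include <mutex>

struct Widget { int v; };  // hypothetical type

static std::atomic<bool> g_guard{false};  // plays the role of the guard variable
static std::mutex g_init_lock;
static int g_lock_acquisitions = 0;       // demo counter, written only under the lock
static Widget g_widget;

Widget& instance() {
    // Fast path: once the guard is set, this branch is never taken
    // again and no lock call happens.
    if (!g_guard.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> hold(g_init_lock);
        ++g_lock_acquisitions;
        // Re-check under the lock in case another thread finished
        // the initialization while we were waiting.
        if (!g_guard.load(std::memory_order_relaxed)) {
            g_widget.v = 42;  // stand-in for running the constructor
            g_guard.store(true, std::memory_order_release);
        }
    }
    return g_widget;
}
```

After the first call returns, `g_lock_acquisitions` stays at 1 no matter how many more times `instance()` is called from a single thread.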