Doesn't get much glibber than that!
There is now std::atomic<std::shared_ptr<T>> (the standardized form of atomic_shared_ptr from the Concurrency TS), which is thread safe.
In this particular case, it was subtle; Rust is preferred because it doesn't allow unsafe memory operations such as the one demonstrated. Really, all it demonstrates is that you can write really bad C++.
[0] https://chromium.googlesource.com/chromium/src/+/HEAD/base/m...
mwkaufma•4mo ago
Why intentionally design a worse alternative to std::weak_ptr which has been around since C++11??
TinkersW•4mo ago
niekb•4mo ago
There are alternative ways to utilize a machine with multiple cores, e.g. by running one thread per CPU core, and not sharing state between those threads; in each such thread you then have single-thread "semantics".
mwkaufma•4mo ago
vlovich123•4mo ago
mwkaufma•4mo ago
vlovich123•4mo ago
> Note that the control block used by std::weak_ptr and std::shared_ptr is thread-safe: different non-atomic std::weak_ptr objects can be accessed using mutable operations, such as operator= or reset, simultaneously by multiple threads, even when these instances are copies or otherwise share the same control block internally. The type T may be an incomplete type.
There’s no variant of shared_ptr / weak_ptr that is non atomic in the standard library AFAIK.
spacechild1•4mo ago
niekb•4mo ago
One reason this design can still be beneficial even when std::shared_ptr is used in its implementation is when you do not want the pointee object itself to be managed by a std::shared_ptr (which is a requirement for using std::weak_ptr). E.g., you may want to ensure that multiple objects of that type are laid out next to each other in memory, instead of scattered around the heap.
Another goal of the post is to show this idea, namely using a shared_ptr<T*> (instead of a shared_ptr<T>), which is somewhat unconventional but, as I hope I have convinced you, sometimes useful.
cherryteastain•4mo ago
Why would there be contention in a single threaded program?
izabera•4mo ago
eMSF•4mo ago
menaerus•4mo ago
aw1621107•4mo ago
[0]: https://snf.github.io/2019/02/13/shared-ptr-optimization/
menaerus•4mo ago
aw1621107•4mo ago
I think the distinction between the two should be visible in some cases - for example, what happens for shared libraries that use std::shared_ptr and don't link libpthread, but are later used with a binary that does link libpthread?
menaerus•4mo ago
Implementation of __gthread_active_p is indeed a runtime check [3] which AFAICS applies only to single-threaded programs. Perhaps the shared-library use-case also fits here?
Strange optimization IMHO, so I wonder what the motivation behind it was. The cost function being optimized here depends on WORD-sized accesses being atomic [4] without actually using atomics [5].
[0] https://codebrowser.dev/llvm/include/c++/11/bits/shared_ptr_...
[1] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....
[2] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....
[3] https://codebrowser.dev/kde/include/x86_64-linux-gnu/c++/11/...
[4] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....
[5] https://codebrowser.dev/llvm/include/c++/11/ext/atomicity.h....
aw1621107•4mo ago
The line you linked is for some FreeBSD/Solaris versions which appear to have some quirks in the way pthreads functions are exposed in their libc. I think the "normal" implementation of __gthread_active_p is on line 248 [0], and that is a pretty straightforward check against a weak symbol.
> Strange optimization IMHO so I wonder what was the motivation behind it.
I believe the motivation is to avoid needing to pay the cost of atomics when there is no parallelism going on.
> The cost function being optimized in this case is depending on WORD being atomic [4] without actually using the atomics [5].
Not entirely sure what you're getting at here? The former is used for single-threaded programs so there's ostensibly no need for atomics, whereas the latter is used for non-single-threaded programs.
[0]: https://codebrowser.dev/kde/include/x86_64-linux-gnu/c++/11/...
menaerus•4mo ago
> I believe the motivation is to avoid needing to pay the cost of atomics when there is no parallelism going on.
Obviously yes. What I am wondering is what benefit it brings in practice. A single-threaded program with shared_ptrs using atomics vs. shared_ptrs using plain WORDs seems like a non-problem to me - e.g. I doubt it has a measurable performance impact. Atomics only slow a program down under contention, and single-threaded programs can't have contention.
aw1621107•4mo ago
I mean, the blog post basically starts with an example where the performance impact is noticeable:
> I found that my Rust port of an immutable RB tree insertion was significantly slower than the C++ one.
And:
> I just referenced pthread_create in the program and the reference count became atomic again.
> Although uninteresting to the topic of the blog post, after the modifications, both programs performed very similarly in the benchmarks.
So in principle an insert-heavy workload for that data structure could see a noticeable performance impact.
> Atomics are slowing down the program only when it comes to contention, and single-threaded programs can't have them.
Not entirely sure I'd agree? My impression is that while uncontended atomics are not too expensive, they aren't exactly free compared to the corresponding non-atomic instruction. For example, Agner Fog's instruction tables [0] state:
> Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices, then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand.
And there's this blog post [1], which compares the performance of various concurrency mechanisms/implementations including uncontended atomics and "plain" code and shows that uncontended atomics are still slower than non-atomic operations (~3.5x if I'm reading the raw data table correctly).
So if the atomic instruction is in a hot loop then I think it's quite plausible that it'll be noticeable.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
[1]: https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
menaerus•3mo ago
grogers•4mo ago
izabera•4mo ago
loeg•4mo ago
> For our use case, we in fact do not use std::shared_ptr in our implementation, but instead a single-threaded shared_ptr-like class that has no atomics (to avoid cross-core contention).
A single-threaded program will not have cross-core contention whether it uses std::atomic<> refcounts or plain integer refcounts, period. You're right that non-atomic refcounts can be anywhere from somewhat cheaper to a lot cheaper than atomic refcounts, depending on the platform. But that is orthogonal to cross-core contention.
vlovich123•4mo ago
SkiFire13•4mo ago
Note that the difference is not so marginal, and it is not just a matter of hardware instructions: non-atomic operations generally allow more optimizations by the compiler.
loeg•4mo ago
It's comparable to like, two integer multiplies, or a single integer division. Yes, there is some effect on program order.
mwkaufma•4mo ago
Moving the goalposts. But just to follow that thought: decoupling alloc+init via e.g. placement-new to do this introduces a host of complications not considered in your solution.
If that layout _is_ a requirement, and you don't want a totally nonstandard foundation lib with nonstandard types promiscuously necessitating more nonstandard types, you want a std::vector+index handle.
mabster•4mo ago
We were happy to use unique_ptr, however.
surajrmal•4mo ago
dataangel•4mo ago