In general, in the presence of hardware parallelism (i.e., essentially always since 2007 or so), the real corner cases are much more involved than "what if there's an interrupt", and thinking in terms of single-threaded concurrency is not very fruitful once memory orderings weaker than seq_cst are involved. It's not about what order things can happen in (because there isn't a single order); it's principally about how writes propagate from the cache of one core to that of another.
x86 processors have sort of lulled many programmers of concurrent code into a false sense of safety because almost everything is either unordered or sequentially consistent. But the other now-common architectures aren’t as forgiving.
Edit: Since the parent commenter added two more paragraphs after I posted my answer: I wasn't wondering about the pitfalls of sequentially consistent multi-threaded execution on various CPU architectures. It is well known that x86 adheres to the stronger Total Store Order (TSO) model, whereas POWER and ARM have weaker memory models and actually require memory barriers at the instruction level, not just barriers that prevent compiler reordering.
The point of weak memory models is to formally define the set of all possible legal executions of a concurrent program. This gets very complicated in a hurry because of the need to accommodate 1) hardware properties such as cache coherence and 2) desired compiler optimization opportunities that will want to reason over what's possible / what's guaranteed.
In this case, there was a conflict between a behavior of Power processors and an older C++ standard that meant that a correct compiler would have to introduce extra synchronization to prevent the forbidden behavior, thus impacting performance. The solution was to weaken the memory model of the standard in a very small way.
The article walks us through how exactly the newer standard permits the funny unintuitive execution of the example program.
The exercise is academic, sure. A lot of hard academic research has gone into this field to get it right. But it has to be that precise because the problems it solves are that subtle.
Originally, I was commenting that the purpose of the article was initially unclear to me, since the order of thread execution cannot be determined anyway.
I now understand that there was a corner case in the POWER and ARM architectures when mixing seq-cst and acquire-release operations on the same atomic variable. Thus, C++26 will be updated to allow more relaxed behavior in order to maintain performance.
https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p06...
I write a lot of C++, and that is not a simple program. Short, sure.
Simple can still be difficult to understand.
Simple doesn’t have to be easy.
The comments show the values each thread observed.
Why? Nothing in that code implies any synchronization between threads or forces an ordering. thread_2 can fetch the value of y before thread_1 writes to it, which would set b to 0. You would need additional mechanisms (an extra atomic that you compare_exchange) to force the order
edit: but I guess the comment means it is the outcome the author wants to observe
Now, the big question: is this execution even possible under the C++ memory model?
sure, use an extra atomic to synchronize threads

In other words, the author considers this execution to be surprising under the C++ memory model, and then goes on to explain it.
What? That would make the situation worse. The execution has a weird unintuitive quirk where the actions of thread 3 seem to precede the actions of thread 1, which seem to precede the actions of thread 2, yet thread 2 observes an action of thread 3. Stronger synchronization would forbid such a thing.
The main question of the article is "Is the memory model _weak enough_ to permit the proposed execution?"
from the article: > [Note 8: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. — end note]
this is the Java happens-before, right? What's the non-strong happens-before in C++ then?
In Java, happens-before is composed essentially of the union of two relations: program order (i.e., the order imposed within a single thread by the imperative programming model) and synchronizes-with (i.e., the cross-thread synchronization constructs). C++ started out doing the same. However, this is why it broke: in the presence of weak atomics, you can construct a sequence of atomic accesses and program order relations across multiple threads to suggest that something should have a happens-before relation that actually doesn't in the hardware memory model. To describe the necessary relations, you need to add several more kinds of dependencies, and I'm not off-hand sure which dependencies ended up with which labels.
Note that, for a user, all of this stuff generally doesn't matter. You can continue to think of happens-before as a basic program order unioned with a cross-thread synchronizes-with and your code will all work, you just end up with a weaker (fewer things allowed) version of synchronizes-with. The basic motto I use is, to have a value be written on thread A and read on thread B, A needs to write the value then do a release-store on some atomic, and B then needs to load-acquire on the same atomic and only then can it read the value.
It's:
relaxed, consume, acquire, release, acq_rel, seq_cst
Nicely described here: <https://en.cppreference.com/w/cpp/atomic/memory_order.html>

Rather, there's a panoply of definitions like "inter-thread happens-before" and "synchronizes-with" and "happens-before", and those are the ones I don't follow closely. It gets even more confusing when you're reading academic papers on weak memory models.
Sequenced-before, Synchronizes with, Inter-thread happens-before, Simply happens-before, Happens-before, Strongly happens-before
...and more definitions.

> Under this relaxed definition, we find that we cannot establish a total order over all memory_order::seq_cst operations due to a cycle in the graph: (2) -> (4) -> (6) -> (7) -> (2).
I don't understand why they say that "(2) -> (4)". (2) must certainly come before (3) in the context of threads 1 & 3, which both refer to X and Y. But thread 2 knows nothing about X, so AFAIK accesses to X can be arbitrarily reordered from the perspective of thread 2 - if "reordering" can even be said to have any meaning at all in a thread that knows nothing about X.