In general, in the presence of hardware parallelism (i.e., essentially always since 2007 or so), the real corner cases are much more involved than "what if there's an interrupt", and thinking in terms of single-threaded concurrency is not very fruitful once memory orderings weaker than seq_cst are involved. It's not about what order things can happen in (because there isn't a single order); it's principally about how writes propagate from the cache of one core to that of another.
x86 processors have sort of lulled many programmers of concurrent code into a false sense of safety because almost everything is either unordered or sequentially consistent. But the other now-common architectures aren’t as forgiving.
Edit: Since the parent commenter added two more paragraphs after I posted my answer: I wasn't wondering about the pitfalls of sequentially consistent multi-threaded execution on various CPU architectures. It is well known that x86 adheres to the stronger Total Store Order (TSO) model, whereas POWER and ARM have weaker memory models and actually require memory barriers at the instruction level, not just barriers that prevent compiler reordering.
The point of weak memory models is to formally define the set of all possible legal executions of a concurrent program. This gets very complicated in a hurry because of the need to accommodate 1) hardware properties such as cache coherence and 2) desired compiler optimization opportunities that will want to reason over what's possible / what's guaranteed.
In this case, there was a conflict between a behavior of Power processors and an older C++ standard that meant that a correct compiler would have to introduce extra synchronization to prevent the forbidden behavior, thus impacting performance. The solution was to weaken the memory model of the standard in a very small way.
The article walks us through how exactly the newer standard permits the funny unintuitive execution of the example program.
The exercise is academic, sure. A lot of hard academic research has gone into this field to get it right. But it has to be that precise because the problems it solves are that subtle.
Originally, I was commenting that the purpose of the article was initially unclear to me, since the order of thread execution cannot be determined anyway.
I now understand that there was a corner case in the POWER and ARM architectures when mixing seq-cst and acquire-release operations on the same atomic variable. Thus, C++26 will be updated to allow more relaxed behavior in order to maintain performance.
https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p06...
I write a lot of C++, and that is not a simple program. Short, sure.
Simple can still be difficult to understand.
Simple doesn’t have to be easy.
The comments show the values each thread observed.
Why? Nothing in that code implies any synchronization between threads or forces an ordering. thread_2 can fetch the value of y before thread_1 writes to it, which would set b to 0. You would need additional mechanisms (an extra atomic that you compare_exchange) to force the order
edit: but I guess the comment means it is the outcome the author wants to observe
Now, the big question: is this execution even possible under the C++ memory model?
sure, use an extra atomic to synchronize threads

In other words, the author considers this execution to be surprising under the C++ memory model, and then goes on to explain it.
What? That would make the situation worse. The execution has a weird unintuitive quirk where the actions of thread 3 seem to precede the actions of thread 1, which seem to precede the actions of thread 2, yet thread 2 observes an action of thread 3. Stronger synchronization would forbid such a thing.
The main question of the article is "Is the memory model _weak enough_ to permit the proposed execution?"
from the article: > [Note 8: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. — end note]
this is the Java happens-before, right? What's the non-strong happens-before in C++ then?
In Java, happens-before is composed essentially of the union of two relations: program order (i.e., the order imposed within a single thread by the imperative programming model) and synchronizes-with (i.e., the cross-thread synchronization constructs). C++ started out doing the same. However, this is why it broke: in the presence of weak atomics, you can construct a sequence of atomic accesses and program order relations across multiple threads to suggest that something should have a happens-before relation that actually doesn't in the hardware memory model. To describe the necessary relations, you need to add several more kinds of dependencies, and I'm not off-hand sure which dependencies ended up with which labels.
Note that, for a user, all of this stuff generally doesn't matter. You can continue to think of happens-before as a basic program order unioned with a cross-thread synchronizes-with and your code will all work, you just end up with a weaker (fewer things allowed) version of synchronizes-with. The basic motto I use is, to have a value be written on thread A and read on thread B, A needs to write the value then do a release-store on some atomic, and B then needs to load-acquire on the same atomic and only then can it read the value.
It's:
relaxed, consume, acquire, release, acq_rel, seq_cst
Nicely described here: <https://en.cppreference.com/w/cpp/atomic/memory_order.html>

Rather, there's a panoply of definitions like "inter-thread happens-before" and "synchronizes-with" and "happens-before", and those are the ones I don't follow closely. It gets even more confusing when you're reading academic papers on weak memory models.
Sequenced-before, Synchronizes with, Inter-thread happens-before, Simply happens-before, Happens-before, Strongly happens-before
...and more definitions.

> Under this relaxed definition, we find that we cannot establish a total order over all memory_order::seq_cst operations due to a cycle in the graph: (2) -> (4) -> (6) -> (7) -> (2).
I don't understand why they say that "(2) -> (4)". (2) must certainly come before (3) in the context of threads 1 & 3, which both refer to X and Y. But thread 2 knows nothing about X, so AFAIK accesses to X can be arbitrarily reordered from the perspective of thread 2 - if "reordering" can even be said to have any meaning at all in a thread that knows nothing about X.