Atomics and Concurrency

https://redixhumayun.github.io/systems/2024/01/03/atomics-and-concurrency.html

139•LAC-Tech•3d ago

Comments

ozgrakkurt•1d ago

This is great, also looking for information about performance impact of using atomics like atomically reference counting etc. if anyone can recommend something to learn from

cmrdporcupine•1d ago

https://travisdowns.github.io/blog/2020/07/06/concurrency-co...

j_seigh•1d ago

Performance from using atomics in reference counting is going to degrade quite a bit under heavy usage due to cache contention. If you are in that situation, you should be looking at something else like RCU or hazard Pointers.

tialaramex•1d ago

Note that these strategies (collectively, "delayed reclamation") lose one benefit of the RAII concept. With Rust's Drop or C++ destructors we are guaranteed that when we don't need this Goat the clean-up-unused-goat code we wrote will run now - unlike with a Garbage Collector where that code either runs at some unspecified future time or not at all.

Delayed reclamation means that "when we don't need this Goat" will sometimes be arbitrarily delayed after we apparently didn't need it, and it's no longer possible to say for sure by examining the program. This will almost always be a trade you're willing to make, but you need to know it exists, this is not a fun thing to discover while debugging.

zozbot234•1d ago

With RCU the "we don't need this Goat" occurs as soon as the last readers are done with the earlier version of Goat, modulo a limited grace period in some RCU implementations. New readers always get the latest version, so there is no risk of waiting for an "unspecified" or "arbitrary" amount of time.

Hazard pointers are a bit fiddlier AIUI, since the reclamation step must be triggered explicitly and verify that no hazard pointers exist to the object being reclaimed, so it is quite possible that you won't promptly reclaim the object.

tialaramex•23h ago

From the point of view of any particular updater, the death of every current reader (who would keep the Goat alive) is arbitrary. Obviously in a real system it won't literally be unbounded, but I don't think it will usually make sense to work out how long that could take - in terms of clock cycles, or wall clock time, or any external metric - it won't necessarily be immediate but it will happen "eventually" and so just write software accordingly.

Maybe I'm wrong for some reason and there is a practical limit in play for RCU but not Hazard Pointers, I don't think so though.

PaulDavisThe1st•22h ago

One of the central purposes of RCU is to decouple the updater/writer(s) from the reader(s). It doesn't matter to the writer if there is still a reader "out there" using the old version. And it likewise doesn't matter to (most) readers that the version they have is now old.

What is delayed is the actual destruction/deallocation of the RCU-managed objects. Nobody cares about that delay unless the objects control limited resources (in which case, RCU is likely a bad fit) or there are so many objects and/or they are so large that the delay in deallocating them could cause memory pressure of some type.

tialaramex•20h ago

Sure, that's exactly my understanding.

manwe150•16h ago

> we are guaranteed that when we don't need this Goat the clean-up-unused-goat code we wrote will run now

Not to put too fine a point on things, but rust (and c++) very explicitly don’t guarantee this. They are both quite explicit about being allowed to leak memory and never free it (due to reference cycles), something a GC is not typically allowed to do. So yes it usually happens, it just is not guaranteed.

Maxatar•14h ago

Your point that reference counting can result in memory leaks is absolutely correct and a worthwhile point. However, it's also worth pointing out that tracing garbage collectors like those used by Java are also allowed to leak memory and never free it. In the most extreme scenario you have the epsilon garbage collector, which leaks everything:

https://openjdk.org/jeps/318

Implementing a garbage collector that is guaranteed to free memory when it's actually no longer needed is equivalent to solving the halting problem (via Rice's theorem), and so any garbage collection algorithm is going to have to leak some memory, it's simply unavoidable.

tialaramex•13h ago

All you're getting to is that we're not obliged in Rust to ever decide we didn't need the Goat - but I didn't argue that we are.

The "finalizer problem" in the Garbage Collected languages isn't about a Goat which never becomes unused, it's about a situation where the Goat is unused but the clean-up never happens.

pythops•1d ago

For Rust, I highly recommend this book https://marabos.nl/atomics/

reitzensteinm•1d ago

This is such a great book, especially the section on operating system primitives, which made the book wider in scope and more practical. After all, you're probably not building exotic data structures by hand in memory without also needing high performance IO.

It's been a hobby of mine to collect concurrency examples from books and blog posts and simulating them in Temper, my Rust memory model simulator. As far as I know, it's the largest Rust/C++11 memory model test suite on the internet (but I'm happy to be corrected).

This is the file for Rust Atomics and Locks:

https://github.com/reitzensteinm/temper/blob/main/memlog/tes...

I didn't find any bugs in the examples, but with how good the book was, I didn't expect to :)

The Williams book for C++ contains many of the same ideas (Rust's memory model is a copy/paste from C++11 without the now deprecated Consume) and I can highly recommend that too.

pythops•1d ago

absolutely ! This book is useful for even non rust developers I think

pimeys•9h ago

Yes. This is how I learned the atomics and memory ordering. It's so much fun to read, and super interesting.

Highly recommend!

tialaramex•1d ago

> TSan is a reliable way of detecting data races in your code, instead of trying to repeatedly run the code in a loop hoping for a data race to occur.

I'm sure this doesn't intend to be misleading but I think it can mislead people anyway.

TSan should detect races but just like "repeatedly run the code in a loop" it's not actually checking what you wrote doesn't have races, it's just reporting any which happened during testing. This is valuable - if you eliminate a race you've almost certainly fixed a real bug, maybe a very subtle bug you'd have cursed for years otherwise, but you can't use TSan to prove there are no further bugs of this kind.

Tools to prove this stuff are often much harder to bring into a project, but you should put that work in if the difference between "It's probably fine" and "I proved it's correct" sounds necessary to you or for your work. I don't want my pacemaker to be "probably fine" but it's OK if the Youtube music player on my phone wasn't proved correct.

dataflow•1d ago

My (limited) understanding is that the way TSAN works is better than what you're comparing it to. You don't have to rerun TSAN in a loop to finally catch a race. As long as your test has coverage of that scenario (i.e. it executes the lines of interest once within a reasonable time window) it should get caught. Of course, being dynamic, it can't catch code that you never execute during testing, but that's not such a huge problem because you can verify code coverage through other means. It's quite a bit of a different from having to make stars align by looping until some e.g. rare timing issue comes up.

The real issue here isn't that TSAN is low-confidence about absence of data races in C++. The issue is that even if it statically proved the absence of data races in the C++ sense, that still wouldn't imply that your algorithm is race-free. Because it's trivial to patch every single instance of a data race by using an atomic to get TSAN to shut up, but that in no way implies the higher-level concurrent algorithm works correctly. (e.g., if two threads perform an atomic load of some variable, then add one, then perform an atomic store back in, they would obviously stomp on each other, but TSAN wouldn't care.) It's more like a UB checker than a correctness verifier.

withoutboats3•1d ago

The occurrence of data races depends on the specific non-deterministic sequence of execution of concurrent codepaths. Just because you have 100% code coverage does not mean you've covered every potential execution sequence, and its almost never practical to actually execute every possibility to ensure the absence of data races. Depending on the probability that your data race will occur, it could indeed be something you have to make stars align for TSAN to catch.

Not to talk my own book, but there is a well-known alternative to C++ that can actually guarantee the absence of data races.

dataflow•1d ago

It "could" for some algorithms, yes, but for a lot of algorithms, that kind of star alignment simply isn't necessary to find all the data races, was my point. And yes, TLA+ etc. can be helpful, but then you have the problem of matching them up with the code.

Maxatar•23h ago

I feel like in a subtle way you're mixing up data races with race conditions, especially given the example you site about incrementing an atomic variable.

TSAN does not check for race conditions in general, and doesn't claim to do so at all as the documentation doesn't include the term race condition anywhere. TSAN is strictly for checking data races and deadlocks.

Consequently this claim is false:

>The issue is that even if it statically proved the absence of data races in the C++ sense, that still wouldn't imply that your algorithm is race-free.

Race-free code means absence of data races, it does not mean absence of the more general race condition. If you search Google Scholar for race free programming you'll find no one uses the term race-free to refer to complete absence of race conditions but rather to the absence of data races.

dataflow•23h ago

There's "data race" in "C++ ISO standard" sense, and then there's "data race" in the general CS literature (as well as all the other terms). Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense. I'm not really interested in pedantic terminology here, just trying get a higher level point across about what you can & can't assume with a clean TSAN (and how not to clean your TSAN errors). Feel free to mentally rewrite my comment with whatever preferred terminology you feel would get my points across.

kiitos•23h ago

> Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense

You seem to conflate the concepts of "data race" and "race condition", which are not the same thing.

Two threads writing to the same memory location without synchronization (without using atomic operations, without going thru a synchronization point like a mutex, etc.) is a data race, and almost certainly also a race condition. If access to that memory location is synchronized, whether thru atomics or otherwise, then there's no data race, but there can still be a race condition.

This isn't a pedantic distinction, it's actually pretty important.

Maxatar•23h ago

This isn't pedantry, if you're going to talk about how specific tools work then you need to use the actual terminology that the tools themselves use or else you will confuse yourself and anyone you talk to about them. If we were discussing general concepts about thread safety then sure we can be loose about our words, but if we're talking about a specific tool used for a specific programming language then we should make sure we are using the correct terminology, if only to signal that we have the proper domain knowledge to speak about the subject.

>Feel free to mentally rewrite my comment with whatever preferred terminology you feel would get my points across.

If I rewrite your comment to use data race, then your comment is plainly incorrect since the supporting example you give is not a data race but a race condition.

If I rewrite your comment to use race condition, then your comment is also incorrect since TSAN doesn't detect race conditions in general and doesn't claim to, it detects data races.

So what exactly am I supposed to do in order to make sense of your post?

The idea that you'd talk about the pros and cons of a tool like TSAN without knowing the difference between a race condition and a data race is kind of absurd. That you'd claim my clarification of these terms for the sake of better understanding your point is a form of pedantry is sheer hubris.

dataflow•22h ago

Hold on, before attacking me. Say we have this Java program, and assume the semantics of the common JRE/JVM everyone uses. Do you believe it has a data race or not? Because the variable is accessed atomically, whether whether you mark it as volatile or not:

  class Main {
    private static String s = "uninitialized";
    public static void main(String[] args) {
      Thread t = new Thread() {
        public void run() { s = args[0]; }
      };
      t.start();
      System.out.println(s);
    }
  }

And I sure as heck have not heard anyone claim such data races are impossible in Java. (Have you?)

tialaramex•20h ago

What you wrote is indeed a data race, it'll race s, but you mention the semantics of the JRE and I wonder if you actually know what those are because that's crucial here.

You see Java has a specific memory ordering model (many languages just give you a big shrug, including C before it adopted the C++ 11 model but Java spells out what happens) and that model is very sophisticated so it has an answer to what happens here.

Because we raced s, we lose Sequential Consistency. This means in general (this example is so trivial it won't matter) humans struggle to understand what's going on in their program, which makes debugging and other software engineering impractical. But, unlike C++ loss of Sequential Consistency isn't fatal in Java, instead we're promised that when s is observed in the main thread it will either be that initial "uninitialized" string or it will have the args[0] value, ie the name of the program because these are the only two values it could have and Java does not specify which of them observed in this case.

You could think of this as "atomic access" and that's likely the actual implementation in this case, but the Java specification only promises what I wrote.

In C++ this is game over, the language standard specifically says it is Undefined Behaviour to have any data race and so the behaviour of your program is outside the standard - anything at all might happen.

[Edited: I neglected originally to observe that s is set to "uninitialized", and instead I assumed it begins as null]

dataflow•16h ago

> But, unlike C++ loss of Sequential Consistency isn't fatal in Java

I have no idea what you mean here. Loss of sequential consistency is in no way fatal in C++. There are several access modes that are specifically designed to avoid sequential consistency.

Regarding the rest of your comment:

You're making exactly my point though. These are guaranteed atomic accesses -- and like you said, we are guaranteed to see either the old or new value, and nothing else -- and yet they are still data races. Anyone who agrees this is a data race despite the atomicity must necessarily understand that atomics don't imply lack of data races -- not in general CS terminology.

The only way it's correct to say they are mutually exclusive is when you define "data race" as they did in the C++ standard, to imply a non-atomic access. Which you're welcome to do, but it's an incredibly pedantic thing to do because, for probably >95% of the users of C++ (and probably even of TSAN itself), when they read "data race", they assume it to mean the concept they understand from CS. They don't know that the ISO standard defines it in its own peculiar way. My point here was to convey something to normal people rather than C++ committee language lawyers, hence the use of the general term.

Maxatar•15h ago

>The only way it's correct to say they are mutually exclusive is when you define "data race" as they did in the C++ standard, to imply a non-atomic access.

A data-race does not imply a non-atomic operation, it implies an unsynchronized operation. Different languages have different requirements for what constitutes a synchronized operation, for example in Python all reads and writes are synchronized, in Java synchronization is generally accomplished through the use of a monitor or a volatile operation, and in C++ synchronization is generally accomplished through the use of std::atomic operations.

The fact that in C++ atomic operations result in synchronization, while in Java atomic operations are not sufficient for synchronization is not some kind of language lawyering or pedantry, it's because Java makes stronger guarantees about the consequences of a data race versus C++ where a data race can result in any arbitrary behavior whatsoever. As such it should not come as a surprise that the cost of those stronger guarantees is that Java has stronger requirements for data race free programs.

But of course this is mostly a deflection, since the discussion was about the use of TSAN, which is a data race detector for C and C++, not for Java. Hence to the extent that TSAN detects data races, it detects them with respect to C and C++'s memory model, not Java's memory model or Python's memory model, or any other memory model.

The objection I originally laid out was your example of a race condition, an example which can happen even in the absence of parallelism (ie. a single-core CPU) and even in the absence of multithreaded code altogether (your example can happen in a single threaded application with the use of coroutines). TSAN makes no claim with regards to detecting race conditions in general, it only seeks to detect data races and data races as they pertain the C and C++ memory models.

dataflow•12h ago

I am not "deflecting" anything, I am going to the heart of the matter.

Let me lay this out very explicitly. This comment will likely be my final one on the matter as this back-and-forth is getting quite tiresome, and I'm not enjoying it, especially with the swipes directed at me.

Please take a look at the following two C++ and Java programs: https://godbolt.org/z/EjWWac1bG

For the sake of this discussion, assume the command-line arguments behave the same in both languages. (i.e. ignore the C++ argv[0] vs. Java args[0] distinction and whatnot.)

Somehow, you simultaneously believe (a) that Java program contains data races, while believing (b) the C++ program does not.

This is a self-contradictory position. The programs are well-defined, direct translations of each other. They are equivalent in everything but syntax. If one contains a data race, so must the other. If one does not, then neither can the other.

This implies that TSAN does not detect "data races" as it is usually understood in the CS field -- it does not detect anything in the C++ program. What TSAN detects is only a particular, distinct situation that the C++ standard also happens to call a "data race". So if you're talking to a C++ language lawyer, you can say TSAN detects all data races within its window/buffer limits. But if you're talking to someone who doesn't sleep with the C++ standard under their pillow, they're not going to realize C++ is using a language-specific definition, and they're going to assume it flags programs like the equivalent of the Java program above, which has a data race but whose equivalent TSAN would absolutely not flag.

Maxatar•4h ago

Yes, C++ and Java have different conditions for what a data race is.

That your position hinges on thinking all languages share the same memory model suggests a much deeper failure to understand some of the basic principles of writing correct parallel software and while numerous people have tried to correct you on this, you still seem adament on doubling down on your position so I don't think there is much point in continuing this.

dataflow•4h ago

> That your position hinges on thinking all languages share the same memory model suggests a much deeper failure to understand some of the basic principles of writing correct parallel software and while numerous people have tried to correct you on this, you still seem adament on doubling down on your position so I don't think there is much point in continuing this.

I never suggested "all languages share the same memory model". You're severely mischaracterizing what I've said and putting words in my mouth.

What I said was (a) data races are generals properties of programs that people can and do discuss in language-agnostic contexts, and (b) it makes no sense to say two well-defined, equivalent programs differ in whether they have data races. Reducing these statements down to "all programs share the same memory model" as if they're somehow equivalent makes for an incredibly unfaithful caricature of everything I've said. Yes, I can see there's no point in continuing.

kiitos•34m ago

> data races are generals properties of programs that people can and do discuss in language-agnostic contexts

"Data race" is a specific property defined by a memory model, which is normally part of a language spec; it's not usually understood as an abstract property defined in terms of outcome, at least not usefully. If you talk about data races as abstract and language-spec-agnostic properties, then yes, you're assuming a memory model that's shared across all programs and their languages.

dataflow•9m ago

> "Data race" is a specific property defined by a memory model, which is normally part of a language spec; it's not usually understood as an abstract property defined in terms of outcome, at least not usefully.

Really? To me [1] sure doesn't look useless:

> We use the standard definition of a data race: two memory accesses to the same address can be scheduled on different threads to happen concurrently, and at least one of the accesses is a write [16].

You're welcome to look at the [16] it cites, and observe that it is from 1991, entirely in pseudocode, with no mention of a "memory model". It so happens to use the phrase "access anomaly", but evidently that is used synonymously here.

> If you talk about data races as abstract and language-spec-agnostic properties, then yes, you're assuming a memory model that's shared across all programs and their languages.

No, nobody is assuming such a thing. Different memory models can still exhibit similar properties when analyzing file accesses. Just like how different network models can exhibit similar properties (like queue size bounds, wait times, etc.) when discussing network communication. Just because two things are different that doesn't mean they can't exhibit common features you can talk about in a generic fashion.

[1] https://www.cs.columbia.edu/~suman/docs/pla.pdf

[16] https://dl.acm.org/doi/pdf/10.1145/127695.122767

charleslmunger•3h ago

Actually I don't think your c++ program contains a data race, because the writes that populated the argument string happened before the atomic read. If you copied the string on the other thread before writing the pointer or you used a non atomic variable, I bet tsan would catch it.

dataflow•3h ago

> Actually I don't think your c++ program contains a data race, because the writes that populated the argument string happened before the atomic read

But the write to the static variable from the second thread is entirely unordered with respect to the read, despite the atomicity. If lack of ordering is your criterion for data races, doesn't that imply there is a data race between that write and that read?

tialaramex•12h ago

> Loss of sequential consistency is in no way fatal in C++. There are several access modes that are specifically designed to avoid sequential consistency.

Sure, if you work really hard you can write a C++ program which doesn't meet the 6.9.2.2 intro.races definition of a data race but does nevertheless lose sequential consistency and so it has in some sense well-defined meaning in C++ but humans can't usefully reason about it. You'll almost certainly trip and write UB when attempting this, but assuming you're inhumanly careful or the program is otherwise very simple so that you don't do that it's an exception.

My guess is that your program will be miscompiled by all extant C++ compilers and you'll be unhappy with the results, and further that if you can get committee focus on this at all they will prioritize making your program Undefined in C++ rather than "fixing" compilers.

Just don't do this. The reason for the exclusions in 6.9.2.2 is that what we want people to do is write correct SC code but using primitives which themselves can't guarantee that, so the person writing the code must do so correctly. The reason is not that somehow C++ programmers are magicians and the loss of SC won't negatively impact the correctness of code they attempt to write, quite the contrary.

Maxatar•16h ago

The Java specification which can be found here [1] makes clear that with respect to its memory model the following is true:

    1. Per 17.4.5 your example can lead to a data race.

       "When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."

    2. Per 17.7 the variable s is accessed atomically.

       "Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values."

However, atomic reads and writes are not sufficient to protect against data races. What atomic reads and writes will protect against is word tearing (outlined in 17.6 where two threads simultaneously write to overlapping parts of the same object with the result being bits from both writes mixed together in memory). However, a data race involving atomic objects can still result in future reads from that object to result in inconsistent values, and this can last indefinitely into the future. This does not mean that reading from a reference will produce a garbage value, but it does mean that two different threads reading from the same reference may end up reading two entirely different objects. So, you can have thread A in an infinite loop repeatedly reading the value "uninitialized" and thread B in another infinite loop repeatedly reading the value args[0]. This can happen because both threads have their own local copy of the reference which will never be updated to reflect a consistent shared state.

As per 17.4.3, a data-race free program will not have this kind of behavior where two threads are in a perpetually inconsistent state, as the spec says "If a program has no data races, then all executions of the program will appear to be sequentially consistent."

So while atomicity protects against certain types of data corruption, it does not protect against data races.

[1] https://docs.oracle.com/javase/specs/jls/se24/html/jls-17.ht...

charleslmunger•16h ago

>When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race.

Yes, your program contains a data race, by the definition used in the JLS. The set of outcomes you may observe from a data race are specified. I'm not sure if this choice was intentional or not, but there is a guarantee that you will either print the argument or "uninitialized" and no other behavior, because String relies on final field semantics. This is would not be true in c/c++ where the equivalent code is undefined behavior and you could see any result.

In Java you can have a data race and use it productively for certain niche cases, like String.hashcode - I've also contributed some to the Guava concurrency library. This is not true in c/c++ where data races (by their definition) are undefined behavior. If you want to do the tricks you can in racy Java without UB, you have to declare your variables atomic and use relaxed memory order and possibly fences.

vlovich123•23h ago

> Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense

Not only are you incorrect, it’s even worse than you might think. Unsynchronized access to data in c++ is not only a data race, it’s explicitly undefined behavior and the compiler can choose to do whatever in response of an observed data race (which you are promising it isn’t possible by using the language).

You are also misinformed about the efficacy of TSAN. Even in TSAN you have to run it in a loop - if TSAN never observes the specific execution order in a race it’ll remain silent. This isn’t a theoretical problem but a very real one you must deeply understand if you rely on these tools. I recall a bug in libc++ with condition_variable and reproducing it required running the repro case in a tight loop like a hundred times to get even one report. And when you fixed it, how long would you run to have confidence it was actually fixed?

And yes, race conditions are an even broader class of problems that no tool other than formal verification or DST can help with. Hypothesis testing can help mildly but really you want at least DST to probabilistically search the space to find the race conditions (and DST’s main weakness aside from the challenge of searching a combinatorial explosion of states is that it still relies on you to provide test coverage and expectations in the first place that the race condition might violate)

dataflow•21h ago

Geez. I'm well aware it's UB. That was just sloppy wording. Should've said "not necessarily", okay. I only wrote "not in the C++ sense" because I had said "even atomically", not because I'm clueless.

manwe150•16h ago

TSAN observes the lack of an explicit order and warns about that, so it is better in some sense than just running normally in a loop and hoping to see the occurrence of a specific mis-ordering. But that part of it is a data race detector, so it cannot do anything for race conditions, and as soon as something is annotated as atomic, it cannot do anything to detect misuse. It can be better for lock evaluation, as it can check they are always acquired in the same order without needing to actually observe a conflicting deadlock occurring. But I agree you need more formal tooling to actually show the problem is eliminated and not just improbable

withoutboats3•23h ago

The alternative to C++ that I meant was Rust, which statically prevents data races.

sebtron•1d ago

TSan also catches data races that end up not having any visible effect, unlike rerunning and hoping to see the program misbehave.

vlovich123•23h ago

The false positive on TSAN is so low that it’s worth fixing any false positives although concluding it’s a false positive seems so hard that I always have a nagging feeling as to whether I actually did fix it

tialaramex•20h ago

"Data race which doesn't seem to modify the data" is actually often not a false positive, they're called "Benign data races" and well, go read the literature, it's often difficult to be sure they're truly "benign" and if there's a subtle reason why they aren't that's now a crazy hard to spot bug in your program.

So yeah, just fix 'em. Most of them you dodged a bullet, and even when you didn't it will now be easier for the next person to reason about the software.

gpderetta•1d ago

Probably I'm missing something, but enqueue dereferences the current tail node. What prevents it dereferencing a node that has just been deleted by dequeue.

Actually, nothing ever sets head so I'm not sure how anything is ever dequeued. Probably the implementation is incomplete and the queue needs a sentinel node somewhere.

john-h-k•1d ago

> The important thing to remember is that each of these cannot be split into separate instructions.

Nitpick, but they absolutely can be split into several instructions, and this is the most common way it’s implemented on RISClike processors, and also single instructions aren’t necessarily atomic.

The actual guarantee is that the entire operation (load, store, RMW, whatever) occurs in one “go” and no other thread can perform an operation on that variable during this atomic operation (it can’t write into the low byte of your variable as you read it).

It’s probably best euphemismed by imagining that every atomic operation is just the normal operation wrapped in a mutex, but implemented in a much more efficient manner. Of course, with large enough types, Atomic variables may well be implemented via a mutex

khrbtxyz•1d ago

https://en.wikipedia.org/wiki/Load-link/store-conditional

OskarS•1d ago

> It’s probably best euphemismed by imagining that every atomic operation is just the normal operation wrapped in a mutex

Sort-of, but that's not quite right: if you imagine it as "taking a mutex on the memory", there's a possibility of starvation. Imagine two threads repeatedly "locking" the memory location to update it. With a mutex, it's possible that one of them get starved, never getting to update the location, stalling indefinitely.

At least x86 (and I'm sure ARM and RISC-V as well) make a much stronger progress guarantee than a mutex would: the operation is effectively wait-free. All threads are guaranteed to make progress in some finite amount of time, no one will be starved. At least, that's my understanding from reading much smarter people talking about the cache synchronization protocols of modern ISAs.

Given that, I think a better mental model is roughly something like the article describes: the operation might be slower under high contention, but not "blocking" slow, it is guaranteed to finish in a finite amount of time and atomically ("in one combined operation").

john-h-k•1d ago

x86 and modern ARM64 has instructions for this, but original ARM and RISC approaches are literally a hardware-assisted polling loop. Unsure what guarantees they make, I shall look.

Definitely a good clarification though yeah, important

wbl•1d ago

Really finicky ones, and the initial ones made none. I think for RISC-V it's something like max 16 instructions covered with no other memory accesses to be assured progress on ll/sc sequences.

That's to enable very minimal hardware implementations that can only track one line at a time.

OskarS•1d ago

I got curious about this, because really the most important guarantee is in the C++ memory model, not any of the ISAs, and conforming compilers are required to fulfill them (and it's generally more restrictive than any of the platform guarantees). It's a little bit hard to parse, but if I'm reading section 6.9.2.3 of the C++23 standard (from here [1]), operations on atomics are only lock-free, not wait-free, and even that might be a high bar on certain platforms:

    Executions of atomic functions that are either defined to be lock-free 
    (33.5.10) or indicated as lock-free (33.5.5) are lock-free executions.

    When one or more lock-free executions run concurrently, at least one should 
    complete.

        [Note 3 : It is difficult for some implementations to provide absolute 
        guarantees to this effect, since repeated and particularly inopportune 
        interference from other threads could prevent forward progress, e.g., by 
        repeatedly stealing a cache line for unrelated purposes between load-locked 
        and store-conditional instructions. For implementations that follow this 
        recommendation and ensure that such effects cannot indefinitely delay progress 
        under expected operating conditions, such anomalies can therefore safely be 
        ignored by programmers. Outside this document, this property is sometimes 
        termed lock-free. — end note]

I'm guessing that note is for platforms like you mention, where the underlying ISA makes this (more or less) impossible. I would assume in the modern versions of these ISAs though, essentially everything in std::atomic is wait-free, in practice.

[1] https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.p...

p_l•23h ago

My understanding is that C++ memory model essentially pulled the concurrency bit from their own imagination, and as a result the only architecture that actually maps to it is RISC-V which explicitly decided to support it.

p_l•23h ago

Early Alpha CPUs (no idea about laters) essentially had a single special register that asserted a mutex lock on a word-sized (64bit) location for any read/write operation on the Sysbus.

latchkey•1d ago

Previously:

Atomics and Concurrency https://news.ycombinator.com/item?id=38964675 (January 11, 2024 — 106 points, 48 comments)

coolThingsFirst•23h ago

Amazing, i have decided to learn cpp for fun and wow the designers of the lang had serious gray matter.

Binary search provided. Pair abstraction provided. Lower bound for binary search yup.

Auto for type inference and fast as hell on top of it. Crazy powerful lang and also multiplayform threads.

tialaramex•19h ago

C++ auto isn't type inference, as so often in C++ it's something weirder which they call type deduction.

In languages with type inference when it could be A or B and it's left to infer the type the compiler just always says it could be A or B, so the programmer needs to disambiguate.

But with deduction in some cases C++ will deduce that you meant A even though you could instead have explicitly chosen B here instead, and too bad if you did mean B because that's not what it deduced.

coolThingsFirst•8h ago

Can you show me some examples where the deduction is not the expected type?

Precision Clock Mk IV

A Lean companion to Analysis I

Oxfordshire clock still keeping village on time after 500 years

Show HN: PunchCard Key Backup

We're beating $359M in funding with two people and OCaml

Photos taken inside musical instruments

AtomVM, the Erlang virtual machine for IoT devices

The Two Ideals of Fields

Designing Pareto-optimal RAG workflows with syftr

Using Ed(1) as My Static Site Generator

Show HN: Fontofweb – Discover Fonts Used on a Website or Websites Using Font(s)

AI video you can watch and interact with, in real-time

Beware of Fast-Math

Using lots of little tools to aggressively reject the bots

Gradients Are the New Intervals

Exploring a Language Runtime with Bpftrace

Acclimation of Osmoregulatory Function in Salmon

Webb telescope helps refines Hubble constant, suggesting resolution rate debate

Atlas: Learning to Optimally Memorize the Context at Test Time

Surprisingly fast AI-generated kernels we didn't mean to publish yet

Show HN: AI Peer Reviewer – Multiagent System for Scientific Manuscript Analysis

Show HN: I built an AI agent that turns ROS 2's turtlesim into a digital artist

The ‘white-collar bloodbath’ is all part of the AI hype machine

The Illusion of Causality in Charts

The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity

C++ to Rust Phrasebook

Beating Google's kernelCTF PoW using AVX512

Microsandbox: Virtual Machines that feel and perform like containers

Revenge of the Chickenized Reverse-Centaurs

Show HN: Icepi Zero – The FPGA Raspberry Pi Zero Equivalent

Precision Clock Mk IV

A Lean companion to Analysis I

Oxfordshire clock still keeping village on time after 500 years

Show HN: PunchCard Key Backup

We're beating $359M in funding with two people and OCaml

Photos taken inside musical instruments

AtomVM, the Erlang virtual machine for IoT devices

The Two Ideals of Fields

Designing Pareto-optimal RAG workflows with syftr

Using Ed(1) as My Static Site Generator

Show HN: Fontofweb – Discover Fonts Used on a Website or Websites Using Font(s)

AI video you can watch and interact with, in real-time

Beware of Fast-Math

Using lots of little tools to aggressively reject the bots

Gradients Are the New Intervals

Exploring a Language Runtime with Bpftrace

Acclimation of Osmoregulatory Function in Salmon

Webb telescope helps refines Hubble constant, suggesting resolution rate debate

Atlas: Learning to Optimally Memorize the Context at Test Time

Surprisingly fast AI-generated kernels we didn't mean to publish yet

Show HN: AI Peer Reviewer – Multiagent System for Scientific Manuscript Analysis

Show HN: I built an AI agent that turns ROS 2's turtlesim into a digital artist

The ‘white-collar bloodbath’ is all part of the AI hype machine

The Illusion of Causality in Charts

The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity

C++ to Rust Phrasebook

Beating Google's kernelCTF PoW using AVX512

Microsandbox: Virtual Machines that feel and perform like containers

Revenge of the Chickenized Reverse-Centaurs

Show HN: Icepi Zero – The FPGA Raspberry Pi Zero Equivalent

Atomics and Concurrency

Comments