
Go’s race detector has a mutex blind spot

https://doublefree.dev/go-race-mutex-blindspot/
73•GarethX•2d ago

Comments

TheDong•1d ago
You're using Go's race detector wrong if you expect it to actually catch all races. It doesn't, it can't, it's a best effort thing.

The right way to use the go race detector is:

1. Only turn it on in testing. It's too slow to run in prod to be worth it, so only in testing. If your testing does not cover a use-case, tough luck, you won't catch the race until it breaks prod.

2. Have a nightly job that runs unit and integ tests, built with -race, and without caching, and if any races show up there, save the trace and hunt for them. It only works probabilistically for almost all significant real-world code, so you have to keep running it periodically.

3. Accept that you'll have, for any decently sized go project, a chunk of mysterious data-races. The upstream go project has em, most of google's go code has em, you will too. Run your code under a process manager to restart it when it crashes. If your code runs on user's devices, gaslight your users into thinking their ram or processor might be faulty so you don't have to debug races.

4. Rewrite your code in rust, and get something better than the go race detector every time you compile.

The most important of those is 3. If you don't do anything else, do 3 (i.e. run your go code under systemd or k8s with 'restart=always').

ViewTrick1002•23h ago
The data race patterns in Go article from Uber is always a scary read.

https://www.uber.com/blog/data-race-patterns-in-go/

klabb3•23h ago
> Rewrite your code in rust, and get something better than the go race detector every time you compile.

Congrats, rustc forced you to wrap all your types in Arc<Mutex<_>>, and you no longer have data races. As a gift, you will get logical race conditions instead, that are even more difficult to detect, while being equally difficult to reproduce reliably in unit tests and patch.

Don’t get me wrong, Rust has done a ton for safety and pushed other languages to do better. I love probably 50% of Rust. But Rust doesn’t protect against logical races, livelocks, deadlocks, and so on.

Writing concurrent programs that meet the same standards of testability, composability, expressiveness etc. that we expect of sequential programs is really, really difficult. We need either new languages, frameworks, or (best case) design and architectural patterns that are easy to apply. As far as I’m concerned large scale general purpose concurrent software development is an unsolved problem.

catigula•23h ago
If it's solved, the solution has been discarded at some point by other developers for being too cumbersome, too much effort, and therefore in violation of some sacred principle of their job needing to be effortless.
ViewTrick1002•23h ago
A well formed Go program would have the same logical race conditions to manage as well.

The Arc is only needed when you truly need to mutably share data.

Rust, like Go, has a full suite of channels and other patterns for sharing data.

jason_oster•22h ago
Small correction: The Arc is for sharing across threads, the Mutex is for mutation. But you are generally correct that they can be used independently.
ViewTrick1002•21h ago
Of course. But if you’re using a channel then it hides the inner constructs.

Comparing writing a web service in Go and rust you would likely also utilize Tokio which has a wide variety of well designed sync primitives.

https://docs.rs/tokio/latest/tokio/sync/index.html

mr90210•23h ago
> Congrats, rustc forced you to wrap all your types in Arc<Mutex<_>>

Also, don’t people know that a Mutex implies lower throughput depending on how long said Mutex is held?

Lock-free data structures/algorithms are an attempt to address the drawbacks of Mutexes.

https://en.wikipedia.org/wiki/Lock_(computer_science)#Disadv...

speed_spread•23h ago
The overhead of Mutex for uncontended cases is negligible. If Mutex acquisition starts to measurably limit your production performance, you have options but will probably need to reconsider the use of shared mutable anyway.
valyala•22h ago
Lock-free data structures and algorithms access shared memory via various atomic operations such as compare-and-swap and atomic arithmetic. The throughput of these operations does not scale with the number of CPU cores. On the contrary, throughput usually drops as the number of CPU cores grows, because the cores need more time to synchronize their local per-CPU caches with main memory. So lock-free data structures and algorithms do not scale on systems with a large number of CPU cores. It is preferable to use "shared nothing" data structures and algorithms instead, where every CPU core processes its own portion of state, which isn't shared with other CPU cores. In this case the local state can be processed from local per-CPU caches at a speed that exceeds main-memory read/write bandwidth, with lower access latency.
gpderetta•17h ago
High write contention on a memory location does not scale with the number of cores (in fact it is bad even with two cores).

This is independent of using a mutex, a lock free algorithm or message passing (because at the end of the day a queue is a memory location).

johncolanduoni•21h ago
Lock-free and even wait-free approaches are not a panacea. Memory contention is fundamentally expensive with today’s CPU architectures (they lock, even if you ostensibly don’t). High contention lock-free structures routinely perform worse than serialized locking.
judofyr•21h ago
Lock-free data structures do not guarantee higher throughput. They guarantee lower latency, which often comes at the expense of throughput. A typical approach for implementing a lock-free data structure is to allow one thread to "take over" the execution of another one by repeating parts of its work. It ensures progress of the system, even if one thread isn't being scheduled. This is mainly useful when you have CPUs competing for work running in parallel.

The performance of high-contention code is really tricky to reason about and depends on a lot of factors. Just replacing a mutex with a lock-free data structure will not magically speed up your code. Eliminating the contention completely is typically much better in general.

CodeBrad•23h ago
I may be biased, as I definitely love more than 50% of Rust, but Go also does not protect against logical races, deadlocks, etc.

I have heard positive things about the loom crate[1] for detecting races in general, but I have not used it much myself.

But in general I agree, writing correct (and readable) concurrent and/or parallel programs is hard. No language has "solved" the problem completely.

[1]: https://crates.io/crates/loom

TheDong•23h ago
As a sibling said, Go has all the same deadlocks, livelocks, etc you point out that rust doesn't cover, in addition to also having data-races that rust would prevent.

But, also, Go has way worse semantics around various things, like mutexes, making it much more likely deadlocks happen. Like in go, you see all sorts of "mu.Lock(); f(); mu.Unlock()" type code, where if it's called inside an `http.Handler` and 'f' panics, the program's deadlocked forever. In go, panics are the expected way for an http middleware to abort the server ("panic(http.ErrAbortHandler)"). In rust, panics are expected to actually be fatal.

Rust's mutexes also gate "ownership" of the inner object, which make a lot of trivial deadlocks compiler errors, while go makes it absolutely trivial to forget a "mu.Unlock" in a specific codepath and call 'Lock' twice in a case rust's ownership rules would have caught.

In practice, for similarly sized codebases and similarly experienced engineers, I see only a tiny fraction of deadlocks in concurrent rust code when compared to concurrent go code, so like regardless that it's an "unsolved problem", it's clear that in reality, there's something that's at least sorta working.

Xeoncross•23h ago
> and 'f' panics, the program's deadlocked forever

I don't see `mu.Lock(); f(); mu.Unlock()` anywhere really.

`mu.Lock(); defer mu.Unlock(); f();` is how everyone does it to prevent that possibility.

ViewTrick1002•22h ago
Until you have to call a slow function after the mutex access, leading to the lock being held long enough to cause problems.

Now you either refactor into multiple functions, while ensuring all copies of possibly shared data passed as function arguments are correctly guarded, or "manually" unlock when you don't need the mutex access anymore.

jerf•22h ago
OK, but you're not in "Go"-specific problems any more, that's just concurrency issues. There isn't any approach to concurrency that will rigorously prevent programmers from writing code that doesn't progress sufficiently, not even going to the extremes of Erlang or Haskell. Even when there are no locks qua locks to be seen in the system at all I've written code that starved the system for resources by doing things like trying to route too much stuff through one Erlang process.
ViewTrick1002•21h ago
I would say it is a Go specific problem with how mutexes and defer are used together.

In rust you would just throw a block around the mutex access changing the scoping and ensuring it is dropped before the slow function is called.

Call it a minimally intrusive manual unlock.

tialaramex•20h ago
In Rust you can also explicitly drop the guard.

    drop(foo); // Now foo doesn't exist, it was dropped, thus unlocking anything which was kept locked while foo exists
If you feel that the name drop isn't helpful you can write your own function which consumes the guard, it needn't actually "do" anything with it - the whole point is that we moved the guard into this function, so, if the function doesn't return it or store it somewhere it's gone. This is why Destructive Move is the correct semantic and C++ "move" was a mistake.
dfawcus•14h ago
Generally, in any language, I'd suggest that if you're fiddling with lots of locks (be they mutexes or whatever), you're taking the wrong approach.

Specifically for Go, I'd try to address the problem in CSP style, so as to avoid explicit locks unless absolutely necessary.

Now for the case you mention, one can actually achieve the same in Go, it just takes a bit of prior work to set up the infra.

  package main

  import "sync"

  type Foo struct {
      sync.Mutex
      s string
  }

  func doLocked[T sync.Locker](data T, fn func(data T)) {
      data.Lock()
      defer data.Unlock()
      fn(data)
  }

  func main() {
      foo := &Foo{s: "Hello"}
      doLocked(foo, func(foo *Foo) {
          /* ... */
      })
      /* do the slow stuff */
  }
masklinn•21h ago
> OK, but you're not in "Go"-specific problems any more, that's just concurrency issues.

It’s absolutely a go-specific problem from defer being function scoped. Which could be ignored if Unlock was idempotent but it’s not.

TheDong•22h ago
From the golang github org in non-toy code:

1. https://github.com/golang/tools/blob/f7d99c1a286d6ec8bd4516a...

2. https://github.com/golang/sync/blob/7fad2c9213e0821bd78435a9...

There are dozens and dozens throughout the stdlib and other popular go code.

The singleflight case is quite common, if you have:

    mu.Lock()
    if something() {
      mu.Unlock()
      moreWorkThatDoesntWantMutex()
      return
    }
    mu.Unlock()
most gophers use manual lock/unlocks to be able to unlock early in an 'if' before a 'return', and that comes up often enough that it really does happen.

I see manual lock/unlock all the time, and semi-regularly run into deadlocks caused by it. Maybe you don't use any third-party open source libraries, in which case, good for you congrats.

gf000•21h ago
Trick question: what is the scope of `defer` in go?
klabb3•19h ago
Function body, so no, don’t put it in your loop. Just break it out to a helper fn if needed. This isn’t a big problem in practice.
chrchang523•18h ago
Yes, though I think tooling could be better; if I had more spare time I'd write a linter which flagged defers in loops that didn't come with an accompanying comment.
pjmlp•2h ago
I always think, Go is open source why not just fork it to add feature XYZ, then I realize I am better off using languages whose communities appreciate modern language design, instead of wasting my spare time with such things.
empath75•20h ago
> Congrats, rustc forced you to wrap all your types in Arc<Mutex<_>>, and you no longer have data races.

Or you can just avoid shared mutable state, or use channels, or many of the other patterns for avoiding data races in Rust. The fun thing is that you can be sure that no matter what you do, as long as it's not unsafe, it will not cause a data race.

pkolaczk•20h ago
I wrote plenty of concurrent Rust code and the number of times I had to use Arc<Mutex> is extremely low (maybe a few times per thousands lines).

As for your statement that concurrency is generally hard - yes it is. But it is even harder with data races.

onionisafruit•23h ago
I configure ci to run tests with -race and that works out pretty well. I value short ci runs, so testing with -race is a sacrifice for me even if it only adds ~10 seconds typically. I like your idea of a regular job that runs without caching, but your best tip is gaslighting users. Maybe I should start prefixing error messages with “look what you made me do”.
franticgecko3•23h ago
> Have a nightly job that runs unit and integ tests

Not enough IMHO.

We run all tests on developer machines and CI with -race. Always.

It's probabilistic, so every developer 'make test' and every 'git push' is coverage.

aleksi•22h ago
> It's too slow to run in prod to be worth it

I disagree there. It is reasonable to run a few service instances with a race detector. I have a few services where _all_ instances are running with it just fine.

Xeoncross•22h ago
I'm so glad to be out of the dark ages of parallelism. Complaining about Go's race detector or exactly which types of logical races Rust can't prevent is such a breath of fresh air compared to all those other single-core languages we're paid to write with that had threading, async, or concurrency bolted-on as an afterthought.

I can only hope Go and Rust continue to improve until the next language generation comes along to surpass them. I honestly can't wait, things improved so much already.

tialaramex•21h ago
You know how a modern language like Rust doesn't have the unstructured control flow with features like "goto"† but only a set of structured control flow features, such as pattern matching, conditionals, loops and functions?

Structured Concurrency is the same idea, but for concurrency. Instead of that code to create an appropriate number of threads, parcel out work, and so on, you just express high level goals like "Do these N pieces of work in any order" or "Do A and B, and once either is finished also do C and D" and just as the language handles the actual machine code jumps for your control flow, that would happen for concurrency too.

Nothing as close to the metal as Rust has that baked in today, but it is beginning to be a thing in languages like Swift and you can find libraries which take this approach.

† C's goto is de-fanged from the full blown go-to arbitrary jump in early languages, but it's still not structured control flow.

pkolaczk•20h ago
Rust async streams or rayon come very close to what you describe as structured concurrency. Actually much closer than anything I saw in other mainstream languages eg Java or Go.
empath75•19h ago
Rayon is about as pure an example of it as you can imagine. In a lot of cases you just need to replace iter() with par_iter() and it just works.
masklinn•1h ago
scoped threads as well, though at a lower level of semantics (and probably less efficiently due to not being on top of a thread pool).
seanw444•19h ago
> Actually much closer than anything I saw in other mainstream languages eg Java or Go.

https://github.com/sourcegraph/conc

khuey•20h ago
> Nothing as close to the metal as Rust has that baked in today

Rust's futures/streams are basically what you're asking for. You need a crate rather than just the bare language but I don't think that's a particularly important distinction.

bheadmaster•18h ago
The ultimate argument against goto was the proof that structured programming could express any flowchart, simply by using a loop around a switch statement.

Is there a similar proof for structured concurrency - that it can express anything that unstructured concurrency can?

ezst•18h ago
> Nothing as close to the metal as Rust has that baked in today

You should have a look at what's going on in Scala-land, with scala-native¹ (and perhaps the Gears² library for direct style/capabilities)

I like this style, though it's been too new and niche to get a taste of it being used at scale.

¹: https://scala-native.org/ ²: https://github.com/lampepfl/gears

jimbo808•21h ago
My guess is that the next language gen will be languages that AI generates, which are optimized to be readable by humans and writable by AI. Maybe even two layers: one layer that is optimized for human skimming, and another layer that actually compiles, which is optimized for AI to generate and for the computer to compile.
CodeBrad•20h ago
> which are optimized to be readable to humans and writable by AI

How might a language optimized for AI look different than a language optimized for humans?

mbonnet•19h ago
especially when LLMs "speak" human language.
Lvl999Noob•20h ago
For the current category of LLM based AI, "AI optimised" means "old and popular". Even if you add a layer that has much more details but may be a lot more verbose or whatever, that layer would not be "AI optimised".
toast0•20h ago
IMHO, shared memory parallelism as the norm, means we're still in the dark ages.

Yes, shared memory is useful sometimes, but I don't think it should be the norm. But I've done parallel stuff in lots of languages, most recently Erlang and Rust... Message passing is so much nicer than having threads all mucking about in the same data if you don't need them to. You can write message passing parallel code in Rust, but it's not the norm, and you'll have to do a lot of the plumbing.

pjmlp•3h ago
That is typical Go design school. Even the channels stuff we already had in the Java and .NET ecosystems, even if the languages don't have syntax sugar for launching coroutines.

But go-routines!

Well, on .NET land we would be using Task Processing Library, or Dataflow built on top of it, with tasks being switched over the various carrier threads.

Or if feeling fancy, reach out to async workflows on F# with computation expressions, even before async/await came to be.

While on the Java side, we would be using java.util.concurrent, with future computations, having fun with Groovy GPars, or Scala frameworks like Akka.

In both platforms, we could even go the extra mile and write our own scheduling algorithms, how those lightweight threads would be mapped into carrier threads.

Naturally not having some of the boilerplate to handle all of that, or using multiple languages on the same project, makes it easier, hence why now we have all those goodies, with async/await or virtual threads on top.

Jyaif•22h ago
I always run my Go code with `-race`, but I feel more comfortable writing C++ multithreaded code than Go thanks to the thread sanitizer annotations ( `__attribute__((guarded_by(guard)))` and others in the family).

The annotations also help me discover patterns: when most of the functions of a class have the same annotation, maybe it means that all the functions of the class should have it.

I really wish an equivalent to those annotations came to Go.

reactordev•21h ago

    if id == 1 {
        counter++;
    }
Found your problem. /s

In all honesty, if you “do work” using channels then all your goroutines are “thread safe” as the channel keeps things in order. Also, mutex is working as intended. As you see in your post, -race sees this, it’s good. Now have one goroutine read from a chan, get rid of the mutex, all other goroutines write to the chan, perfection.
