Looks like fixing the underlying bug is still in-progress, [1] I wonder how many lines of code it will take.
[0] https://github.com/aoli-al/jdk/commit/625420ba82d2b0ebac24d9...
You make it sound like there is some modern development superseding what java has, but that's absolutely not the case.
Like, even Rust is pretty much just a no-overhead `synchronized` on top of an object. It is necessary there, because data races are a fundamental memory safety issue, but Java is immune to that (it has "safe" data races). Logical bugs can trivially happen in either case - as an easy example, even if all your fields are atomically mutated, the whole object may not make sense in certain states, like a date with February the 31st. Rust does nothing to prevent such bugs, and concurrent data structures have ample grounds for realistic examples of the above.
STM.
The terms 'atomic', 'thread-safe', and 'concurrent' collections are thrown around too loosely for application programmers IMO, for exactly your example above.
In other scenarios, 'atomics' refer to the ability to do one thing atomically. With STM, you can do two or more things atomically.
Likewise with 'thread-safe'. Thread-safe seems to indicate that the object won't break internally in the presence of multiple threads, which is too low of a bar to clear if your goal is to write an actually thread-safe application out of so-called 'thread-safe' parts.
STM has actual concurrent data structures, where you can write straight-line code like 'if this collection has at least 5 elements, then pop one'.
I don't think the Feb 31 example is that fair though, because if you want to construct a representation of Feb 31, who's going to stop you? And if you don't want to, plain old static types are the solution.
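To make the 'at least 5 elements, then pop one' example concrete, here is a minimal Java sketch. The Java standard library has no STM, so the second half falls back on a plain lock as a stand-in; it gives the same atomicity for this one operation but not the composability STM is praised for. All class and method names here (PopIfAtLeast, GuardedDeque, racyPop, popIfAtLeast) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical names throughout; this is an illustration, not a library API.
class PopIfAtLeast {
    // Racy check-then-act: size() and pollFirst() are each thread-safe,
    // but another thread may remove elements between the two calls,
    // so the "at least min elements" condition can be stale by the time we pop.
    static Optional<Integer> racyPop(ConcurrentLinkedDeque<Integer> deque, int min) {
        if (deque.size() >= min) {
            return Optional.ofNullable(deque.pollFirst());
        }
        return Optional.empty();
    }

    // Lock-based stand-in for a transaction: the check and the pop
    // happen under the same monitor, so no other thread can interleave.
    static final class GuardedDeque {
        private final Deque<Integer> items = new ArrayDeque<>();

        synchronized void push(int value) {
            items.addLast(value);
        }

        synchronized Optional<Integer> popIfAtLeast(int min) {
            if (items.size() >= min) {
                return Optional.of(items.removeFirst());
            }
            return Optional.empty();
        }

        synchronized int size() {
            return items.size();
        }
    }

    public static void main(String[] args) {
        GuardedDeque deque = new GuardedDeque();
        for (int i = 1; i <= 5; i++) {
            deque.push(i);
        }
        System.out.println(deque.popIfAtLeast(5)); // Optional[1]
        System.out.println(deque.popIfAtLeast(5)); // Optional.empty (only 4 left)
    }
}
```

The racy version is built entirely from 'thread-safe' parts, which is the point being made above: the parts are fine, the composition is not.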
https://joeduffyblog.com/2010/01/03/a-brief-retrospective-on...
Also, a phenomenal piece of writing (as are his other posts) on the whole concurrency landscape, see:
> A wondrous property of concurrent programming is the sheer number and diversity of programming models developed over the years. Actors, message-passing, data parallel, auto-vectorization, …; the titles roll off the tongue, and yet none dominates and pervades. In fact, concurrent programming is a multi-dimensional space with a vast number of worthy points along its many axes.
Here it is in 2006 featuring the same Tim from your article: https://www.youtube.com/watch?v=tve57vilywc
I didn't start using it in anger till 2013-2014 maybe? But I don't recall any major differences between what the video shows and how it works in 2025.
Anyway, postmortems usually boil down to two issues:
1) That's not how programmers usually do it
2) We couldn't pull it off
The most obvious explanation for 1 is 2. I, too, would be disappointed by the low-adoption rates of my new technology if I hadn't built it or released it to users.
But the article has some gems:
> Transactions unfortunately do not address one other issue, which turns out to be the most fundamental of all: sharing. Indeed, TM is insufficient – indeed, even dangerous – on its own because it makes it very easy to share data and access it from multiple threads;
I cannot read this charitably. This is the only reason for, not a damning reason against. It's like doing research & development on condoms, and then realising it's a hopeless failure because they might be used for dangerous activities like sex.
> I already mentioned a great virtue of transactions is their ability to nest. But I neglected to say how this works. And in fact when we began, we only recognized one form of nesting. You’re in one atomic block and then enter into another one. What happens if that inner transaction commits or rolls back, before the fate of the outer transaction is known
You nest transactional statements, not the calls to atomic. The happy-path for an atomic is that it will commit; it should be obvious a priori that something that commits cannot be in the codepath that can be rolled back.
> Then that same intern’s casual statement pointing out an Earth-shattering flaw that would threaten the kind of TM we (and most of the industry at the time) were building. ...
> An update in-place system will allow that transaction to freely change the state of x. Of course, it will roll back here, because isItOwned changed to true. But by then it is too late: the other thread using x outside of a transaction will see constantly changing state – torn reads even – and who knows what will happen from there. A known flaw in any weakly atomic, update in-place TM.
> If this example appears contrived, it’s not. It shows up in many circumstances.
I agree that it's not contrived. It's in the problem-space of application writers. It's not a problem caused by introducing STM. We want an STM system to allow safe access to isItOwned & x, because it's a PITA to try to do this with locks.
Look ma, no skimming: https://news.ycombinator.com/item?id=37647230
> You nest transactional statements, not the calls to atomic. The happy-path for an atomic is that it will commit; it should be obvious a priori that something that commits cannot be in the codepath that can be rolled back.
This makes absolutely no sense with my above correction.
"Make invalid states unrepresentable" - it's bad design that February the 31st is a thing in your data structure when that's invalid. You can't always avoid this, but it's appalling how bad most people's data structures are.
C's stdlib provides a tm structure in which day of the week is stored in a signed 32-bit integer. You know, for when it's the negative two billionth day of the week...
I think this phrase sounds good but is not applicable to systems that touch messy reality.
For example, I think it’s not even possible to apply it to the `tm` structure, as leap seconds are not known in advance.
But we can do a lot without challenging the messy reality. 61 second minutes are (regrettably) a thing in some time systems, but negative 1 million second minutes are not a thing, there's no need for this to be a signed integer!
The C standard library has the excuse that most of it is very old. We should do better.
There are plenty of improvements needed in the C time APIs, like sub-second precision, thread safety, and timezone awareness. What benefit is there to making the struct fields unsigned beyond some arbitrary purity test? This is still C, there are still plenty of ways to make invalid values. And it is nice to be able to subtract as well as add.
Heck, there's no way to encode the full Gregorian Calendar rules in the type system of any language I've ever used, so every choice is going to be a compromise. February 29 Not-A-Leap-Year and April 31 are still invalid dates even if you can outlaw January 0 and March 32.
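For what it's worth, the usual compromise is to validate at construction time rather than in the type system. A small Java sketch using java.time.LocalDate, which throws DateTimeException for impossible dates; the wrapper class DateValidation and its isValid helper are made up for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

// Hypothetical helper class; only LocalDate.of and DateTimeException are real JDK API.
class DateValidation {
    // Returns true if java.time accepts the year/month/day combination.
    static boolean isValid(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            // Thrown for out-of-range fields and for days that do not exist
            // in the given month and year.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(2024, 2, 29)); // true: 2024 is a leap year
        System.out.println(isValid(2023, 2, 29)); // false: 2023 is not
        System.out.println(isValid(2023, 4, 31)); // false: April has 30 days
        System.out.println(isValid(2023, 2, 31)); // false
    }
}
```

The check happens at runtime, not compile time, so this does not contradict the point above: the invalid value is rejected when you try to build it, but the type system itself still knows nothing about the calendar.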
Making all the fields in struct tm signed ints is clearly there to allow them to be manipulated and consistently so, since other types would obviously be better for size if nothing else.
But what I'm quite obviously talking about is a Rust struct with 3 atomic fields. Just because I can safely race on any of its fields, doesn't mean that the whole struct can safely be shared, yet it will be inferred to be Sync.
We can see immediately that your type is broken because it allows us to directly set the date to February 31st, there's no concurrency bug needed, the type was always defective.
void setDate(int month, int day) {
    if (notValidDate(month, day)) { throw new IllegalArgumentException(); }
    this.month = month; // atomic
    this.day = day;     // atomic
}
Yet the whole function is not "atomic"/transactional/consistent, and two threads running simultaneously may surface the above error.
Of course it can ensure that it is consistent; C code can also just ensure that it is memory safe. This is just not an inherent property, and in general you will mess it up.
The only difference is that we can reliably solve memory safety issues (GC, Rust's ownership model), but we have absolutely no way to solve concurrency issues in any model. The only solution is.. having a single thread.
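A sketch of how the setDate race can surface, replayed by hand on a single thread so the bad interleaving is deterministic. Everything here is an assumption for illustration: the class name RacyDate, the split of setDate into check/storeMonth/storeDay, and the year-less calendar that always allows February 29.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class: each field is atomic, yet the pair (month, day) is not.
class RacyDate {
    private final AtomicInteger month = new AtomicInteger(1);
    private final AtomicInteger day = new AtomicInteger(1);

    // Simplified, year-less validity check (February is allowed 29 days).
    static boolean notValidDate(int month, int day) {
        int[] daysIn = {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
        return month < 1 || month > 12 || day < 1 || day > daysIn[month - 1];
    }

    // The three steps of setDate, exposed separately so that one thread
    // can replay what two unsynchronized threads might do.
    void check(int m, int d) {
        if (notValidDate(m, d)) {
            throw new IllegalArgumentException("invalid date " + m + "/" + d);
        }
    }

    void storeMonth(int m) { month.set(m); }

    void storeDay(int d) { day.set(d); }

    int month() { return month.get(); }

    int day() { return day.get(); }

    // Replays one possible interleaving of
    // thread A: setDate(1, 31) and thread B: setDate(2, 28).
    static RacyDate replayBadInterleaving() {
        RacyDate date = new RacyDate();
        date.check(1, 31);   // A validates January 31: fine
        date.check(2, 28);   // B validates February 28: fine
        date.storeMonth(1);  // A
        date.storeMonth(2);  // B overwrites the month
        date.storeDay(28);   // B
        date.storeDay(31);   // A overwrites the day
        return date;
    }

    public static void main(String[] args) {
        RacyDate date = replayBadInterleaving();
        System.out.println(date.month() + "/" + date.day()); // 2/31
    }
}
```

Both calls passed validation and every individual store was atomic, yet the object ends up as February 31st.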
In Rust this improved type doesn't have the defect, to call Rust's analogue of your setDate function you must have the exclusive mutable reference, which means there's no concurrency problem.
You have to do a whole lot of extra work to write the bug and why would you, just write what you meant and it behaves correctly.
Give it another go at understanding what I'm saying, cheers!
Oh my ... you've never seen a proper Actor language, have you?
Have a look at Erlang and Pony, for starters. It will open your mind.
This in particular is great: https://www.ponylang.io/discover/what-makes-pony-different/#...
> Pony doesn’t have locks nor atomic operations or anything like that. Instead, the type system ensures at compile time that your concurrent program can never have data races. So you can write highly concurrent code and never get it wrong.
This is what I am talking about.
> You make it sound like there is some modern development superseding what java has, but that's absolutely not the case.
Both Actor-model languages and Rust (through a surprisingly different path: tracking aliases and lifetimes) do something that's impossible in Java (and most languages): prevent data races due to improper locking (as mentioned above, if your language even has locks and it doesn't make them safe like Rust does, you know you're going to have a really hard time. actor-languages just eliminate locks, and "manual concurrency", completely). Other kinds of races are still possible, but preventing data races goes a very, very long way to making concurrency safe and easy.
You just made a bunch of concurrent algorithms un-implementable that would give much better performance for the benefit of.. having all the other unsolvable issues with concurrency? Like, all the same issues are trivially reproducible at a higher level, with loops within actors' communication that only appear under certain, very dynamic conditions, or a bunch of message passing ending up in an inconsistent state, just not on an "object" level, but on a "group of object" level.
It's a huge win. Absolutely game changing.
> You just made a bunch of concurrent algorithms un-implementable
Exactly! That's a good thing! You think you need those buggy algorithms, you just don't, at least in 99% of cases.
Yes, you can still end up with inconsistencies when you perform actions without the necessary checks, but those cases that remain are extremely easy to find and fix (and even make completely impossible by design), when compared to the horrors of mutable state with locks.
I guess there are language features like co-routines/co-operative multi-tasking that make certain algorithms possible, but nothing about Java prevents implementing sound concurrency algorithms in general.
You wouldn't make that claim if your language didn't have locks.
Pony and Rust are both very interesting languages, but it is absolutely trivial to re-introduce locks with actors, even just accidentally, and then you are back at square 1. This is what you have to understand, their fundamental model has a one-to-one mapping to "traditional" multi-threading with locks. The same way you can't avoid the Turing model's gotchas, actors and stuff won't fundamentally change the landscape either.
Please have a read of https://joeduffyblog.com/2010/01/03/a-brief-retrospective-on... (and don't just skim it.)
(This was not written by some nobody, he does know what he talks about.)
> Contrast this elegant simplicity with the many pitfalls of locks:
> Data races. Like forgetting to hold a lock when accessing a certain piece of data. And other flavors of data races, such as holding the wrong lock when accessing a certain piece of data. Not only do these issues not exist, but the solution is not to add countless annotations associating locks with the data they protect; instead, you declare the scope of atomicity, and the rest is automatic.
> Reentrancy. Locks don’t compose. Reentrancy and true recursive acquires are blurred together. If a locked region expects reentrancy, usually due to planned recursion, life is good; if it doesn’t, life is bad. This often manifests as virtual calls that reenter the calling subsystem while invariants remain broken due to a partial state transition. At that point, you’re hosed.
> Performance. The tension between fine-grained locking (better scalability) versus coarse-grained locking (simplicity and superior performance due to fewer lock acquire/release calls) is ever-present. This tension tugs on the cords of correctness, because if a lock is not held for long enough, other threads may be able to access data while invariants are still broken. Scalability pulls you to engage in a delicate tip-toe right up to the edge of the cliff.
> Deadlocks. This one needs no explanation.
But STM doesn't solve e.g. deadlocks - there are automatic mechanisms that can detect them and choose a different retry mechanism to deal with them (see the linked article), but my general point, which you really want to ignore, is that none of these are silver bullets.
Concurrency is hard.
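The reentrancy pitfall quoted above is easy to reproduce in Java even on a single thread, because Java monitors are reentrant. A minimal sketch, with made-up names (ReentrancyDemo, Accounts, Listener) and a toy invariant that the two balances always sum to 100:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; none of these names come from the article or the thread.
class ReentrancyDemo {
    interface Listener {
        void onDebit(Accounts accounts);
    }

    static final class Accounts {
        private int checking = 100;
        private int savings = 0;
        private final Listener listener;

        Accounts(Listener listener) {
            this.listener = listener;
        }

        // Intended invariant outside of transfer(): checking + savings == 100.
        synchronized void transfer(int amount) {
            checking -= amount;
            // Virtual call made while the invariant is broken. The callback
            // runs on the same thread, which already owns the monitor.
            listener.onDebit(this);
            savings += amount;
        }

        // Re-acquiring the monitor from the callback succeeds immediately,
        // so the lock offers no protection against this reentrant read.
        synchronized int total() {
            return checking + savings;
        }
    }

    public static void main(String[] args) {
        List<Integer> observed = new ArrayList<>();
        Accounts accounts = new Accounts(a -> observed.add(a.total()));
        accounts.transfer(30);
        System.out.println(observed);         // [70]: seen mid-transfer
        System.out.println(accounts.total()); // 100: invariant restored afterwards
    }
}
```

Every method is synchronized, and the callback still observes a total of 70. The lock keeps other threads out, but it does nothing about the calling thread re-entering during a partial state transition.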
> Fray also provides deterministic replay capabilities for debugging specific thread interleavings. Fray is designed to be easy to use and can be integrated into existing testing frameworks.
I wish I had this 20 years ago.
In Section 5.4 of the technical paper, you mention that Kotlin has non-determinism in the scheduler. Where does this non-determinism come from?
It seems unclear to me why Kotlin would inject randomness here, and I suspect that you may actually have identified a false positive in the Lincheck DSL.
In our paper, we found that Fray suffers from false negatives because of this missing feature. Lincheck supports Kotlin coroutines so it finds one more bug than Fray in LC-Bench.
We didn't make any claims about false positives in Lincheck.
To be clear, I made that claim :) I agree that the paper makes no such claim.
I wonder how this works when one runs tests in parallel (something I always enable in any project). By this I mean configuring JUnit to run as many tests as cores are available to speed up the run of the whole test suite.
I took a peek at the code and I have the impression it doesn't work that well as it hooks into when a thread is started. Also, I'm not sure if this works with fibers.
Fray currently does not support virtual threads. We do have an open issue tracking it, but it is low priority.
[1]: https://docs.gradle.org/current/userguide/java_testing.html#...
Separately, we're looking at using Fray for concurrency property testing, as a way to reliably catch concurrency issues in a distributed system by simulating it within a single JVM.
masklinn•8mo ago
edit: for some reason the author overrode the background color on code blocks via an inline style of
from to make the background nigh imperceptibly darker, but then while the stylesheet properly switches the to #01242e in dark mode the inline override stays and blows it to bits. Not that it's amazing if you remove the inline style, on account of operators and method names being styled pretty dark (#666 and #4070a0).