frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Strange Attractors

https://blog.shashanktomar.com/posts/strange-attractors
208•shashanktomar•4h ago•24 comments

S.A.R.C.A.S.M: Slightly Annoying Rubik's Cube Automatic Solving Machine

https://github.com/vindar/SARCASM
75•chris_overseas•4h ago•11 comments

Futurelock: A subtle risk in async Rust

https://rfd.shared.oxide.computer/rfd/0609
273•bcantrill•11h ago•117 comments

Introducing architecture variants

https://discourse.ubuntu.com/t/introducing-architecture-variants-amd64v3-now-available-in-ubuntu-...
180•jnsgruk•1d ago•114 comments

Viagrid – PCB template for rapid PCB prototyping with factory-made vias [video]

https://www.youtube.com/watch?v=A_IUIyyqw0M
78•surprisetalk•4d ago•25 comments

Addiction Markets

https://www.thebignewsletter.com/p/addiction-markets-abolish-corporate
201•toomuchtodo•10h ago•186 comments

My Impressions of the MacBook Pro M4

https://michael.stapelberg.ch/posts/2025-10-31-macbook-pro-m4-impressions/
136•secure•17h ago•191 comments

A theoretical way to circumvent Android developer verification

https://enaix.github.io/2025/10/30/developer-verification.html
104•sleirsgoevy•7h ago•68 comments

Hacking India's largest automaker: Tata Motors

https://eaton-works.com/2025/10/28/tata-motors-hack/
150•EatonZ•3d ago•51 comments

Active listening: the Swiss Army Knife of communication

https://togetherlondon.com/insights/active-listening-swiss-army-knife
27•lucidplot•4d ago•15 comments

Fungus: The Befunge CPU(2015)

https://www.bedroomlan.org/hardware/fungus/
8•onestay42•2h ago•1 comments

Use DuckDB-WASM to query TB of data in browser

https://lil.law.harvard.edu/blog/2025/10/24/rethinking-data-discovery-for-libraries-and-digital-h...
149•mlissner•10h ago•39 comments

Perfetto: Swiss army knife for Linux client tracing

https://lalitm.com/perfetto-swiss-army-knife/
105•todsacerdoti•15h ago•10 comments

How We Found 7 TiB of Memory Just Sitting Around

https://render.com/blog/how-we-found-7-tib-of-memory-just-sitting-around
114•anurag•1d ago•25 comments

Leaker reveals which Pixels are vulnerable to Cellebrite phone hacking

https://arstechnica.com/gadgets/2025/10/leaker-reveals-which-pixels-are-vulnerable-to-cellebrite-...
214•akyuu•1d ago•130 comments

Why Should I Care What Color the Bikeshed Is?

https://www.bikeshed.com/
7•program•1w ago•3 comments

Kerkship St. Jozef, Antwerp – WWII German Concrete Tanker

https://thecretefleet.com/blog/f/kerkship-st-jozef-antwerp-%E2%80%93-wwii-german-concrete-tanker
11•surprisetalk•1w ago•1 comments

Signs of introspection in large language models

https://www.anthropic.com/research/introspection
112•themgt•1d ago•57 comments

Llamafile Returns

https://blog.mozilla.ai/llamafile-returns/
100•aittalam•2d ago•18 comments

Nix Derivation Madness

https://fzakaria.com/2025/10/29/nix-derivation-madness
155•birdculture•13h ago•57 comments

Show HN: Pipelex – Declarative language for repeatable AI workflows

https://github.com/Pipelex/pipelex
80•lchoquel•3d ago•15 comments

Photographing the rare brown hyena stalking a diamond mining ghost town

https://www.bbc.com/future/article/20251014-the-rare-hyena-stalking-a-diamond-mining-ghost-town
14•1659447091•4h ago•1 comments

Sustainable memristors from shiitake mycelium for high-frequency bioelectronics

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0328965
109•PaulHoule•14h ago•55 comments

AI scrapers request commented scripts

https://cryptography.dog/blog/AI-scrapers-request-commented-scripts/
192•ColinWright•12h ago•143 comments

The cryptography behind electronic passports

https://blog.trailofbits.com/2025/10/31/the-cryptography-behind-electronic-passports/
140•tatersolid•16h ago•89 comments

Apple reports fourth quarter results

https://www.apple.com/newsroom/2025/10/apple-reports-fourth-quarter-results/
136•mfiguiere•1d ago•192 comments

Pangolin (YC S25) is hiring a full stack software engineer (open-source)

https://docs.pangolin.net/careers/software-engineer-full-stack
1•miloschwartz•10h ago

Lording it, over: A new history of the modern British aristocracy

https://newcriterion.com/article/lording-it-over/
49•smushy•6d ago•103 comments

The 1924 New Mexico regional banking panic

https://nodumbideas.com/p/labor-day-special-the-1924-new-mexico
48•nodumbideas•1w ago•1 comments

Attention lapses due to sleep deprivation due to flushing fluid from brain

https://news.mit.edu/2025/your-brain-without-sleep-1029
525•gmays•14h ago•255 comments
Open in hackernews

Futurelock: A subtle risk in async Rust

https://rfd.shared.oxide.computer/rfd/0609
272•bcantrill•11h ago
This RFD describes our distillation of a really gnarly issue that we hit in the Oxide control plane.[0] Not unlike our discovery of the async cancellation issue[1][2][3], this is larger than the issue itself -- and worse, the program that hits futurelock is correct from the programmer's point of view. Fortunately, the surface area here is smaller than that of async cancellation and the conditions required to hit it can be relatively easily mitigated. Still, this is a pretty deep issue -- and something that took some very seasoned Rust hands quite a while to find.

[0] https://github.com/oxidecomputer/omicron/issues/9259

[1] https://rfd.shared.oxide.computer/rfd/397

[2] https://rfd.shared.oxide.computer/rfd/400

[3] https://www.youtube.com/watch?v=zrv5Cy1R7r4

Comments

Sytten•7h ago
I am wondering if there is a larger RFC for Rust to force users to not hold a variable across await points.

In my mind futurelock is similar to keeping a sync lock across an await point. We have nothing right now to force a drop and I think the solution to that problem would help here.

cogman10•7h ago
The ideas that have been batted around is called "async drop" [1]

And it looks like it's still just an unaddressed well known problem [2].

Honestly, once the Mozilla sackening of rust devs happened it seems like the language has been practically rudderless. The RFC system seems almost dead as a lot of the main contributors are no longer working on rust.

This initiative hasn't had motion since 2021. [3]

[1] https://rust-lang.github.io/async-fundamentals-initiative/ro...

[2] https://rust-lang.github.io/async-fundamentals-initiative/

[3] https://github.com/rust-lang/async-fundamentals-initiative

raggi•7h ago
Those pages are out of date, and AsyncDrop is in progress: https://github.com/rust-lang/rust/issues/126482

I think "practically rudderless" here is fairly misinformed and a little harmful/rude to all the folks doing tons of great work still.

It's a shame there are some stale pages around and so on, but they're not good measures of the state of the project or ecosystem.

The problem of holding objects across async points is also partially implemented in this unstable lint marker which is used by some projects: https://dev-doc.rust-lang.org/unstable-book/language-feature...

You also get a similar effect in multi-threaded runtimes by not arbitrarily making everything in your object model Send and instead designing your architecture so that most things between wake-ups don't become arbitrarily movable references.

These aren't perfect mitigations, but some tools.

bigstrat2003•6h ago
In fairness, if you're a layman to the rust development process (as I am, so I'm speaking from personal experience here) it's damn near impossible to figure out the status of things. There tracking issues, RFCs, etc which is very confusing as an outsider and gives no obvious place to look to find out the current status of a proposal. I'm sure there is a logic to it and that if I spent the time to learn it would make sense. But it is really hard to approach for someone like me.
cogman10•6h ago
> I think "practically rudderless" here is fairly misinformed and a little harmful/rude to all the folks doing tons of great work still.

That great work is mostly opaque on the outside.

What's been noticeable as an observer is that a lot of the well known names associated with rust no longer work on it and there's been a large amount of turnover around it.

That manifests in things like this case where work was in progress up until ~2021 and then was ultimately backburnered while the entire org was reshuffled. (I'd note the dates on the MCP as Feb 2024).

I can't tell exactly how much work or what direction it went in from 2021 to 2024 but it does look apparent that the work ultimately got shifted between multiple individuals.

I hope rust is in a better spot. But I also don't think I was being unfair in pointing out how much momentum got wrecked when Mozilla pulled support.

raggi•5h ago
The language team tends to look at these kinds of challenges and drive them to a root cause, which spins off a tree of work to adjust the core language to support what's required by the higher level pieces, once that work is done then the higher level projects are unblocked (example: RPIT for async drop).

That's not always super visible if you're not following the working groups or in contact with folks working on the stuff. It's entirely fair that they're prioritizing getting work done than explaining low level language challenges to everyone everywhere.

I think you're seeing a lack of data and trying to use that as a justification to fit a story that you like, more than seeing data that is derivative of the story that you like. Of course some people were horribly disrupted by the changes, but language usage also expanded substantially during and since that time, and there are many team members employed by many other organizations, and many independents too.

And there are more docs, anyway:

https://rust-lang.github.io/rust-project-goals/2024h2/async.... https://rust-lang.github.io/rust-project-goals/2025h1/async.... https://rust-lang.github.io/rust-project-goals/2025h2/field-... https://rust-lang.github.io/rust-project-goals/2025h2/evolvi... https://rust-lang.github.io/rust-project-goals/2025h2/goals....

sunshowers•7h ago
Note that forcing a drop of a lock guard has its own issues, particularly around leaving the guarded data in an invalid state. I cover this a bit in my talk that Bryan linked to in the OP [1].

[1] timestamped: https://youtu.be/zrv5Cy1R7r4?t=1067

ameliaquining•7h ago
There's an existing lint that lets you prohibit instances of specific types from being held across await points: https://rust-lang.github.io/rust-clippy/stable/index.html#aw...
amluto•5h ago
I’m not convinced that this can help in a meaningful way.

Fundamentally, if you have two coroutines (or cooperatively scheduled threads or whatever), and one of them holds a lock, and the other one is awaiting the lock, and you don’t schedule the first one, you’re stuck.

I wonder if there’s a form of structured concurrency that would help. If I create two futures and start both of them (in Rust this means polling each one once) but do not continue to poll both, then I’m sort of making a mistake.

So imagine a world where, to poll a future at all, I need to have a nursery, and the nursery is passed in from my task and down the call stack. When I create a future, I can pass in my nursery, but that future then gets an exclusive reference to my future until it’s complete or cancelled. If I want to create more than one future that are live concurrently, I need to create a FutureGroup (that gets an exclusive reference to my nursery) and that allows me to create multiple sub-nurseries that can be used to make futures but cannot be used to poll them — instead I poll the FutureGroup.

(I have yet to try using an async/await system or a reactor or anything of the sort that is not very easy to screw up. My current pet peeve is this pattern:

    data = await thingy.read()
What if thingy.read() succeeds but I am cancelled? This gets nasty is most programming languages. Python: the docs on when I can get cancelled are almost nonexistent, and it’s not obviously possible to catch the CancelledError such that I still have data and can therefore save it somewhere so it’s not lost. Rust: what if thingy thinks it has returned the data but I’m never polled again? Maybe this can’t happen if I’m careful, but that requires more thought than I’m really happy with.)
orthecreedence•7h ago
Great read, and the example code makes sense. This stuff can be a nightmare to find, but once you do it's like a giant 1000 piece puzzle just clicks together instantly.
bcantrill•6h ago
Indeed. One of the interesting side effects of being a remote company that records everything[0] is that we have the instant where the "1000 piece puzzle just clicks together" recorded, and it's honestly pretty wild. In this case, it was very much a shared brainstorming between four engineers (Eliza, Sean, John and Dave) -- and there is almost a passing of the baton where they start to imagine the kind of scenario that could induce this and then realize that those are exactly the conditions that exist in the software.

We are (on brand?) going to do a podcast episode on this on Monday[1]; ahead of that conversation I'm going to get a clip of that video out, just because it's interesting to see the team work together to debug it.

[0] https://rfd.shared.oxide.computer/rfd/0537

[1] https://discord.gg/QrcKGTTPrF?event=1433923627988029462

mycoliza•6h ago
As a member of (Eliza, Sean, John, and Dave), I can second that debugging this was certainly an adventure. I'm not going to go as far as to say that we had fun, since...you can't have a heroic narrative without real struggle. But it was certainly rewarding to be in the room for that "a-ha!" moment, in which all the pieces really did begin to fit together very quickly. It was like the climax of a detective story --- and it was particularly well-scripted the way each of us contributed a little piece of the puzzle.
littlestymaar•5h ago
Since you are of of the people working directly on this codebase, may I ask you why is select! being used/allowed in the first place?

Its footgun-y nature has been known for years (IIRC even the first version of the tokio documentation warned against that) and as such I don't really understand why people are still using it. (For context I was the lead of a Rust team working on a pretty complex async networking program and we had banned select! very early in the project and never regretted this decision once).

0xdeafbeef•4h ago
What to use instead?
oconnor663•7h ago
> FAQ: doesn’t future1 get cancelled?

I guess cancellation is really two different things, which usually happen at the ~same time, but not in this case: 1) the future stops getting polled, and 2) the future gets dropped. In this example the drop is delayed, and because the future is holding a guard,* the delay has side effects. So the future "has been cancelled" in the sense that it will never again make forward progress, but it "hasn't been cancelled yet" in the sense that it's still holding resources. I wonder if it's practical to say "make sure those two things always happen together"?

* Technically a Tokio-internal `Acquire` future that owns a queue position to get a guard, but it sounds like the exact same bug could manifest after it got the guard too, so let's call it a guard.

jacquesm•7h ago
If any rust designers are lurking about here: what made you decide to go for the async design pattern instead of the actor pattern, which - to me at least - seems so much cleaner and so much harder to get wrong?

Ever since I started using Erlang it felt like I finally found 'the right way' when before then I did a lot of work with sockets and asynchronous worker threads. But even though it usually worked as advertised it had a large number of really nasty pitfalls which the actor model seemed to - effortlessy - step aside.

So I'm seriously wondering what the motivation was. I get why JS uses async, there isn't any other way there, by the time they added async it was too late to change the fundamentals of the language to such a degree. But rust was a clean slate.

sunshowers•7h ago
Not a Rust designer, but a big motivation for Rust's async design was wanting it to work on embedded, meaning no malloc and no threads. This unfortunately precludes the vast majority of the design space here, from active futures as seen in JS/C#/Go to the actor model.

You can write code using the actor model with Tokio. But it's not natural to do so.

lll-o-lll•6h ago
As a curious bystander, it will be interesting to see how the Zig async implementation pans out. They have the advantage of getting to see the pitfalls of those that have come before.

Getting back to Rust, even if not natural, I agree with the parent that the actor model is simply the better paradigm. Zero runtime allocation should still be possible, you just have to accept some constraints.

I think async looks simple because it looks like writing imperative code; unfortunately it is just obfuscating the complex reality underlying. The actor model makes things easier to reason about, even if it looks more complicated initially.

sunshowers•6h ago
I think you can do a static list of actors or tasks in embedded, but it's hard to dynamically spin up new ones. That's where intra-task concurrency is helpful.
throwawaymaths•2h ago
iiuc zig has thought about this specifically and there is a safe async-cancel in the new design that wasn't there in the old one.
shepmaster•6h ago
> But it's not natural to do so.

I tend to write most of my async Rust following the actor model and I find it natural. Alice Rhyl, a prominent Tokio contributor, has written about the specific patterns:

https://ryhl.io/blog/actors-with-tokio/

sunshowers•6h ago
Oh I do too, and that's one of the recommendations in RFD 400 as well as in my talk. cargo-nextest's runner loop [1] is also structured as two main actors + one for each test. But you have to write it all out and it can get pretty verbose.

[1] https://nexte.st/docs/design/architecture/runner-loop/

oconnor663•6h ago
Kind of a tangent, but I think "systems programming" tends to bounce back and forth between three(?) different concerns that turn out to be closely related:

1. embedded hardware, like you mentioned

2. high-performance stuff

3. "embedding" in the cross-language sense, with foreign function calls

Of course the "don't use a lot of resources" thing that makes Rust/C/C++ good for tiny hardware also tends to be helpful for performance on bigger iron. Similarly, the "don't assume much about your runtime" thing that's necessary for bare metal programming also helps a lot with interfacing with other languages. And "run on a GPU" is kind of all three of those things at once.

So yeah, which of those concerns was async Rust really designed around? All of them I guess? It's kind of like, once you put on the systems programming goggles for long enough, all of those things kind of blend together?

kibwen•1h ago
> So yeah, which of those concerns was async Rust really designed around? All of them I guess?

Yes, all of them. Futures needed to work on embedded platforms (so no allocation), needed to be highly optimizable (so no virtual dispatch), and need to act reasonably in the presence of code that crosses FFI boundaries (so no stack shenanigans). Once you come to terms with these constraints--and then add on Rust's other principles regarding guaranteed memory safety, references, and ownership--there's very little wiggle room for any alternative designs other than what Rust came up with. True linear types could still improve the situation, though.

mjevans•1h ago
In my view, the major design sin was not _forcing_ failure into the outcome list.

.await(DEADLINE) (where deadline is any non 0 unit, and 0 is 'reference defined' but a real number) should have been the easy interface. Either it yields a value or it doesn't, then the programmer has to expressly handle failure.

Deadline would only be the minimum duration after which the language, when evaluating the future / task, would return the empty set/result.

oconnor663•37m ago
> so no virtual dispatch

Speaking of which, I'm kind of surprised we landed on a Waker design that requires/hand-rolls virtual dispatch. Was there an alternate universe where every `poll()` function was generic on its Waker?

fpoling•6h ago
Rust async still uses a native stack which just a form of memory allocator that uses LIFO order. And controlling stack usage in the embedding world is just as important as not relying on the system allocator.

So its a pity that Rust async design tried so hard to avoid any explicit allocations rather than using an explicit allocator that embedding can use to preallocate and reuse objects.

zbentley•5h ago
> a native stack [is] just a form of memory allocator

There is a lot riding on that “just”. Hardware stacks are very, very unlike heap memory allocators in pretty much every possible way other than “both systems provide access to memory.”

Tons and tons of embedded code assumes the stack is, indeed, a hardware stack. It’s far from trivial to make that code “just use a dummy/static allocator with the same api as a heap”; that code may not be in Rust, and it’s ubiquitous for embedded code to not be written with abstractions in front of its allocator—why would it do otherwise, given that tons of embedded code was written for a specific compiler+hardware combination with a specific (and often automatic or compiler-assisted) stack memory management scheme? That’s a bit like complaining that a specific device driver doesn’t use a device-agnostic abstraction.

tony69•5h ago
Stack allocation/deallocation does not fragment memory, that’s a yuge difference for embedded systems and the main reason to avoid the heap
RossBencina•2h ago
Why would the actor model require malloc and/or threads?
raggi•6h ago
_an answer_ is performance - the necessity of creating copyable/copied messages for inter-actor communication everywhere in the program _can be_ expensive.

that said there are a lot of parts of a lot of programs where a fully inlined and shake optimized async state machine isn't so critical.

it's reasonable to want a mix, to use async which can be heavily compiler optimized for performance sensitive paths, and use higher level abstractions like actors, channels, single threaded tasks, etc for less sensitive areas.

lll-o-lll•6h ago
I’m not sure this is actually true? Do messages have to be copied?
raggi•6h ago
if you want your actors to be independent computation flows and they're in different coroutines or threads, then you need to arrange that the data source can not modify the data once it arrives at the destination, in order to be safe.

in a single threaded fully cooperative environment you could ensure this by implication of only one coroutine running at a time, removing data races, but retaining logical ones.

if you want to eradicate logical races, or have actual parallel computation, then the source data must be copied into the message, or the content of the message be wrapped in a lock or similar.

in almost all practical scenarios this means the data source copies data into messages.

gleenn•6h ago
Isn't that something Rust is particularly good at, controlling the mutation of shared memory?
raggi•5h ago
yes
vlovich123•6h ago
In Rust wouldn’t you just Send the data?
sapiogram•5h ago
Rust solves this at compile-time with move semantics, with no runtime overhead. This feature is arguably why Rust exists, it's really useful.
raggi•4h ago
if you can always move the data that's the sweet spot for async, you just pass it down the stack and nothing matters.

all of the complexity comes in when more than one part of the code is interested in the state at the same time, which is what this thread is about.

zorgmonkey•4h ago
Rust moves are a memcpy where the source becomes effectively unitialized after the move (that is say it is undefined to access it after the move). The copies are often optimized by the compiler but it isn't guaranteed.

This actually caused some issues with rust in the kernel because moving large structs could cause you to run out the small amount of stack space availabe on kernel threads (they only allocate 8-16KB of stack compared to a typical 8MB for a userspace thread). The pinned-init crate is how they ended solving this [1].

[1] https://crates.io/crates/pinned-init

the__alchemist•6h ago
I'm surprised learning this too. I know the hobby embedded, and HTTP-server OSS ecosystem have committed to Async, but I didn't expect Oxide would.
mycoliza•6h ago
We actually don't use Rust async in the embedded parts of our system. This is largely because our firmware is based on a multi-tasking microkernel operating system, Hubris[1], and we can express concurrency at the level of the OS scheduler. Although our service processors are single-core systems, we can still rely on the OS to schedule multiple threads of execution.

Rust async is, however, very useful in single-core embedded systems that don't have an operating system with preemptive multitasking, where one thread of execution is all you ever get. It's nice to have a way to express that you might be doing multiple things concurrently in an event-driven way without having to have an OS to manage preemptive multitasking.

[1] https://hubris.oxide.computer/

jabedude•5h ago
Heh, this is super interesting to hear. Single threaded async/concurrent code is so fun and interesting to see. I’ve ran some tokio programs in single threaded mode just to see it in action
the__alchemist•6h ago
Your application needs concurrency. So, the answer is... switch your entire application, code style, and libraries it uses into a separate domain that is borderline incompatible with normal one? And has its own dialects that have their own compatibility barriers? Doesn't make sense to me.
mdasen•4h ago
I'd recommend watching this video: https://www.infoq.com/presentations/rust-2019/; and reading this: https://tokio.rs/blog/2020-04-preemption

I'm not the right person to write a tl;dr, but here goes.

For actors, you're basically talking about green threads. Rust had a hard constraint that calls to C not have overhead and so green threads were out. C is going to expect an actual stack so you have to basically spin up a real stack from your green-thread stack, call the C function, then translate it back. I think Erlang also does some magic where it will move things to a separate thread pool so that the C FFI can block without blocking the rest of your Erlang actors.

Generally, async/await has lower overhead because it gets compiled down to a state machine and event loop. Languages like Go and Erlang are great, but Rust is a systems programming language looking for zero cost abstractions rather than just "it's fast."

To some extent, you can trade overhead for ease. Garbage collectors are easy, but they come with overhead compared to Rust's borrow checker method or malloc/free.

To an extent it's about tradeoffs and what you're trying to make. Erlang and Go were trying to build something different where different tradeoffs made sense.

EDIT: I'd also note that before Go introduced preemption, it too would have "pitfalls". If a goroutine didn't trigger a stack reallocation (like function calls that would make it grow the stack) or do something that would yield (like blocking IO), it could starve other goroutines. Now Go does preemption checks so that the scheduler can interrupt hot loops. I think Erlang works somewhat similarly to Rust in scheduling in that its actors have a certain budget, every function call decrements their budget, and when they run of of budget they have to yield back to the scheduler.

moralestapia•7h ago
Hmm, curious to see if this could happen on JS. I'll reproduce the code.
raggi•7h ago
yes, you can produce similar issues with promise guarded states and so on as well, it's a fairly common issue in async programming, but can be surprising when it's hidden by layers of abstraction / far up/down a call-chain.
comex•4h ago
JS shouldn't have a direct equivalent because JS async functions are eager. Once you call an async function, it will keep running even if the caller doesn't await it, or stops awaiting it. So in the scenario described, the function next in line for the lock would always have a chance to acquire and release it. The problem in Rust is that async functions are lazy and only run while they're being polled/awaited (unless wrapped in tasks). A function that's next in line for the lock might never acquire it if it's not being polled, blocking progress for other functions that are being polled.
dvt•7h ago
> &mut future1 is dropped, but this is just a reference and so has no effect. Importantly, the future itself (future1) is not dropped.

There's a lot of talk about Rust's await implementation, but I don't really think that's the issue here. After all, Rust doesn't guarantee convergence. Tokio, on the other hand (being a library that handles multi-threading), should (at least when using its own constructs, e.g. the `select!` macro).

So, since the crux of the problem is the `tokio::select!` macro, it seems like a pretty clear tokio bug. Side note, I never looked at it before, but the macro[1] is absolutely hideous.

[1] https://docs.rs/tokio/1.34.0/src/tokio/macros/select.rs.html

raggi•6h ago
i forget if this part unwinds to the exact same place, but some of this kind of design constraint in tokio stems from the much earlier language capabilities and is prohibitive to adjust without breaking the user ecosystem.

one of the key advertised selling points in some of the other runtimes was specifically around behavior of tasks on drop of their join handles for example, for reasons closely related to this post.

oconnor663•6h ago
There's nothing `select!` could do here to force `future1` to drop, because it doesn't receive ownership of `future1`. If we wanted to force this, we'd have to forbid `select!` from polling futures by reference, but that's a pretty fundamental capability that we often rely on to `select!` in a loop for example. The blanket `impl<F> Future for &mut F where F: Future ...` isn't a Tokio thing either; that's in the standard library.
mycoliza•6h ago
What could `tokio::select!` do differently here to prevent bugs like this?

In the case of `select!`, it is a direct consequence of the ability to poll a `&mut` reference to a future in a `select!` arm, where the future is not dropped should another future win the "race" of the select. This is not really a choice Tokio made when designing `select!`, but is instead due to the existence of implementations of `Future` for `&mut T: Future + Unpin`[1] and `Pin<T: Future>`[2] in the standard library.

Tokio's `select!` macro cannot easily stop the user from doing this, and, furthermore, the fact that you can do this is useful --- there are many legitimate reasons you might want to continue polling a future if another branch of the select completes first. It's desirable to be able to express the idea that we want to continually poll drive one asynchronous operation to completion while periodically checking if some other thing has happened and taking action based on that, and then continue driving forward the ongoing operation. That was precisely what the code in which we found the bug was doing, and it is a pretty reasonable thing to want to do; a version of the `select!` macro which disallows that would limit its usefulness. The issue arises specifically from the fact that the `&mut future` has been polled to a state in which it has acquired, but not released, a shared lock or lock-like resource, and then another arm of the `select!` completes first and the body of that branch runs async code that also awaits that shared resource.

If you can think of an API change which Tokio could make that would solve this problem, I'd love to hear it. But, having spent some time trying to think of one myself, I'm not sure how it would be done without limiting the ability to express code that one might reasonably want to be able to write, and without making fundamental changes to the design of Rust async as a whole.

[1] https://doc.rust-lang.org/stable/std/future/trait.Future.htm... [2]: https://doc.rust-lang.org/stable/std/future/trait.Future.htm...

amluto•4h ago
I can imagine an alternate universe in which you cannot do:

1. Create future A.

2. Poll future A at least once but not provably poll it to completion and also not drop it. This includes selecting it.

3. Pause yourself by awaiting anything that does not involve continuing to poll A.

I’m struggling a bit to imagine the scenario in which it makes sense to pause a coroutine that you depend on in the middle like this. But I also don’t immediately see a way to change a language like Rust to reliably prevent doing this without massively breaking changes. See my other comment :)

grogers•3h ago
I'm not familiar with tokio, but I am familiar with folly coro in C++ which is similiar-ish. You cannot co_await a folly::coro::Task by reference, you must move it. It seems like that prevents this bug. So maybe select! is the low level API and a higher level (i.e. safer) abstraction can be built on top?
rtpg•3h ago
A meta-idea I have: look at all usages of `select!` with `&mut future`s in the code, and see if there are maybe 4 or 5 patterns that emerge. With that it might be possible to say "instead of `select!` use `poll_continuing!` or `poll_first_up!` or `poll_some_other_common_pattern!`".

It feels like a lot of the way Rust untangles these tricky problems is by identifying slightly more contextful abstractions, though at the cost of needing more scratch space in the mind for various methods

dap•5h ago
(author here)

Although the design of the `tokio::select!` macro creates ways to run into this behavior, I don't believe the problem is specific to `tokio`. Why wouldn't the example from the post using Streams happen with any other executor?

OptionOfT•6h ago
- wrong -
DAlperin•6h ago
I think the next sentence clarifies pretty well.

> In this case, what’s dropped is &mut future1. But future1 is not dropped, so the actual future is not cancelled.

oconnor663•6h ago
The author clearly understands these details. I think it's just a question of wording: did we "drop a reference (which has no effect)" or did we "not drop anything (because references don't implement Drop)"?
singron•6h ago
This sounds very similar to priority inversion. E.g. if you have Thread T_high running at high priority and thread T_low running at low priority, and T_low holds a lock that T_high wants to acquire, T_high won't get to run until T_low gets scheduled.

The OS can detect this and make T_low "inherit" the priority of T_high. I wonder if there is a similar idea possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", then poll that future instead. I would guess detecting the "can't run" case would require quite a bit of overhead, but maybe it can be done.

I think an especially difficult factor is that you don't even need to use a direct await.

    let future1 = do_async_thing("op1", lock.clone()).boxed();
    tokio::select! {
      _ = &mut future1 => {
        println!("do_stuff: arm1 future finished");
      }
      _ = sleep(Duration::from_millis(500)) => {
        // No .await, but both will futurelock on future1.
        tokio::select! {
          _ = do_async_thing("op2", lock.clone()) => {},
          _ = do_async_thing("op3", lock.clone()) => {},
        };
      }
    };
I.e. so "can't run" detector needs to determine that no other task will run the future, and the future isn't in the current set of things being polled by this task.
oconnor663•6h ago
> I wonder if there is a similar idea possible with tokio? E.g. if you are awaiting a Mutex held by a future that "can't run", then poll that future instead.

Something like this could make sense for Tokio tasks. (I don't know how complicated their task scheduler is; maybe it already does stuff like this?) But it's not possible for futures within a task, as in this post. This goes all the way back to the "futures are inert" design of async Rust: You don't necessarily need to communicate with the runtime at all to create a future or to poll it or to stop polling it. You only need to talk to the runtime at the task level, either to spawn new tasks, or to wake up your own task. Futures are pretty much just plain old structs, and Tokio doesn't know how many futures my async function creates internally, any more than it knows about my integers or strings or hash maps.

mycoliza•5h ago
Yeah, a coworker coming from Go asked a similar question about why Rust doesn't have something like the Go runtime's deadlock detector. Your comment is quite similar to the explanation I gave him.

Go, unlike Rust, does not really have a notion of intra-task concurrency; goroutines are the fundamental unit of concurrency and parallelism. So, the Go runtime can reason about dependencies between goroutines quite easily, since goroutines are the things which it is responsible for scheduling. The fact that channels are a language construct, rather than a library construct implemented in the language, is necessary for this too. In (async) Rust, on the other hand, tasks are the fundamental unit of parallelism, but not of concurrency; concurrency emerges from the composition of `Future`s, and a single task is a state machine which may execute any number of futures concurrently (but not in parallel), by polling them until they cannot proceed without waiting and then moving on to poll another future until it cannot proceed without waiting. But critically, this is not what the task scheduler sees; it interacts with these tasks as a single top-level `Future`, and is not able to look inside at the nested futures they are composed of.

This specific failure mode can actually only happen when multiple futures are polled concurrently but not in parallel within a single Tokio task. So, there is actually no way for the Tokio scheduler to have insight into this problem. You could imagine a deadlock detector in the Tokio runtime that operates on the task level, but it actually could never detect this problem, because when these operations execute in parallel, it actually cannot occur. In fact, one of the suggestions for how to avoid this issue is to select over spawned tasks rather than futures within the same task.

mjevans•1h ago
Thank you. Every time I've tried to approach the concept of Rust's parallelism this is what rubs me the wrong way.

I haven't yet read a way to prove it's correct, or even to reasonably prove a given program's use is not going to block.

With more traditional threads my mental model is that _everything_ always has to be interrupt-able, have some form of engineer chosen timeout for a parallel operation, and address failure of operation in design.

I never see any of that in the toy examples that are presented as educational material. Maybe Rust's async also requires such careful design to be safely utilized.

zenmac•16m ago
Guess Rust is more built for memory safety not concurrency? Erlang maybe? Why can't we just have a language that is memory safe and built for concurrency? Like Ocaml and Erlang combine?
newpavlov•5h ago
>This goes all the way back to the "futures are inert" design of async Rust

Yeap. And this footgun is yet another addition to the long list of reasons why I consider the Rust async model with its "inert" futures managed in user space a fundamentally flawed un-Rusty design.

filleduchaos•4h ago
I feel there's a difference between a preference and a flaw. Rust has targets that make anything except inert futures simply unworkable, and in my opinion it's entirely valid for a programming language to prioritise those targets.
Rusky•4h ago
The requirement is that the futures are not separate heap allocations, not that they are inert.

It's not at all obvious that Rust's is the only possible design that would work here. I strongly suspect it is not.

In fact, early Rust did some experimentation with exactly the sort of stack layout tricks you would need to approach this differently. For example, see Graydon's post here about the original implementation of iterators, as lightweight coroutines: https://old.reddit.com/r/ProgrammingLanguages/comments/141qm...

filleduchaos•3h ago
On the other hand, early Rust also for instance had a tracing garbage collector; it's far from obvious to me how relevant its discarded design decisions are supposed to be to the language it is today.
Rusky•2h ago
This one is relevant because it avoids heap allocation while running the iterator and for loop body concurrently. Which is exactly the kind of thing that `async` does.
wahern•55m ago
It avoids heap allocation in some situations. But in principle the exact same optimization could be done for stackful coroutines. Heck, right now in C I could stack-allocate an array and pass it to pthread_create as the stack for a new thread. To avoid an overlarge allocation I would need to know exactly how much stack is needed, but this is exactly the knowledge the Rust compiler already requires for async/await.

What people care about are semantics. async/await leaks implementation details. One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM, which might require some feature work to support a deeper level of integration of async without losing what benefits the current implementation provides. Rust has a few warts like this where semantics are stilted in order to confine the implementation work to the high-level Rust compiler.

Veserv•5h ago
I thought Rust async is a colored stackless coroutine model and thus it would be unsafe to continue execution of previously executing async functions.

To explain, generally speaking, stackless coroutine async only need coloring because they are actually “independent stack”less coroutines. What they actually do is that they share the stack for their local state. This forces async function execution to proceed in LIFO order so you do not blow away the stack of the async function executing immediately after which demands state machine transforms to be safe. This is why you need coloring unlike stackful coroutine models which can execute, yield, and complete in arbitrary order since their local state is preserved in a safe location.

treyd•5h ago
> thus it would be unsafe to continue execution of previously executing async functions.

There's more nuance than this. You can keep polling futures as often as you want. When an async fn gets converted into the state machine, yielding is just expressed as the poll fn returning as not ready.

So it is actually possible for "a little bit" of work to happen, although that's limited and gets tricky because the way wakers work ensure that normally futures only get polled by the runtime when there's actually work for them to do.

elchananHaas•5h ago
My view of this is that its closer to the basic 2 lock Deadlock.

Thread 1 acquires A. Thread 2 acquires B. Thread 1 tries to acquire B. Thread 2 tries to acquire A.

In this case, the role "A" is being played by the front of the Mutex's lock queue. Role "B" is being played by the Tokio's actively executed task.

Based on this understanding, I agree that the surprising behavior is due to Tokio's Mutex/Lock Queue implementation. If this was an OS Mutex, and a thread waiting for the Mutex can't wake for some reason, the OS can wake a different thread waiting for that Mutex. I think the difficulty in this approach has to do with how Rust's async is implemented. My guess is the algorithm for releasing a lock goes something like this:

1. Pop the head of the wait queue. 2. Poll the top level tokio::spawn'ed task of the Future that is holding the Mutex.

What you want is something like this

For each Future in the wait queue (Front to Back): Poll the Future. If Success - Break ???Something if everything fails???

The reason this doesn't work has to do with how futures compose. Futures compile to states within a state machine. What happens when a future polled within the wait queue completes? How is control flow handed back to the caller?

I guess you might be able to have some fallback that polls the futures independently then polls the top level future to try and get things unstuck. But this could cause confusing behavior where futures are being polled even though no code path within your code is await'ing them. Maybe this is better though?

johnisgood•5h ago
Off-topic but that code looks quite... complicated as opposed to what I would write in Erlang, Elixir, Go, or even C. Maybe it is just me.
jerf•4h ago
Erlang/Elixir and Go "solve" this problem by basically not giving you the rope to hang yourself in this particular way in the first place. This is a perfectly valid and sensible solution... but it is not the only solution. It means you're paying for some relatively expensive full locks that the Rust async task management is trying to elide, for what can be quite significant performance gains if you're doing a lot of small tasks.

It is good that not every language gives you this much control and gives some easier options for when those are adequate, but it is also good that there is some set of decent languages that do give you this degree of control for when it is necessary, and it is good that we are not surrendering that space to just C and/or C++. Unfortunately such control comes with footguns, at least over certain spans of time. Perhaps someone will figure out a way to solve this problem in Rust in the future.

arjie•6h ago
Wow, that makes sense afterwards but I would not have guessed at it immediately looking at the code. Very insidious. Great blogpost.
forrestthewoods•6h ago
I feel like I’m pretty good at writing multithreaded code. I’ve done it a lot. As long as you use primitives like Rust Mutex that enforce correctness for data access (ie no accessing data without the lock) it’s pretty simple. Define a clean boundary API and you’re off to the races.

async code is so so so much more complex. It’s so hard to read and rationalize. I could not follow this post. I tried. But it’s just a full extra order of complexity.

Which is a shame because async code is supposed to make code simpler! But I’m increasingly unconfident that’s true.

amelius•4h ago
Async code is simpler because you're implicitly holding a lock on the CPU. That's also why you should stay away from it: it increases latency. Especially since Rust is about speed and responsiveness. In general, async programming in Rust makes little sense.
forrestthewoods•3h ago
I love Rust. But I’m 100% convinced Rust chose the wrong tradeoffs with their async model. Just give me green threads and use malloc to grow the stack. It’s fine. That would have been better imho.
mjevans•1h ago
Hell, even force threads to be allocated from a bucket of N threads defined at compile time. Surely that'd work for embedded / GPU space?
wbl•6h ago
Sadly I'm away from my bookshelf but I think Concurrent ML solved this issue.
octoberfranklin•5h ago
It’s really important to understand what’s happening here

Then maybe you should take a moment to pick more descriptive identifiers than future1, future2, future3, do_stuff, and do_async_thing. This coding style is atrocious.

dap•4h ago
Is it possible that those names are intentionally chosen and actually do carry meaning?
gpm•2h ago
If you prefer "real" names you can always look at the actual code that had a bug - here it is before the bug was fixed: https://github.com/oxidecomputer/omicron/blob/a253f541a4a32a...
Matthias247•5h ago
As far as I remember from building these things with others within the async rust ecosystem (hey Eliza!) was that there was a certain tradeoff: if you wouldn’t be able to select on references, you couldn’t run into this issue. However you also wouldn’t be able run use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.

I fully agree that this and the cancellation issues discussed before can lead to surprising issues even to seasoned Rust experts. But I’m not sure what really can be improved under the main operating model of async rust (every future can be dropped).

But compared to working with callbacks the amount of surprising things is still rather low :)

octoberfranklin•5h ago
> However you also wouldn’t be able run use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.

No, just have select!() on a bunch of owned Futures return the futures that weren't selected instead of dropping them. Then you don't lose state. Yes, this is awkward, but it's the only logically coherent way. There is probably some macro voodoo that makes it ergonomic. But even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.

For the real root cause: https://news.ycombinator.com/item?id=45777234

mycoliza•5h ago
> No, just have select!() on a bunch of owned Futures return the futures that weren't selected instead of dropping them. Then you don't lose state.

How does that prevent this kind of deadlock? If the owned future has acquired a mutex, and you return that future from the select so that it might be polled again, and the user assigns it to a variable, then the future that has acquired the mutex but has not completed is still not dropped. This is basically the same as polling an `&mut future`, but with more steps.

octoberfranklin•5h ago
> How does that prevent this kind of deadlock?

Like I said, it doesn't:

> even this doesn't fix the root cause because dropping an owned Future isn't guaranteed to cancel it cleanly.

It fixes this:

> However you also wouldn’t be able run use select! in a while loop and try to acquire the same lock (or read from the same channel) without losing your position in the queue.

If you want to fix the root cause, see https://news.ycombinator.com/item?id=45777234

mycoliza•5h ago
Indeed, you are correct (and hi Matthias!). After we got to the bottom of this deadlock, my coworkers and I had one of our characteristic "how could we have prevented this?" conversations, and reached the somewhat sad conclusion that actually, there was basically nothing we could easily blame for this. All the Tokio primitives involved were working precisely as they were supposed to. The only thing that would have prevented this without completely re-designing Rust's async from the ground up would be to ban the use of `&mut future`s in `select!`...but that eliminates a lot of correct code, too. Not being able to do that would make it pretty hard to express a lot of things that many applications might reasonably want to express, as you described. I discussed this a bit in this comment[1] as well.

On the other hand, it also wasn't our coworker who had written the code where we found the bug who was to blame, either. It wasn't a case of sloppy programming; he had done everything correctly and put the pieces together the way you were supposed to. All the pieces worked as they were supposed to, and his code seemed to be using them correctly, but the interaction of these pieces resulted in a deadlock that it would have been very difficult for him to anticipate.

So, our conclusion was, wow, this just kind of sucks. Not an indictment of async Rust as a whole, but an unfortunate emergent behavior arising from an interaction of individually well-designed pieces. Just something you gotta watch out for, I guess. And that's pretty sad to have to admit.

[1] https://news.ycombinator.com/item?id=45776868

octoberfranklin•5h ago
For anybody who wants to cut to the chase, it's this:

> The behavior of tokio::select! is to poll all branches' futures only until one of them returns `Ready`. At that point, it drops the other branches' futures and only runs the body of the branch that’s ready.

This is, unfortunately, doing what it's supposed to do: acting as a footgun.

The design of tokio::select!() implicitly assumes it can cancel tasks cleanly by simply dropping them. We learned the hard way back in the Java days that you cannot kill threads cleanly all the time. Unsurprisingly, the same thing is true for async tasks. But I guess every generation of programmers has to re-learn this lesson. Because, you know, actually learning from history would be too easy.

Unfortunately there are a bunch of footguns in tokio (and async-std too). The state-machine transformation inside rustc is a thing of beauty, but the libraries and APIs layered on top of that should have been iterated many more times before being rolled out into widespread use.

littlestymaar•5h ago
I genuinely don't understand why people use select! at all given how much of a footgun it is.
octoberfranklin•5h ago
Well the less-footgun-ish alternative would look something like a Stream API, but the last time I checked tokio-stream wasn't stable yet.

Then you could merge a `Stream<A>` and `Stream<B>` into a `Stream<Either<A,B>>` and pull from that. Since you're dealing with owned streams, dropping the stream forces some degree of cleanup. There are still ways to make a mess, but they take more effort.

   ....................................
Ratelimit so I have to reply to mycoliza with an edit here:

That example calls `do_thing()`, whose body does not appear anywhere in the webpage. Use better identifiers.

If you meant `do_stuff()`, you haven't replaced select!() with streams, since `do_stuff()` calls `select!()`.

The problem is `select!()`; if you keep using `select!()` but just slather on a bunch of streams that isn't going to fix anything. You have to get rid of select!() by replacing it with streams.

mycoliza•5h ago
An analogous problem is equally possible with streams: https://rfd.shared.oxide.computer/rfd/0609#_how_you_can_hit_...
mycoliza•4h ago
In reply to your edit, that section in the RFD includes a link to the full example in the Rust playground. You’ll note that it does not make any use of ‘select!`: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Perhaps the full example should have been reproduced in the RFD for clarity…

kmeisthax•2h ago
No, dropping a Rust future is an inherently safe operation. Futures don't live on their own, they only ever do work inside of .poll(), so you can't "catch them with their pants down" and corrupt state by dropping them. Yield points are specifically designed to be cancel-safe.

Crucially, however, because Futures have no independent existence, they can be indefinitely paused if you don't actively and repeatedly .poll() them, which is the moral equivalent of cancelling a Java Thread. And this is represented in language state as a leaked object, which is explicitly allowed in safe Rust, although the language still takes pains to avoid accidental leakage. The only correct way to use a future is to poll it to completion or drop it.

The problem is that in this situation, tokio::select! only borrows the future and thus can't drop it. It also doesn't know that dropping the Future does nothing, because borrows of futures are still futures so all the traits still match up. It's a combination of slightly unintuitive core language design and a major infrastructure library not thinking things out.

hitekker•5h ago
Skimming through, this document feels thorough and transparent. Clearly, a hard lesson learned. The footnotes, in particular, caught my eye https://rfd.shared.oxide.computer/rfd/397#_external_referenc...

> Why does this situation suck? It’s clear that many of us haven’t been aware of cancellation safety and it seems likely there are many cancellation issues all over Omicron. It’s awfully stressful to find out while we’re working so hard to ship a product ASAP that we have some unknown number of arbitrarily bad bugs that we cannot easily even find. It’s also frustrating that this feels just like the memory safety issues in C that we adopted Rust to get away from: there’s some dynamic property that the programmer is responsible for guaranteeing, the compiler is unable to provide any help with it, the failure mode for getting it wrong is often undebuggable (by construction, the program has not done something it should have, so it’s not like there’s a log message or residual state you could see in a debugger or console), and the failure mode for getting it wrong can be arbitrarily damaging (crashes, hangs, data corruption, you name it). Add on that this behavior is apparently mostly undocumented outside of one macro in one (popular) crate in the async/await ecosystem and yeah, this is frustrating. This feels antithetical to what many of us understood to be a core principle of Rust, that we avoid such insidious runtime behavior by forcing the programmer to demonstrate at compile-time that the code is well-formed

rtpg•3h ago
I guess one big question here is whether there's a higher layer abstraction that is available to wrap around patterns to avoid this.

It does feel like there's still generally possibilities of deadlocks in Rust concurrency right? I understand the feeling here that it feels like ... uhh... RAII-style _something_ should be preventing this, because it feels like statically we should be able to identify this issue in this simple case.

I still have a hard time understanding how much of this is incidental and how much of this is just downstream of the Rust/Tokio model not having enough to work on here.

embedding-shape•1h ago
> I guess one big question here is whether there's a higher layer abstraction that is available to wrap around patterns to avoid this.

Something like Actors, on top of Tokio, would be one way: https://ryhl.io/blog/actors-with-tokio/

Dagonfly•5h ago
That's a really subtle version of the deadlock described in withoutboats FuturesUnordered post [0]

When using “intra-task” concurrency, you really have to ensure that none of the futures are starving.

Spawning task should probably be the default. For timeouts use tokio::select! but make sure all pending futures are owned by it. I would never recommend FuturesUnordered unless you really test all edge-cases.

[0] https://without.boats/blog/futures-unordered/

quietbritishjim•4h ago
Wow, it is simply outrageous that Rust doesn't just allow all active tasks to make progress. It creates a whole class of incomprehensible bugs, like this one, for no reason. Can any Rust experts explain why it's done this way? It seems like an unforced error.

In Python, I often use the Trio library, which offers "structured, concurrency": tasks are (only) spawned into lexical scopes, and they are all completed (waited for) before that scope is left. That includes waiting for any cancelled tasks (which are allowed to do useful async work, including waiting for any of their own task scopes to complete).

Could Rust do something like that? It's far easier to reason about than traditional async programs, which seems up Rust's street. As a bonus it seems to solve this problem, since a Rust equivalent would presumably have all tasks implicitly polled by their owning scope.

duped•3h ago
So there's a distinction between a task and a future. A future doesn't do anything until it's polled, and since there's nothing special about async runtimes (it's just user level code), it's always possible to create futures and never poll them, or stop polling them.

A task is a different construct and usually tied to the runtime. If you look at the suggestions in the RFD they call out using a task explicitly instead of polling a future in place.

There's some debate to be had over what constitutes "cancellation." The article and most colloquial definitions I've heard define it as a future being dropped before being polled to completion. Which is very clean - if you want to cancel a future, just drop it. Since Rust strongly encourages RAII, cleanup can go in drop implementations.

A much tougher definition of cancellation is "the future is never polled again" which is what the article hits on. The future isn't dropped but its poll is also unreachable, hence the deadlock.

wrs•3h ago
I wish "cancellation" wasn't used for both of those. It seems to obfuscate understanding quite a bit. We should call them "dropped" and "abandoned" or something.
duped•3h ago
I don't think anyone really calls the latter "cancellation" in practice. I'm just pointing out that "is never polled again" is the tricky state with futures.
quietbritishjim•3h ago
Interesting, thanks. So is it fair to say that if tokio::select!() only accepted tasks (or implicitly turned any futures it receives into tasks, like Python's asyncio.gather() does) then it wouldn't have this problem? Or, even if the async runtime is careful, is it still possible to create and fail to poll a raw Future by accident?
NobodyNada•3h ago
> if tokio::select!() only accepted tasks (or implicitly turned any futures it receives into tasks, like Python's asyncio.gather() does) then it wouldn't have this problem?

Yes, this is correct. However, many of the use cases for select rely on the fact that it doesn't run all the tasks to completion. I've written many a select! statement to implements timeouts or other forms of intentionally preempting a task. Sometimes I want to cancel the task and sometimes I want to resume it after dealing with the condition that caused the preemption -- so the behavior in the article is very much an intentional feature.

> even if the async runtime is careful, is it still possible to create and fail to poll a raw Future by accident?

This is also the case. There's nothing magic about a future; it's just an ordinary object with a poll function. Any code can create a future and do whatever it likes with it; including polling it a few times and then stopping.

Despite being included as part of Tokio, select! does not interact with the runtime or need any kind of runtime support at all. It's an ordinary function that creates a future which waits for the first of several "child" futures to complete; similar functions are also provided in other prominent ecosystem crates besides Tokio and can be implemented in user code as well.

wrs•3h ago
It's hard to answer the question because of unclear terminology (tasks vs. futures). There are ways to do structured concurrency in Rust, but they are for tasks, not futures. There's not really a concept of "active futures" (other than calling an "active future" one that returned Pending the last time you polled it).

A task is the thing that drives progress by polling some futures. But one of those futures may want to handle polling for other futures that it made, which is where this arises.

As the article says, one option is to spawn everything as a task, but that doesn't solve all problems, and precludes some useful ways of using futures.

levodelellis•4h ago
In October alone I seen 5+ articles and comments about multi-threading and I don't know why

I always said if your code locks or use atomics, it's wrong. Everyone says I'm wrong but you get things like what's described in the article. I'd like to recommend a solution but there's pretty much no reasonable way to implement multi-threading when you're not an expert. I heard Erlang and Elixir are good but I haven't tried them so I can't really comment

umvi•3h ago
> I always said if your code locks or use atomics, it's wrong. Everyone says I'm wrong but you get things like what's described in the article.

Ok so say you are simulating high energy photons (x-rays) flowing through a 3d patient volume. You need to simulate 2 billion particles propagating through the patient in order to get an accurate estimation of how the radiation is distributed. How do you accomplish this without locks or atomics without the simulation taking 100 hours to run? Obviously it would take forever to simulate 1 particle at a time, but without locks or atomics the particles will step on each others' toes when updating radiation distribution in the patient. I suppose you could have 2 billion copies of the patient's volume in memory and each particle gets its own private copy and then you merge them all at the end...

levodelellis•3h ago
From my understanding this talk describes how he implemented a solution for a similar problem https://www.youtube.com/watch?v=Kvsvd67XUKw

I'm saying if you're not writing multi-threaded code everyday, use a library. It can use atomics/locks but you shouldn't use it directly. If the library is designed well it'd be impossible to deadlock.

levodelellis•2h ago
To clarify by "your code" I mean your code excluding a library. A good library would make it impossible to deadlock. When I wrote mine I never called outside code during a lock so it was impossible for it to deadlock. My atomic code had auditing and test. I don't recommend people to write their own thread library unless they want to put a lot of work into it
mdasen•2h ago
I rewrote this in Go and it also deadlocks. It doesn't seem to be something that's Rust specific.

I'm going to write down the order of events.

1. Background task takes the lock and holds it for 5 seconds.

2. Async Thing 1 tries to take the lock, but must wait for background task to release it. It is next in line to get the lock.

3. We fire off a goroutine that's just sleeping for a second.

4. Select wants to find a channel that is finished. The sleepChan finishes first (since it's sleeping for 1 second) while Async Thing 1 is still waiting 4 more seconds for the lock. So select will execute the sleepChan case.

5. That case fires off Async Thing 2. Async Thing 2 is waiting for the lock, but it is second in line to get the lock after Async Thing 1.

6. Async Thing 1 gets the lock and is ready to write to its channel - but the main is paused trying to read from c2, not c1. Main is "awaiting" on c2 via "<-c2". Async Thing 1 can't give up its lock until it writes to c1. It can't write to c1 until c1 is "awaited" via "<-c1". But the program has already gone into the other case and until the sleepChan case finishes, it won't try to await c1. But it will never finish its case because its case depends on c1 finishing first.

You can use buffered channels in Go so that Async Thing 1 can write to c1 without main reading from it, but as the article notes you could use join_all in Rust.

But the issue is that you're saying with "select" in either Go or Rust "get me the first one that finishes" and then in the branch that finishes first, you are awaiting a lock that will get resolved when you read the other branch. It just doesn't feel like something that is Rust specific.

    func main() {
        lock := sync.Mutex{}
        c1 := make(chan string)
        c2 := make(chan string)
        sleepChan := make(chan bool)    
        
        go start_background_task(&lock)
        time.Sleep(1 * time.Millisecond) //make sure it schedules start_background_task first

        go do_async_thing(c1, "op1", &lock)
 
        go func() {
                time.Sleep(1 * time.Second)
                sleepChan <- true
        }()

        for range 2 {
                select {
                case msg1 := <-c1:
                        fmt.Println("In the c1 case")
                        fmt.Printf("received %s\n", msg1)
                case _ = <-sleepChan:
                        fmt.Println("In the sleepChan case")
                        go do_async_thing(c2, "op2", &lock)
                        fmt.Printf("received %s\n", <-c2) // "awaiting" on c2 here, but c1's lock won't be given up until we read it
                }
        }
        fmt.Println("all done")
    }

    func start_background_task(lock *sync.Mutex) {
        fmt.Println("starting background task")
        lock.Lock()
        fmt.Println("acquired background task lock")
        defer lock.Unlock()
        time.Sleep(5 * time.Second)
        fmt.Println("dropping background task lock")
    }

    func do_async_thing(c chan string, label string, lock *sync.Mutex) {
        fmt.Printf("%s: started\n", label)
        lock.Lock()
        fmt.Printf("%s: acuired lock\n", label)
        defer lock.Unlock()
        fmt.Printf("%s: done\n", label)
        c <- label
    }
jhhh•1h ago
I wrote a version of the article's code in Java and couldn't figure out why it was working until reading your example. I see now that the channel operations in Go must rendezvous which I assume matches Rust's Future behavior. Whereas, the Java CompletableFuture operations I was using to mimic the select aren't required to meet. Thanks for writing this.
mjevans•1h ago
Difference in Go is that you've _expressly_ constructed a dependency ring. Should Go or any runtime go out of it's way to detect a dependency ring?

This the programming equivalent of using welding (locks) to make a chain loop, you've just done it with the 3D space impossible two links case.

As with the sin of .await(no deadline), the sin here is not adding a deadline.

jhhh•2h ago
I read this once over and the part that doesn't seem to make sense to me is why the runtime chose, when there are two execution contexts up to the lock() in both future1 and future3, to wake up the main thread instead? I get why in a fair lock it would pick future1 but I don't get how that causes a different thread than the one holding the lock to execute.
tick_tock_tick•2h ago
It seems more and more clear every day that async was rushed out the door way to quickly in Rust.
kibwen•1h ago
There's a lot of improvements I could think of for async Rust, but there's basically nothing I would change about the fundamentals that underlie it (other than some tweaks to Pin, maybe, and I could quibble over some syntax). There's nothing rushed about it; it's a great foundation that demonstrably just needs someone to finish building the house on top of it (and, to continue the analogy, needs someone to finish building the sub-basement (cough, generalized coroutines)).
tick_tock_tick•1h ago
A foundation full of warts belongs in experimental. I don't know how by your own confession of the house and the sub-basement not yet being finished doesn't instantly mean it should have stayed in experimental.
qouteall•1h ago
Simplify: tokio::select! will discard other futures when one future progress.

The discarded futures will never be run again.

Normally when a future is discarded it's dropped. When a future holding lock is dropped, lock is released, but it's passing future borrow to select so the discarded future is not dropped while holding lock.

So it leaves a future that holds a lock that will never run again.