Fun with Futex

https://blog.fredrb.com/2025/06/02/futex-fun/
95•ingve•8mo ago

Comments

gpderetta•8mo ago
> don’t wake this thread up while the value is still X

That's the wrong way to think about FUTEX_WAIT. What it does is "put this thread to sleep unless the value is not X".

> If you call futex wait and the value is unchanged, the sleeping thread will not wake up!

[I assume this was meant to be FUTEX_WAKE] I can't be bothered to check the kernel source or to test, but I would be surprised if this is true as it might cause missed wakeups in an ABA scenario. Futex_wake must wake up at least one successful futex_wait that happens-before the wake. Futexes are best understood as edge triggered, stateless (outside of the wait list itself) primitives, so the value at the futex location (as opposed to its address) is not really important[1], except as a guard to avoid missed wakeups.

Unfortunately the name itself (Fast Userspace Mutex) is a bit misleading, because a mutex is only one of the many things you can do with a futex. They really are a generalized waiting and signaling primitive.

[1] for plain WAIT and WAKE at least, the bitset operations or the robust futex operations are more complex and attach semantics to the value.
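To make the "sleep unless the value is not X" reading concrete, here is a rough userspace model of plain WAIT and WAKE, built from a `Mutex` and a `Condvar`. It is not how the kernel implements futexes and has not been checked against kernel behaviour; `ToyFutex`, its ticket bookkeeping, and `wake_without_changing_value` are names made up for this sketch. The point it illustrates is that the value is read exactly once, as a guard in `wait`, and that `wake` never looks at it.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Ticket bookkeeping for the toy wait list (hypothetical, not a kernel structure).
struct WaitList {
    next_ticket: usize, // ticket handed to the next thread that goes to sleep
    served: usize,      // every ticket below this number has been woken
}

/// A userspace model of plain FUTEX_WAIT / FUTEX_WAKE semantics.
pub struct ToyFutex {
    pub word: AtomicU32,
    list: Mutex<WaitList>,
    cv: Condvar,
}

impl ToyFutex {
    pub fn new(value: u32) -> Self {
        ToyFutex {
            word: AtomicU32::new(value),
            list: Mutex::new(WaitList { next_ticket: 0, served: 0 }),
            cv: Condvar::new(),
        }
    }

    /// "Put this thread to sleep unless the value is not `expected`."
    /// Returns false (think EAGAIN) without sleeping on a mismatch.
    pub fn wait(&self, expected: u32) -> bool {
        let mut list = self.list.lock().unwrap();
        // The value is only a guard, checked once while the list is locked.
        if self.word.load(Ordering::SeqCst) != expected {
            return false;
        }
        let ticket = list.next_ticket;
        list.next_ticket += 1;
        while list.served <= ticket {
            list = self.cv.wait(list).unwrap();
        }
        true
    }

    /// Wake up to `count` sleepers, oldest first. Never reads `word`.
    pub fn wake(&self, count: usize) -> usize {
        let mut list = self.list.lock().unwrap();
        let sleeping = list.next_ticket - list.served;
        let woken = count.min(sleeping);
        list.served += woken;
        self.cv.notify_all();
        woken
    }
}

/// Wakes a sleeper without ever changing the word; returns the waiter's result.
pub fn wake_without_changing_value() -> bool {
    let futex = Arc::new(ToyFutex::new(0));
    let waiter = {
        let futex = Arc::clone(&futex);
        thread::spawn(move || futex.wait(0))
    };
    // Retry until the waiter has actually gone to sleep.
    while futex.wake(1) == 0 {
        thread::yield_now();
    }
    waiter.join().unwrap()
}
```

In this model a wake succeeds even though the word still holds its original value, which is the behaviour argued for above; whether the real syscall matches is exactly what the comment leaves untested.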

tialaramex•8mo ago
> I would be surprised if this is true as it might cause missed wakeups in an ABA scenario.

More importantly, what could "unchanged" even mean? For FUTEX_WAIT we provide val, a value we're saying is the value stored at the futex address, and the kernel can check that's true. But for FUTEX_WAKE val is filled out with a count - typically 1 meaning "Only wake one" or its maximum positive value meaning "everybody" although in principle if you can find a reason to wake up to 7 waiters but no more that's allowed.

kentonv•8mo ago
Came here because I had the same reaction but also can't be bothered to test so was hoping someone else did.

Guess we'll just never know for sure, lol.

skitter•8mo ago
Fun post! An alternative to using futexes to store thread queues in kernel space is to store them yourself. E.g. the parking_lot[0] Rust crate, inspired by WebKit[1], uses only one byte to store the unlocked/locked/locked_contended state, and under contention uses the address of the byte to index into a global open-addressing hash table of thread queues. You look up the object's entry, lock said entry, add the thread to the queue, unlock it, and go to sleep. Because you know that there is at most one entry per thread, you can keep the load factor very low in order to keep the mutex fast and form the thread queue out of a linked list of thread-locals. Leaking the old hash on resizing helps make resizing safe.

As a result, uncontended locks work the same as described in the blog post above; under contention, performance is similar to a futex too. But now your locks are only one byte in size, regardless of platform – while Windows allows 1-byte futexes, they're always 4 bytes on Linux and iirc Darwin doesn't quite have an equivalent api (but I might be wrong there). You also have more control over parked threads if you want to implement different fairness criteria, reliable timeouts or parking callbacks.

One drawback of this is that you can only easily use this within one process, while at least on Linux futexes can be shared between processes.

I've written a blog post[2] about using futexes to implement monitors (reëntrant mutexes with an associated condvar) in a compact way for my toy Java Virtual Machine, though I've since switched to a parking-lot-like approach.

[0]: https://github.com/amanieu/parking_lot
[1]: https://webkit.org/blog/6161/locking-in-webkit
[2]: https://specificprotagonist.net/jvm-futex.html
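The scheme described in this comment (one state byte per lock, plus a global table of thread queues keyed by the byte's address) can be sketched with only the standard library. This is a heavily simplified illustration and not parking_lot's actual code: it assumes a single global `Mutex<HashMap<..>>` in place of the open-addressing table with per-entry locks, a `VecDeque` in place of the linked list of thread-locals, and `std::thread::park` for sleeping. `ByteLock` and `count_with_byte_lock` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicU8};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread::{self, Thread};

const UNLOCKED: u8 = 0;
const LOCKED: u8 = 1;
const CONTENDED: u8 = 2; // locked, and a thread may be parked

/// Parked thread plus a flag that filters out spurious wakeups.
type Waiter = (Thread, Arc<AtomicBool>);
type Table = Mutex<HashMap<usize, VecDeque<Waiter>>>;

/// Global table of thread queues, keyed by the address of the lock byte.
fn table() -> &'static Table {
    static TABLE: OnceLock<Table> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// A mutex whose only per-lock state is a single byte.
pub struct ByteLock {
    state: AtomicU8,
}

impl ByteLock {
    pub const fn new() -> Self {
        ByteLock { state: AtomicU8::new(UNLOCKED) }
    }

    fn key(&self) -> usize {
        &self.state as *const AtomicU8 as usize
    }

    pub fn lock(&self) {
        // Uncontended fast path: one compare-and-swap, no table access.
        if self.state.compare_exchange(UNLOCKED, LOCKED, Acquire, Relaxed).is_ok() {
            return;
        }
        // Slow path: mark the lock contended and park while it stays that way.
        while self.state.swap(CONTENDED, Acquire) != UNLOCKED {
            self.park_while_contended();
        }
    }

    pub fn unlock(&self) {
        // Only consult the table if some thread may be parked.
        if self.state.swap(UNLOCKED, Release) == CONTENDED {
            let next = table().lock().unwrap().get_mut(&self.key()).and_then(|q| q.pop_front());
            if let Some((thread, woken)) = next {
                woken.store(true, Release);
                thread.unpark();
            }
        }
    }

    fn park_while_contended(&self) {
        let woken = Arc::new(AtomicBool::new(false));
        {
            let mut queues = table().lock().unwrap();
            // Re-check under the table lock so an unlock cannot slip in unnoticed.
            if self.state.load(Acquire) != CONTENDED {
                return;
            }
            let queue = queues.entry(self.key()).or_default();
            queue.push_back((thread::current(), Arc::clone(&woken)));
        }
        while !woken.load(Acquire) {
            thread::park();
        }
    }
}

/// Demo: threads perform a deliberately non-atomic load-then-store under the lock.
pub fn count_with_byte_lock(threads: usize, iterations: u64) -> u64 {
    let lock = ByteLock::new();
    let counter = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iterations {
                    lock.lock();
                    counter.store(counter.load(Relaxed) + 1, Relaxed);
                    lock.unlock();
                }
            });
        }
    });
    counter.load(Relaxed)
}
```

The re-check of the state byte under the table lock plays the same role as the value comparison in FUTEX_WAIT: it closes the window between deciding to sleep and actually being queued. The single global mutex would be a bottleneck in practice, which is presumably why the real design locks individual table entries instead.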

jcranmer•8mo ago
> But now your locks are only one byte in size,

That's not a very useful property, though. Because inter-core memory works on cache-line granularities, packing more than one lock in a cache line is a Bad Idea™. Potentially it allows you to pack more data being protected by a lock with that data... but alignment rules mean that you're going to invariably end up spending 4 or 8 bytes (via a regular integer or a pointer) on that lock anyways.

gpderetta•8mo ago
Enough to be able to pack a mutex and a pointer together for example. If you are carefully packing your structs a one byte mutex is great.
skitter•8mo ago
Yup, that's what I'm doing - storing the two bits needed for an object's monitor in the same word as its compressed class pointer. The pointer doesn't change over the lock's lifetime.
vlovich123•8mo ago
In rust the compiler will auto-pack everything so your 1 byte mutex would be placed after any multibyte data to avoid padding.
scottlamb•8mo ago
That's typically not true due to the `Mutex<T>` design: the `T` gets padded to its alignment, then placed into the `struct Mutex` along with the signaling byte, and that struct is padded again before being put into the outer struct.

You can avoid this with a `parking_lot::Mutex<()>` or `parking_lot::RawMutex` guarding other contents, but then you need to use `unsafe` because the borrow checker doesn't understand what you're doing.

I coincidentally was discussing this elsewhere recently: https://www.reddit.com/r/rust/comments/1ky5gva/comment/mv3kp...
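The double padding can be seen with `size_of` on a stand-in type. `ByteMutex<T>` below is a hypothetical wrapper with the same shape as a generic mutex that has one byte of state; it does no locking and is not parking_lot's type. `Nested` and `Flat` are likewise made up for the comparison, and `Flat` only shows the layout you would get from the raw-mutex approach, not the `unsafe` code needed to use it.

```rust
use std::cell::UnsafeCell;
use std::mem::size_of;
use std::sync::atomic::AtomicU8;

/// Hypothetical one-byte-state mutex wrapper, shaped like a generic `Mutex<T>`.
pub struct ByteMutex<T> {
    state: AtomicU8,
    data: UnsafeCell<T>,
}

/// The wrapper used as a field of a larger struct, next to another small field.
pub struct Nested {
    value: ByteMutex<u32>,
    flag: u8,
}

/// The same fields flattened: the lock byte sits directly beside the data.
pub struct Flat {
    value: UnsafeCell<u32>,
    state: AtomicU8,
    flag: u8,
}

/// Sizes in bytes of (ByteMutex<u32>, Nested, Flat).
pub fn sizes() -> (usize, usize, usize) {
    (size_of::<ByteMutex<u32>>(), size_of::<Nested>(), size_of::<Flat>())
}
```

On a typical target where `u32` has 4-byte alignment, `ByteMutex<u32>` needs 5 bytes and is rounded up to 8, and `Nested` then needs 9 and is rounded up to 12, because the compiler cannot place `flag` inside the wrapper's padding. `Flat` needs 6 bytes and fits in 8.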

zozbot234•8mo ago
You could use CAS loops throughout to make your locks "less than one byte" in size, i.e. one byte, or perhaps one machine word, but using the free bits in that byte/word to store arbitrary data. (This is because a CAS loop can implement any read-modify-write operation on atomically sized data. But CAS will be somewhat slower than special-cased hardware atomics, so this is a bad idea for locks that are performance-sensitive.)
gpderetta•8mo ago
Single bit spin locks to protect things like linked list nodes are not unheard of.
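A small sketch of the bit-packing idea from the last two comments: one lock bit sharing a 32-bit word with unrelated data, manipulated through CAS loops. The layout (bit 0 for the lock, the remaining 31 bits for a payload) and the name `TaggedWord` are assumptions chosen for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const LOCK_BIT: u32 = 1;

/// Bit 0 is the lock; bits 1..=31 carry an arbitrary payload (hypothetical layout).
pub struct TaggedWord(AtomicU32);

impl TaggedWord {
    pub fn new(payload: u32) -> Self {
        assert!(payload < (1 << 31), "payload must fit in 31 bits");
        TaggedWord(AtomicU32::new(payload << 1))
    }

    pub fn payload(&self) -> u32 {
        self.0.load(Ordering::Relaxed) >> 1
    }

    pub fn is_locked(&self) -> bool {
        self.0.load(Ordering::Relaxed) & LOCK_BIT != 0
    }

    /// CAS loop: set the lock bit only if it is clear, leaving the payload untouched.
    pub fn try_lock(&self) -> bool {
        let mut current = self.0.load(Ordering::Relaxed);
        loop {
            if current & LOCK_BIT != 0 {
                return false; // someone else holds the lock
            }
            let locked = current | LOCK_BIT;
            match self.0.compare_exchange_weak(current, locked, Ordering::Acquire, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => current = actual, // word changed under us; retry
            }
        }
    }

    /// The same loop shape handles any read-modify-write, here a payload update
    /// that preserves whatever the lock bit currently is.
    pub fn set_payload(&self, payload: u32) {
        assert!(payload < (1 << 31), "payload must fit in 31 bits");
        let mut current = self.0.load(Ordering::Relaxed);
        loop {
            let next = (payload << 1) | (current & LOCK_BIT);
            match self.0.compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return,
                Err(actual) => current = actual,
            }
        }
    }

    pub fn unlock(&self) {
        // Clearing one bit maps onto a dedicated atomic, so no loop is needed here.
        self.0.fetch_and(!LOCK_BIT, Ordering::Release);
    }
}
```

`try_lock` could also be written with `fetch_or`, checking the returned old value. That is the special-cased atomic the comment expects to be faster; the loop form is shown because it generalises to operations with no dedicated instruction.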
mandarax8•8mo ago
But you can embed this 1 byte lock into other bigger objects (eg. high bytes of a pointer).

With 4 byte locks you run into the exact same false sharing issues.

gmokki•8mo ago
Doesn't the futex2 syscall allow 1 byte futexes on recent kernels?

Double checks. Nope. The api is there and the patch to implement them has been posted multiple times: https://lore.kernel.org/lkml/20241025093944.707639534@infrad...

But the small futex2 patch will not go forward until some users say they want/need the feature

geertj•8mo ago
The annoying thing about locks (at least the variant that waits) is not just that you have to enter the kernel and wait when the lock is not available (fair enough), but also that the current holder will have to wake you, which requires another dip into the kernel by the holder.

I have been thinking on and off on how to create a syscall-less wake operation. One way to get almost what you want is to have a polling io_uring. That still requires one kernel thread that busy polls per application. Maybe this is fine in some application architectures but it's not ideal.

It would be nice if there was a way to use Intel's debug registers to write a value to some address, which would then interrupt some kernel task, allowing that kernel task to somehow figure out what futex to wake, without the interrupter having to enter the kernel.

zozbot234•8mo ago
The point of locks 'waiting' is really just that they degrade nicely under heavy contention, e.g. when more threads are trying to take the lock than you have available cores/harts. Busy polling will lead to terrible performance in such conditions, whereas threads that "wait" will do the right thing and leave CPU resources free for the active tasks to progress.
geertj•8mo ago
I mentioned busy polling as a means to an end, with the end being the ability to wake a thread without requiring a system call (ideally without busy polling!).
gpderetta•8mo ago
>It would be nice if there was a way to use Intel's debug registers to write a value to some address, which would then interrupt some kernel task

Apparently Intel cpus were supposed to get user space interrupts which would do exactly this. I'm not sure if hardware was ever shipped with support though.

Also look into monitor/mwait.

nrds•8mo ago
> It would be nice if there was a way to use Intel's debug registers to write a value to some address, which would then interrupt some kernel task

What you have described is literally the syscall mechanism. That's what it is. You perform some register write (via a specific instruction) and an interrupt is taken to the kernel. Maybe you believe that an asynchronous interrupt would cost less than a synchronous interrupt for this particular objective but I'm not sure there's evidence for that claim.

gpderetta•8mo ago
An asynchronous interrupt would be more expensive, but if you can send it to another core, you do not need to pay the cost on this core, in particular you do not need to enter the kernel. This is particularly useful for remote wakeups when you want to schedule a thread on another core.

As I mentioned elsewhere, intel was planning to add user-mode interrupts specifically for this sort of scenario.

geertj•8mo ago
> in particular you do not need to enter the kernel

Yup, that's exactly what I was looking for.

I'll look into async and user-mode interrupts, thanks for the search terms!

scottlamb•8mo ago
Related: did the idea of rseq `RSEQ_SCHED_STATE_FLAG_ON_CPU` for adaptive userspace mutexes [1] ever come to anything? I think there are a lot of userspace lock implementations using adaptive mutexes (including say `absl::Mutex` in C++ and `parking_lot::Mutex` in Rust). This seemed promising as a better way to decide when to switch from spinning to blocking.

[1] https://lwn.net/Articles/944895/
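For readers unfamiliar with the term, an adaptive mutex spins briefly in the hope that the holder releases soon, then falls back to sleeping. A minimal sketch of that two-phase shape, using only the standard library, might look like the following. It is not how `absl::Mutex` or `parking_lot::Mutex` are implemented; `AdaptiveLock`, the `SPIN_LIMIT` value, and the `Condvar`-based sleep are all assumptions made for illustration.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical spin budget; real implementations tune or adapt this.
const SPIN_LIMIT: u32 = 100;

pub struct AdaptiveLock {
    locked: AtomicBool,
    sleepers: Mutex<()>,
    cv: Condvar,
}

impl AdaptiveLock {
    pub fn new() -> Self {
        AdaptiveLock { locked: AtomicBool::new(false), sleepers: Mutex::new(()), cv: Condvar::new() }
    }

    fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Acquires the lock. Returns true if the caller had to fall back to blocking.
    pub fn lock(&self) -> bool {
        // Phase 1: spin, betting that the holder is running and will release soon.
        for _ in 0..SPIN_LIMIT {
            if self.try_lock() {
                return false;
            }
            hint::spin_loop();
        }
        // Phase 2: stop burning CPU and sleep until an unlock notifies us.
        let mut guard = self.sleepers.lock().unwrap();
        while !self.try_lock() {
            guard = self.cv.wait(guard).unwrap();
        }
        true
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
        // Taking the mutex orders this notify after any sleeper's failed try_lock.
        let _guard = self.sleepers.lock().unwrap();
        self.cv.notify_one();
    }
}

/// Demo: threads perform a non-atomic load-then-store under the lock.
pub fn count_with_adaptive_lock(threads: usize, iterations: u64) -> u64 {
    let lock = AdaptiveLock::new();
    let counter = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iterations {
                    lock.lock();
                    counter.store(counter.load(Ordering::Relaxed) + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

The fixed `SPIN_LIMIT` is a blind guess: spinning only pays off if the holder is actually running on another core. A signal about whether the holder is on-CPU, as in the rseq proposal linked above, would be a way to replace that guess when deciding to switch from spinning to blocking.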

lilyball•8mo ago
Darwin has its own set of futex primitives that it only fairly recently made public API, see https://developer.apple.com/documentation/os/os_sync_wait_on.... But there is a problem with this approach on Darwin, which is that the Darwin kernel has a Quality of Service thread priority implementation that differs from other kernels such that mutexes implemented with spinlocks or with primitives like this are vulnerable to priority inversion. Priority inversion is of course possible on other platforms, but other kernels typically guarantee even low-priority threads always eventually get serviced, whereas on Darwin a low-QoS thread will only get serviced if there are no higher-QoS threads that want to run.

For this reason, on Darwin if you want a mutex of the sort this article describes, you'll typically want to reach for os_unfair_lock, as that will donate the priority of the waiting threads to the thread that holds the lock, thus avoiding the priority inversion issue.

gpderetta•8mo ago
in principle you would have the same issue with POSIX realtime scheduling (i.e. SCHED_FIFO, SCHED_RR), but these days by default linux will still reserve 5% of cpu time for non RT threads. This can be disabled though.