I suppose someplace someone is running an embedded system without an OS on such a processor - but I'd expect they are still using extra cores and so have all of the above tricks someplace.
The only time I've hand-written my own spin lock was when I had to coordinate between two threads, one of which was running 16-bit code. Using any library was out of the question, and even relying on syscalls was sketchy, because getting the 16-bit code into the right state to make a syscall is itself tricky. Since only two threads were involved, I didn't need to care about things like fairness, so the spinlock core ended up being simple:
"thunk_spin:",
"xchg cx, es:[{in_rv}]",
"test cx, cx",
"jnz thunk_has_data",
"pause",
"jmp thunk_spin",
"thunk_has_data:",

Dennis Gustafsson – Parallelizing the physics solver – BSC 2025 https://www.youtube.com/watch?v=Kvsvd67XUKw
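
For readers who don't read x86 assembly, the `thunk_spin` loop quoted earlier in the thread can be approximated in safe Rust roughly as follows. This is only a sketch under assumptions, not the poster's actual code: it assumes `cx` holds 0 before the first `xchg` (the snippet doesn't show that), it stands in an `AtomicU16` for the `es:[{in_rv}]` memory operand, and the names `spin_take`, `publish`, and `demo` are made up for illustration. On x86, `xchg` with a memory operand is implicitly locked, which is why an atomic swap is the closest match.

```rust
use std::hint;
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;
use std::thread;

/// Spin until `slot` holds a nonzero value, then take it and leave 0 behind.
/// (Hypothetical helper mirroring the `thunk_spin` loop.)
fn spin_take(slot: &AtomicU16) -> u16 {
    loop {
        // Like `xchg cx, [mem]` with cx == 0: atomically read the slot and store 0.
        let value = slot.swap(0, Ordering::AcqRel);
        if value != 0 {
            // Corresponds to `jnz thunk_has_data`.
            return value;
        }
        // Spin-wait hint; on x86 this emits `pause`.
        hint::spin_loop();
    }
}

/// Hypothetical producer side: publish a nonzero value for the spinner to pick up.
fn publish(slot: &AtomicU16, value: u16) {
    assert_ne!(value, 0, "0 is reserved to mean 'no data yet'");
    slot.store(value, Ordering::Release);
}

/// Two-thread demo: one thread publishes, the other spins until data arrives.
fn demo() -> u16 {
    let slot = Arc::new(AtomicU16::new(0));
    let producer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || publish(&slot, 42))
    };
    let got = spin_take(&slot);
    producer.join().unwrap();
    got
}
```

With only one producer and one consumer there is no fairness to worry about, which is what keeps the loop this small. It only works for the safe-Rust case, though; it would not have helped with the 16-bit side described above, where no library could be used.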
gafferongames•1h ago