An RW lock could be implemented using an array with one slot per reader and proper padding to ensure each slot is in its own cache line (so reads/writes to different slots don't invalidate each other's CPU caches).
For a read lock: each task acquires the lock for its own slot.
For a write lock: acquire the locks from the leftmost slot to the rightmost. Writers can starve readers: while a writer moves left to right and blocks on an in-flight reader at some slot, it already holds all the earlier slots, so new readers on those slots are blocked too.
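A minimal sketch of that scheme in Rust (names, the slot count, and the 64-byte line size are all assumptions; real implementations like Linux's big-reader locks or BRAVO are more involved):

```rust
use std::sync::RwLock;

const SLOTS: usize = 8;

// Pad each slot to its own cache line (64 bytes is a common but not
// universal line size) so readers on different cores don't bounce
// the same line between caches.
#[repr(align(64))]
struct Slot(RwLock<()>);

struct BigRwLock {
    slots: Vec<Slot>,
}

impl BigRwLock {
    fn new() -> Self {
        BigRwLock {
            slots: (0..SLOTS).map(|_| Slot(RwLock::new(()))).collect(),
        }
    }

    // Reader: take the read side of one slot only, e.g. picked by
    // hashing the thread id. Readers on different slots never contend.
    fn read<R>(&self, slot: usize, f: impl FnOnce() -> R) -> R {
        let _g = self.slots[slot % SLOTS].0.read().unwrap();
        f()
    }

    // Writer: acquire every slot left to right. While it waits on an
    // in-flight reader at slot k it already holds slots 0..k, which is
    // where the reader starvation described above comes from.
    fn write<R>(&self, f: impl FnOnce() -> R) -> R {
        let guards: Vec<_> = self.slots.iter().map(|s| s.0.write().unwrap()).collect();
        let r = f();
        drop(guards);
        r
    }
}

fn main() {
    let lock = BigRwLock::new();
    let got = lock.write(|| 1) + lock.read(3, || 41);
    assert_eq!(got, 42);
    println!("ok");
}
```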
I do not know how Rust RW locks are implemented.
In any case, optimizing this well would require a lot more knowledge of what’s going on under the hood. What are the keys? Can the entire map be split into several maps? Can a reader hold the rwlock across multiple lookups? Is a data structure using something like RCU an option?
One example is folly::SharedMutex, which is very battle-tested: https://uvdn7.github.io/shared-mutex/
There are more sophisticated techniques such as RCU or hazard pointers that make synchronization overhead almost negligible for readers, but they generally require designing the algorithms around them and are not drop-in replacements for a simple mutex, so a good RW mutex implementation is a reasonable default.
Done well, this pattern gives you nearly free reads and cheap writes, sometimes cheaper than a lock.
For frequent writes, a good RWLock is often better, since RCU can degrade rapidly and badly under write contention.
whizzter•1h ago
hansvm•1h ago
We also tossed in an A/B system, so reads aren't delayed even while writes are happening; they just get stale data (also fine for our purposes).
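The A/B idea can be sketched as a double buffer: readers always see the published copy, the writer prepares the spare copy and then flips an atomic index. This single-writer sketch is illustrative only (the `&mut self` on `write` means Rust's borrow rules serialize it against readers; a concurrent version needs interior mutability plus some reclamation scheme):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Two copies of the data; `live` is the index readers should use.
// Readers never block; they may briefly see the stale copy.
struct DoubleBuffer<T> {
    bufs: [T; 2],
    live: AtomicUsize,
}

impl<T: Clone> DoubleBuffer<T> {
    fn new(v: T) -> Self {
        DoubleBuffer { bufs: [v.clone(), v], live: AtomicUsize::new(0) }
    }

    fn read(&self) -> &T {
        &self.bufs[self.live.load(Ordering::Acquire)]
    }

    // Single writer: copy current state into the spare buffer,
    // mutate it, then publish by flipping the index.
    fn write(&mut self, f: impl FnOnce(&mut T)) {
        let live = self.live.load(Ordering::Relaxed);
        let spare = 1 - live;
        self.bufs[spare] = self.bufs[live].clone();
        f(&mut self.bufs[spare]);
        self.live.store(spare, Ordering::Release);
    }
}

fn main() {
    let mut db = DoubleBuffer::new(vec![1, 2, 3]);
    db.write(|v| v.push(4));
    assert_eq!(db.read(), &vec![1, 2, 3, 4]);
    println!("ok");
}
```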
the_duke•39m ago
It's essentially just an atomic pointer that can be swapped out.
[1] https://docs.rs/arc-swap/latest/arc_swap/
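A toy sketch of that swappable-pointer idea using only the standard library (this version deliberately leaks old snapshots instead of freeing them, because a reader may still hold a reference; safely reclaiming old values is exactly the hard part that arc-swap solves with `Arc`):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Readers load the pointer and dereference; writers swap in a new
// heap-allocated snapshot. No locks on either path.
struct Snapshot<T> {
    ptr: AtomicPtr<T>,
}

impl<T> Snapshot<T> {
    fn new(v: T) -> Self {
        Snapshot { ptr: AtomicPtr::new(Box::into_raw(Box::new(v))) }
    }

    fn load(&self) -> &T {
        // Sound here only because old snapshots are never freed.
        unsafe { &*self.ptr.load(Ordering::Acquire) }
    }

    fn store(&self, v: T) {
        let _old = self.ptr.swap(Box::into_raw(Box::new(v)), Ordering::AcqRel);
        // Deliberately leak _old: a concurrent reader may still be using it.
    }
}

fn main() {
    let cfg = Snapshot::new(String::from("v1"));
    cfg.store(String::from("v2"));
    assert_eq!(cfg.load().as_str(), "v2");
    println!("ok");
}
```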
gpderetta•56m ago
A specific microarchitecture might alleviate this a bit with lower latency cross-core communication, but the solution (using a single naive RW lock to protect the cache) is inherently non-scalable.
PunchyHamster•46m ago