Kingman's formula says that as you approach 100% utilization, waiting times explode.
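To put numbers on that, here is a rough sketch of Kingman's approximation for the mean wait in a single-server (G/G/1) queue: wait ≈ (ρ / (1 − ρ)) · ((ca² + cs²) / 2) · τ, where ρ is utilization, τ is the mean service time, and ca and cs are the coefficients of variation of interarrival and service times. The function name `kingman_wait` and the default values are my own choices for illustration.

```python
def kingman_wait(rho, mean_service, ca=1.0, cs=1.0):
    """Approximate mean time spent waiting in queue (Kingman, G/G/1).

    rho          -- utilization, must satisfy 0 <= rho < 1
    mean_service -- mean service time (the result is in the same unit)
    ca, cs       -- coefficients of variation of interarrival and
                    service times (1.0 corresponds to exponential)
    """
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return (rho / (1 - rho)) * ((ca ** 2 + cs ** 2) / 2) * mean_service


# With ca = cs = 1 and a mean service time of 1, the wait is rho / (1 - rho).
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilization {rho:.2f} -> mean wait {kingman_wait(rho, 1.0):.1f}")
```

Going from 90% to 99% utilization takes the mean wait from about 9 service times to about 99, which is the "explosion" in practice.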
The correct way to deal with this is bounded queue lengths and backpressure. I.e., don’t try to cope with an overloaded queue; don’t allow the queue to become overloaded in the first place.
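A minimal sketch of what that can look like, assuming a single-process Python service. `BoundedWorkQueue`, `try_submit` and `take` are hypothetical names; the only real API used is the standard library's `queue.Queue`.

```python
import queue


class BoundedWorkQueue:
    """Work queue that rejects new items when full instead of growing."""

    def __init__(self, max_pending):
        if max_pending < 1:
            # queue.Queue treats maxsize <= 0 as "unbounded", so refuse it.
            raise ValueError("max_pending must be at least 1")
        self._q = queue.Queue(maxsize=max_pending)

    def try_submit(self, item):
        """Return True if the item was accepted, False if the queue is full."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            # Backpressure: the caller has to slow down, retry later, or shed load.
            return False

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if there is none."""
        return self._q.get_nowait()


work = BoundedWorkQueue(max_pending=2)
accepted = [work.try_submit(n) for n in range(3)]
print(accepted)  # the third submit is rejected because the queue is full
```

The important part is that the producer gets an immediate "no" it can act on (for example by returning HTTP 429 or 503 upstream) rather than having its request sit in an ever-growing backlog.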
You may also want to implement reader/writer locks if your load has many more reads than writes.
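As far as I know Python's standard library has no reader/writer lock, so this is a hand-rolled sketch built on `threading.Condition`. The class and method names are my own. It is reader-preferring, which means a steady stream of readers can starve a writer; treat it as an illustration of the idea, not a production lock.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or exactly one writer (reader-preferring)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                # The last reader out wakes any waiting writer.
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


lock = ReadWriteLock()
lock.acquire_read()
lock.acquire_read()   # a second reader gets in without blocking
lock.release_read()
lock.release_read()
lock.acquire_write()  # succeeds only because no readers remain
lock.release_write()
```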
Unfortunately, nobody teaches these things in a clear way, and plenty of engineers don't fully understand them either.
mrngm•3d ago
[0] https://www.youtube.com/watch?v=lJ8ydIuPFeU
[1] https://bravenewgeek.com/everything-you-know-about-latency-i...