> Also, we can see that pseudo 3-random is substantially better than pseudo 2-random, which indicates that k-random is probably an improvement over 2-random for the right k. Some k-random policy might be an improvement over DIP.
This sounds very similar to tournament selection schemes in evolutionary algorithms. You can control the amount of selective pressure by adjusting the tournament size.
I think the biggest advantage here is performance. A 1v1 tournament is extremely cheap to run. You don't need to maintain a total global ordering of anything.
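A minimal sketch of the analogy in Python (the population and fitness function here are placeholders of my own): a size-k tournament just samples k candidates and keeps the best one, so k=2 is the cheapest 1v1 case and a larger k raises the selective pressure. No global sort of the population is ever needed.

```python
import random

def tournament_select(population, fitness, k=2):
    """Pick k random candidates and return the fittest of them.
    k=2 is the cheap 1v1 case; larger k means stronger selective pressure.
    No total ordering of the population is ever maintained."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

# Toy usage: select a parent from a population of numbers, "fitter" = larger.
population = [random.random() for _ in range(1000)]
parent = tournament_select(population, fitness=lambda x: x, k=3)
```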
Consider repeatedly looping through n+1 objects when only n fit in cache. In that case LRU misses and evicts on every lookup! Your cache is useless and performance falls off a cliff! 2-random turns that performance cliff into a gentle slope with a long tail.
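A quick way to see that cliff is a toy simulation (my own sketch, not from the article; the cache size, key count, and the pseudo-2-random rule of evicting the less recently used of two random entries are my assumptions): cycle through 101 keys with room for 100 and compare hit rates.

```python
import random
from collections import OrderedDict

def run(policy, cache_size=100, num_keys=101, passes=50):
    """Cycle through num_keys keys with a cache that holds cache_size of them."""
    cache = OrderedDict()   # keys in recency order: first = least recently used
    hits = misses = 0
    for _ in range(passes):
        for key in range(num_keys):
            if key in cache:
                hits += 1
                cache.move_to_end(key)            # refresh recency
            else:
                misses += 1
                if len(cache) >= cache_size:
                    if policy == "lru":
                        cache.popitem(last=False)  # evict least recently used
                    else:  # 2-random: evict the older of two random entries
                        keys = list(cache)         # index order == recency order
                        a, b = random.sample(range(len(keys)), 2)
                        cache.pop(keys[min(a, b)]) # smaller index = less recently used
                cache[key] = None
    return hits / (hits + misses)

print("LRU hit rate:     ", run("lru"))       # ~0: the classic cliff
print("2-random hit rate:", run("2random"))   # noticeably better on this pattern
```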
I bet this effect happens when people try to be smart and loop through n items, but have too much additional data to fit in registers.
Bubble sort seems pretty terrible, until you realize that it's interruptible. The set is always a little more sorted than before. So if you have realtime requirements and best-effort sorting, you can sort things between renders and live with the possibility of two things relatively close to each other appearing a little glitched for a frame.
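Something like this is all it takes (a sketch; `draw_list`, `depth`, and the render loop are hypothetical): run one pass per frame, and the list is always in a usable, slightly-more-sorted state whenever you stop.

```python
def bubble_pass(items, key=lambda x: x):
    """One bubble-sort pass: adjacent out-of-order pairs get swapped.
    Returns True if anything moved, so callers know when the list is fully sorted."""
    moved = False
    for i in range(len(items) - 1):
        if key(items[i]) > key(items[i + 1]):
            items[i], items[i + 1] = items[i + 1], items[i]
            moved = True
    return moved

# Hypothetical per-frame usage: the list gets a little more sorted each frame
# and is never left in a broken state if we stop early.
# while rendering:
#     bubble_pass(draw_list, key=lambda obj: obj.depth)
#     render(draw_list)
```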
beoberha•3h ago
It is only better than leastconn when you have stale information to base your decision on. If you have perfect, live information, always picking the best server is optimal.
contravariant•3h ago
If you had perfect information and could just pick whichever was provably lowest, that would probably work. However, keeping that information up to date also takes effort. And if your information is outdated, it's easy to overload a server that you think doesn't have much to do, or underload one that's long since finished with its tasks. Picking between 2 random servers introduces some randomness without allowing the spread to become huge.
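That's the whole trick in sketch form (the backend names and the in-memory `load` counter are made up for illustration; real balancers track in-flight requests or reported load, which may be slightly stale):

```python
import random

def pick_backend(backends, load):
    """Power-of-two-choices: sample two distinct backends at random and
    send the request to whichever currently reports fewer in-flight requests.
    The load numbers can be a little stale; the random sampling keeps any one
    stale (or broken) backend from absorbing all the traffic."""
    a, b = random.sample(backends, 2)
    return a if load[a] <= load[b] else b

# Hypothetical usage with a simple in-memory connection counter.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
load = {b: 0 for b in backends}
chosen = pick_backend(backends, load)
load[chosen] += 1   # increment on dispatch, decrement on completion
```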
yuliyp•3h ago
1. Random is the one algorithm that can't be fooled. So even if something skews connection count as a load metric, not relying on that metric alone dampens the problem.
2. There is a lag between selection and actually incrementing the load metric for the next request, meaning that using the load metric alone is prone to oscillation.
3. A machine that's broken (immediately errors all requests) can capture almost all requests, while 2-random limits its damage to roughly 2x its weight fraction (see the sketch below).
4. For requests which are a mix of CPU and IO work, reducing convoying (i.e. many requests in similar phases) is good for reducing CPU scheduling delay. You want some requests to be in CPU-heavy phases while others are in IO-heavy phases, not bunched.
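Point 3 is easy to demonstrate with a toy simulation (my own sketch; the tick-based service model and all parameters are assumptions): one backend errors instantly, so it always reports zero connections, and we count what share of traffic each policy sends it.

```python
import heapq
import random

def simulate(policy, n_backends=10, n_requests=100_000, service_ticks=20):
    """Backend 0 is broken: it errors instantly, so it never holds a connection.
    Return the fraction of requests the given policy sends to it."""
    in_flight = [0] * n_backends
    completions = []                 # min-heap of (finish_tick, backend)
    sent_to_broken = 0
    for tick in range(n_requests):   # one new request per tick
        while completions and completions[0][0] <= tick:
            _, done = heapq.heappop(completions)
            in_flight[done] -= 1
        if policy == "leastconn":
            # ties go to the lowest index, i.e. the always-idle broken backend
            chosen = min(range(n_backends), key=lambda i: in_flight[i])
        else:                        # 2-random: pick two, keep the less loaded
            a, b = random.sample(range(n_backends), 2)
            chosen = a if in_flight[a] <= in_flight[b] else b
        if chosen == 0:
            sent_to_broken += 1      # fails immediately, never shows up as load
        else:
            in_flight[chosen] += 1
            heapq.heappush(completions, (tick + service_ticks, chosen))
    return sent_to_broken / n_requests

print("leastconn share to broken backend:", simulate("leastconn"))  # ~100%
print("2-random share to broken backend: ", simulate("2random"))    # roughly 2x its 10% fair share
```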
hinkley•2h ago
It’s one of those things I put in the “unreasonably effective” category.