It also doesn't mention the most obvious solution to this problem: adding a random factor (jitter) to retry timing during backoff. A major cause of the stampede is every client coming back at the precise instant a service becomes available again, only to knock it offline.
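For illustration, a minimal sketch of exponential backoff with full jitter; the names (retryWithJitter, doRequest) and the delay parameters are made up, not from any particular library:

    // Exponential backoff with full jitter: each retry waits a uniformly
    // random delay inside an exponentially growing (and capped) window,
    // so clients don't all come back at the same instant.
    async function retryWithJitter<T>(
      doRequest: () => Promise<T>,
      maxAttempts = 5,
      baseDelayMs = 100,
      maxDelayMs = 10_000,
    ): Promise<T> {
      for (let attempt = 0; ; attempt++) {
        try {
          return await doRequest();
        } catch (err) {
          if (attempt + 1 >= maxAttempts) throw err;
          const window = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
          const delayMs = Math.random() * window;
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }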
I would think that in the rare case of multiple concurrent requests for the same key where none of the caches have it, it might just be worth taking the slightly increased hit (if any) of going to the db, instead of complicating things further and slowing everyone else down with the same mechanism.
This query will probably find loads already: https://github.com/search?q=language%3Atypescript+%22new+Map...
If you can, it's easier to have every client fetch only from the cache, and have a cron job (e.g., every second) refresh the cache.
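Roughly this shape, as a sketch; the Map-as-cache, the key list, and loadFromDb() are all placeholders:

    // Only the periodic refresher ever touches the DB; clients read the cache.
    const cache = new Map<string, string>();

    async function loadFromDb(key: string): Promise<string> {
      // stand-in for the real DB query
      return `value-for-${key}`;
    }

    // Refresh the hot keys once a second (the cron-job part).
    setInterval(async () => {
      for (const key of ["hot-key-1", "hot-key-2"]) {
        cache.set(key, await loadFromDb(key));
      }
    }, 1000);

    // Clients never hit the DB directly, so there is nothing to stampede.
    function getValue(key: string): string | undefined {
      return cache.get(key);
    }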
In CDNs, the feature that prevents this is called "Collapse Forwarding".
blakepelton•56m ago
OrbitCache is one example, described in this paper: https://www.usenix.org/system/files/nsdi25-kim.pdf
It should solve the thundering herd problem, because the switch would "know" which cache misses are outstanding and would park subsequent requests for the same key in switch memory until the reply comes back from the backend server. This has an advantage over a multi-threaded CPU-based cache: it avoids the performance overhead of multiple threads having to synchronize with each other just to realize they are about to start a stampede.
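The software analogue of that "parking" is request coalescing (single-flight). A minimal sketch of the idea under made-up names (fetchFromBackend, coalescedGet), not OrbitCache's actual mechanism:

    // Concurrent requests for the same key share one in-flight backend fetch.
    const inFlight = new Map<string, Promise<string>>();

    async function fetchFromBackend(key: string): Promise<string> {
      // stand-in for the real cache-miss path to the backend server
      return `value-for-${key}`;
    }

    function coalescedGet(key: string): Promise<string> {
      const pending = inFlight.get(key);
      if (pending) return pending; // "park" on the miss that is already in flight
      const p = fetchFromBackend(key).finally(() => inFlight.delete(key));
      inFlight.set(key, p);
      return p;
    }

In a single-threaded runtime the Map check-and-set is naturally atomic; the point of doing it in the switch is getting the same effect without cross-thread synchronization.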
A summary of OrbitCache will be published to my blog tomorrow. Here is a "draft link": https://danglingpointers.substack.com/p/4967f39c-7d6b-4486-a...