It also doesn't mention the most obvious solution to this problem: adding a random factor (jitter) to retry timing during backoff. A major cause of the stampede is every client coming back at the precise instant a service becomes available again, only to knock it offline.
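For illustration, a minimal sketch of exponential backoff with full jitter; the names (retryWithJitter, doRequest) and the delay parameters are made up, not from any particular library:

    // Exponential backoff with full jitter: each retry waits a uniformly
    // random delay inside an exponentially growing (and capped) window,
    // so clients don't all come back at the same instant.
    async function retryWithJitter<T>(
      doRequest: () => Promise<T>,
      maxAttempts = 5,
      baseDelayMs = 100,
      maxDelayMs = 10_000,
    ): Promise<T> {
      for (let attempt = 0; ; attempt++) {
        try {
          return await doRequest();
        } catch (err) {
          if (attempt + 1 >= maxAttempts) throw err;
          const window = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
          const delayMs = Math.random() * window;
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }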
I would think that in the rare case of multiple concurrent requests for the same key where none of the caches have it, it might just be worth taking the slightly increased hit (if any) of going to the db, instead of complicating things further and slowing everyone else down with the same mechanism.
This query will probably find loads already: https://github.com/search?q=language%3Atypescript+%22new+Map...
If you can, it's easier to have every client fetch only from the cache, and have a cron job (e.g., every second) refresh the cache.
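Roughly this shape, as a sketch; the Map-as-cache, the key list, and loadFromDb() are all placeholders:

    // Only the periodic refresher ever touches the DB; clients read the cache.
    const cache = new Map<string, string>();

    async function loadFromDb(key: string): Promise<string> {
      // stand-in for the real DB query
      return `value-for-${key}`;
    }

    // Refresh the hot keys once a second (the cron-job part).
    setInterval(async () => {
      for (const key of ["hot-key-1", "hot-key-2"]) {
        cache.set(key, await loadFromDb(key));
      }
    }, 1000);

    // Clients never hit the DB directly, so there is nothing to stampede.
    function getValue(key: string): string | undefined {
      return cache.get(key);
    }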
In CDNs, the feature that prevents this is called "Collapse Forwarding".
blakepelton•56m ago
OrbitCache is one example, described in this paper: https://www.usenix.org/system/files/nsdi25-kim.pdf
It should solve the thundering herd problem, because the switch would "know" which cache misses are outstanding and would park subsequent requests for the same key in switch memory until the reply comes back from the backend server. This has an advantage over a multi-threaded CPU-based cache: it avoids the performance overhead of multiple threads having to synchronize with each other just to realize they are about to start a stampede.
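The software analogue of that "parking" is request coalescing (single-flight). A minimal sketch of the idea under made-up names (fetchFromBackend, coalescedGet), not OrbitCache's actual mechanism:

    // Concurrent requests for the same key share one in-flight backend fetch.
    const inFlight = new Map<string, Promise<string>>();

    async function fetchFromBackend(key: string): Promise<string> {
      // stand-in for the real cache-miss path to the backend server
      return `value-for-${key}`;
    }

    function coalescedGet(key: string): Promise<string> {
      const pending = inFlight.get(key);
      if (pending) return pending; // "park" on the miss that is already in flight
      const p = fetchFromBackend(key).finally(() => inFlight.delete(key));
      inFlight.set(key, p);
      return p;
    }

In a single-threaded runtime the Map check-and-set is naturally atomic; the point of doing it in the switch is getting the same effect without cross-thread synchronization.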
A summary of OrbitCache will be published to my blog tomorrow. Here is a "draft link": https://danglingpointers.substack.com/p/4967f39c-7d6b-4486-a...