An Interactive Guide to Rate Limiting

https://blog.sagyamthapa.com.np/interactive-guide-to-rate-limiting
120•sagyam•11h ago

Comments

fside•10h ago
I wonder if anyone has switched algorithms after hitting real-world scaling issues with one of those? Curious if there are any “gotchas” that only show up at scale. I only have experience with fixed window rate limiting
eknkc•10h ago
We used a token bucket one to allow say 100 requests immediately but the limit would actually replenish 10 per minute or something. Makes sense to allow bursts. This was to allow free tier users to test things. Unless they go crazy, they would not even notice a rate limiter.

Sliding window might work well with large intervals. If you have something like a 24h window, fixed window will abruptly cut things off for hours.

I mostly work with 1 minute windows so it's fixed all the way.
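
A minimal sketch of the token bucket described above, assuming a burst of 100 requests that replenishes at 10 tokens per 60 seconds. The class name, parameter names and the injectable `clock` are illustrative choices, not anything from the commenter's system.

```python
import time


class TokenBucket:
    """Token bucket: allows a burst of up to `capacity` requests,
    then refills `refill_amount` tokens every `refill_interval` seconds."""

    def __init__(self, capacity, refill_amount, refill_interval, clock=time.monotonic):
        self.capacity = capacity
        self.refill_amount = refill_amount
        self.refill_interval = refill_interval
        self.clock = clock              # injectable so the example is deterministic
        self.tokens = float(capacity)   # start full so new users get the whole burst
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        earned = (now - self.last) / self.refill_interval * self.refill_amount
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock (assumed numbers: 100 burst, 10 tokens per minute).
fake_now = [0.0]
bucket = TokenBucket(capacity=100, refill_amount=10, refill_interval=60,
                     clock=lambda: fake_now[0])
burst = sum(bucket.allow() for _ in range(150))         # 100 accepted, 50 rejected
fake_now[0] += 60                                       # one minute passes
after_minute = sum(bucket.allow() for _ in range(150))  # 10 accepted
```

A user who stays under the refill rate never sees a rejection, which matches the "would not even notice a rate limiter" behaviour.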

mparnisari•10h ago
We used leaky bucket IIRC and the issue I saw was that the distributed aspect of it was coded incorrectly and so depending on the node you hit you were rate-limited or not :facepalm:
hotpocket777•6h ago
So it wasn’t really implemented correctly then.
smadge•6h ago
I have experience with token bucket and leaky bucket (or at least a variation where a request leaves the bucket when the server is done processing it) to prevent overload of backend servers. I switched from token bucket to leaky bucket. Token bucket is “the server can serve X requests per second,” while leaky bucket is “the server can process N requests concurrently.” I found the direct limit on concurrency much more responsive to overload and better controlled delay from contention of shared resources. This kind of makes sense because imagine if your server goes from processing 10 QPS to 5 QPS. If the server has a 10 QPS token bucket limit it keeps accepting requests and the request queue and response time go to infinity.
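
The concurrency-based variation described above could be sketched as follows. This is only an illustration using a standard-library semaphore; `ConcurrencyLimiter`, `try_acquire` and `max_in_flight` are hypothetical names, and the commenter's real implementation is not shown in the thread.

```python
import threading


class ConcurrencyLimiter:
    """Admit at most `max_in_flight` requests at a time; reject the rest immediately."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self):
        # Non-blocking: returns False instead of queueing when the server is full.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call when the server has finished processing the request.
        self._slots.release()


limiter = ConcurrencyLimiter(max_in_flight=2)
admitted = [limiter.try_acquire() for _ in range(3)]  # third request is shed
limiter.release()                                     # one request finishes
admitted_after_release = limiter.try_acquire()        # a slot is free again
```

Because a slot is only returned when processing finishes, a server that slows down automatically admits fewer requests per second, which is the responsiveness to overload the comment refers to.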
mcdow•10h ago
Super cool. What did you use to build the interactive bits?
leoff•10h ago
It really looks AI generated
sagyam•9h ago
Yes, I usually prompt Claude, GPT and Deepseek with my rough vision and take ideas from all of them. They never quite get it right on their own. But for code that's deploy-and-forget, AI-generated code is good enough.
leoff•3h ago
you're 80% there, especially with functionality, but it needs some polishing
sagyam•9h ago
It was Shadcn, Tailwind.
onionbagle•10h ago
Curious to hear if anyone has implemented these and what technology was used.
tra3•8h ago
Hierarchical token buckets are part of the Linux kernel for traffic management, for instance: https://linux.die.net/man/8/tc-htb
chrisweekly•10h ago
Excellent dataviz.

Related tangent, "HPBN" (High-Performance Browser Networking) is a great book that includes related concepts.

https://hpbn.co/

softfalcon•10h ago
Seconded, this book goes hand-in-hand with "Designing Data-Intensive Applications" by Martin Kleppmann [0].

[0](https://www.oreilly.com/library/view/designing-data-intensiv...)

loevborg•6h ago
Thanks for sharing!
buggeryorkshire•10h ago
No mention of CGNAT which caused me many problems at a previous role?
sagyam•2h ago
Does CGNAT do rate limiting? If so, is there some documentation I can look up?
mdaniel•34m ago
I'm pretty sure GP means: all those users have egress from a finite number of IPv4 addresses, and thus if rate limiting is done by IP those behind the NAT are going to have a real bad time. It's true of all NAT setups, but the affected audience size for CGNAT could be outrageous
jsw•10h ago
I’ve found the AIMD algo (additive increase, multiplicative decrease) paired with a token bucket gives a nice way to have a distributed set of processes adapt to backend capacity without centralized state.

Also found that AIMD is better than a circuit breaker in a lot of circumstances too.

Golang lib of the above https://github.com/webriots/rate
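
To illustrate the AIMD idea, here is a small sketch of the controller on its own. It is not the API of the linked library; the class, its parameters and the chosen constants are assumptions made for the example. In practice the resulting `rate` would be fed into a token bucket as its refill rate.

```python
class AIMDRate:
    """Additive-increase / multiplicative-decrease controller for a rate limit."""

    def __init__(self, initial, min_rate, max_rate, increase=1.0, decrease=0.5):
        self.rate = initial
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.increase = increase   # added to the rate after each success
        self.decrease = decrease   # rate is multiplied by this on overload

    def on_success(self):
        # The backend coped, so probe for a little more capacity.
        self.rate = min(self.max_rate, self.rate + self.increase)
        return self.rate

    def on_overload(self):
        # The backend pushed back (error, timeout, 429), so back off sharply.
        self.rate = max(self.min_rate, self.rate * self.decrease)
        return self.rate


aimd = AIMDRate(initial=10.0, min_rate=1.0, max_rate=100.0)
for _ in range(4):
    aimd.on_success()                    # 10 -> 11 -> 12 -> 13 -> 14
rate_after_growth = aimd.rate
rate_after_backoff = aimd.on_overload()  # 14 -> 7
```

Each process runs its own controller and reacts only to the responses it sees, which is why no shared state is needed.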

hxtk•10h ago
Something I’ve long wondered is why you never hear about rate limiting algorithms that are based on the cost to serve the request or algorithms that dynamically learn the capacity of the system and give everyone a fair share.

In the field of router buffer management, there are algorithms like Stochastic Fair Blue, which does the latter, but is somewhat hard to apply to HTTP because you’d have to define a success/failure metric for each request (a latency target, for example), and clients would have to tolerate a small probability of a request being rejected even when they’re not being rate limited.

In Google’s paper on the Zanzibar authorization system, they give brief mention to rate limiting clients based on a CPU time allocation, but don’t go into any detail since it’s not a paper on rate limiting.

It’s something that matters less today with ubiquitous autoscaling, where the capacity of the system is whatever you need it to be to give each user what they ask for up to their rate limit, but I’m surprised at my inability to find any detailed account of such a thing being attempted.

ithkuil•9h ago
Yes, autoscaling is a thing, but it's rarely instantaneous; you'll still benefit from having a good handle on load fairness.

Furthermore, modern GPU workloads are way less elastic in capacity scaling

ucarion•9h ago
Shuffle-sharding is similar to stochastic Blue stuff, and you'll find Amazon talking about it:

https://aws.amazon.com/builders-library/workload-isolation-u...

Which isn't exactly what you're talking about, but between that and other things in the "Builder's Library" series, you can see that people are doing this, and writing about it.

wonnage•8h ago
Envoy has a latency-based adaptive concurrency feature: https://www.envoyproxy.io/docs/envoy/latest/configuration/ht...

Netflix has a blog post for their implementation: https://netflixtechblog.medium.com/performance-under-load-3e...

remus•8h ago
My assumption would be that it is a complexity thing. As a consumer of the service having a rate limit that is easy to understand and write retry logic for is a big plus. If the criteria is "x requests per 5 minute window" and I start getting rate limit errors it's very clear what back off behaviour I need to implement. If the criteria is CPU usage of my requests, as a consumer it's hard for me to reason about how much CPU a given request is going to take so my retry logic is going to be fairly dumb.
jsw•8h ago
I mentioned this in another place on this thread, but a simple AIMD algorithm paired with a token bucket is surprisingly effective at dynamically adjusting to available capacity, even across a fleet of services not sharing state other than the contended resource.

Pretty easy to pair AIMD with token bucket (eg https://github.com/webriots/rate)

hinkley•5h ago
One of those times when I was still learning that asking forgiveness is easier than asking permission, I wanted to eliminate a very expensive presence calculation that I and a coworker determined was accounting for almost 10% of average page load time. Some idiot in product had decided they wanted an OLTP-ish solution that told you -exactly- how many people were online and like a fool I asked if we could do a sane version and they said no. If you don't ask, then it's not insubordination.

For situations where eventual consistency is good enough, you can run a task in a loop that tries every n seconds to update a quantity. But as you say that can also saturate, so what you really want is for the task to update, then wait m seconds and go again, where m is more than the time you expect the task to complete in (<<50% duty cycle). As the cost of the operation climbs the time lag increases but the load on the system increases more slowly. If you want to, collect telemetry on how often it completes and set up alarms for if it doesn't for a duration that is several times longer than your spikiest loads happen.
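
The "update, then wait m seconds and go again" loop can be sketched like this. The function name and the simulated clock are assumptions for the example; the point is only that the pause is measured from the end of the task, not from its start.

```python
import time


def run_with_fixed_delay(task, delay_s, iterations, clock=time.monotonic, sleep=time.sleep):
    """Run `task`, then wait `delay_s` after it finishes before running it again."""
    start_times = []
    for _ in range(iterations):
        start_times.append(clock())
        task()
        sleep(delay_s)  # the pause starts only after the task has completed
    return start_times


# Simulated clock: the task "takes" 3 s and the pause is 5 s.
sim = {"now": 0.0}


def slow_task():
    sim["now"] += 3.0


def fake_sleep(seconds):
    sim["now"] += seconds


starts = run_with_fixed_delay(slow_task, delay_s=5.0, iterations=3,
                              clock=lambda: sim["now"], sleep=fake_sleep)
# starts == [0.0, 8.0, 16.0]: the duty cycle is 3/8, below 50%
```

If the task slows to 10 s, runs start every 15 s instead of every 8 s, so the data gets staler but the load on the system cannot pile up.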

I don't think voluntary rate limiting on the client side gets enough column inches. Peer to peer you end up footgunning yourself if you bite off more than you can chew, and if you start stalling on responses then you gum up the server as well.

phelm•10h ago
See also https://smudge.ai/blog/ratelimit-algorithms
mtlynch•9h ago
This seems like someone used AI to generate the article and examples without much review. It's all bullet points, and it repeatedly uses "Working:" as a heading, which doesn't make any sense to me.

The site defaults to dark mode for me (I'm assuming based on my system preferences), but all the examples are unusable with dark mode. The examples all seem to be within fixed-width divs that cut off part of the content unless you scroll within the example.

sagyam•9h ago
- I like bullet points, they are easy to read.

- "Working" I wanted to keep things consistent.

- Content getting cut off was a limitation of the iframe. Most blogging platforms don't allow you to embed another page. This was the best I could do given the limitation.

- I do use AI to bounce ideas, but a lot of effort went into getting the apps working as intended.

mtlynch•7h ago
Why "Working?" It's unclear what that means.

Is it supposed to say, "How it works"?

sagyam•2h ago
Now that you have mentioned it, it should have been "working principle" or "algorithm". It made sense in my head. English isn't my first language, sorry about that.
tonyhart7•9h ago
another article with great visualization for rate limit

https://smudge.ai/blog/ratelimit-algorithms

behnamoh•9h ago
> Follow Sagyam's Blog's journey

> By following, you'll have instant access to our new posts in your feed.

> Continue with Google

> More options

As soon as I see this in a blog, I quit tab. Why do authors do this to themselves?

Strom•3h ago
They do it when their real goal is to funnel you into a newsletter to later sell you stuff. The only purpose of the article is to show you that prompt.
sagyam•2h ago
Sorry about that, that is my blogging platform, Hashnode. It was the lesser of four evils:

- Medium which paywalls the article and forces you to sign up just to read.

- Substack has the same problem, it's great for funneling people to your paid newsletter but there is a sign up banner as soon as the page loads.

- Build your own and miss out on the social aspect and there's no proof if the numbers are real.

tra3•8h ago
I tried to explain the benefits of circuit breakers and adaptive concurrency to improve the performance of our distributed monolith, but I failed. I tried to visualize it using step by step packet diagrams but failed. This is hard stuff to understand.

Great visualization tools. Next time I have to explain it, I'll reach for these.

jhlee525•7h ago
Easy to understand.
conradludgate•4h ago
My favourite algorithm is generic cell rate algorithm (GCRA). It works like token bucket in this post. The implementation is dead simple and requires no background tasks and needs very minimal state.

Instead of storing the current number of tokens, you instead store when the bucket will be full. If you take a token from the bucket, you increment the timestamp accordingly by 1/rps. The only complication is if the filled timestamp was in the past, you have to first update it with the current timestamp to avoid overfilling.

What's even nicer is that it doubles as a throttle implementation rather than just a rate limiter. You know the bucket is empty if you compute empty_at=filled_at-(max_tokens/rps) which is still in the future. From that calculation you now know when it will have capacity again, so you can sleep accordingly. If you use a queue before the GCRA, it then starts slowing down new connections rather than just dropping them.

You should still have a limit on the queue, but it's nice in that it can gracefully turn from token bucket into leaky bucket.
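
A sketch of GCRA as described in this comment, using the same names (`filled_at`, `empty_at`, `max_tokens`, `rps`). The class shape, the return convention of `try_take` and the injectable `clock` are assumptions for the example. The check is applied to the timestamp as it would be after taking the token, so a request is admitted only when a whole token is available.

```python
import time


class GCRA:
    """Stores only `filled_at`: the time at which the bucket will be full again."""

    def __init__(self, rps, max_tokens, clock=time.monotonic):
        self.interval = 1.0 / rps        # time it takes to earn one token
        self.burst = max_tokens / rps    # time it takes to refill an empty bucket
        self.clock = clock
        self.filled_at = clock()         # the bucket starts full

    def try_take(self):
        """Return 0.0 if a token was taken, else the seconds to wait before retrying."""
        now = self.clock()
        # If the bucket filled up in the past, clamp to now to avoid overfilling.
        candidate = max(self.filled_at, now) + self.interval
        empty_at = candidate - self.burst
        if empty_at > now:
            return empty_at - now        # bucket is empty: throttle or reject
        self.filled_at = candidate
        return 0.0


# Usage with a fake clock (assumed numbers: 2 requests/s, burst of 3).
fake_now = [0.0]
gcra = GCRA(rps=2, max_tokens=3, clock=lambda: fake_now[0])
waits = [gcra.try_take() for _ in range(4)]  # three admitted, fourth must wait 0.5 s
fake_now[0] += waits[-1]                     # sleep for the suggested time
retry = gcra.try_take()                      # now admitted
```

Sleeping for the returned value instead of rejecting gives the throttle behaviour mentioned above, and the only state is one timestamp per key.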

sagyam•1h ago
Interesting, I am working on another list of more advanced rate limiting algorithms. I will add GCRA there.
deadfa11•27m ago
Ahh this has a name! I started doing this years ago and figured this must be used frequently because it’s so simple, elegant and can be done lock free. Thanks for calling it the name!
