Related tangent, "HPBN" (High-Performance Browser Networking) is a great book that includes related concepts.
[0](https://www.oreilly.com/library/view/designing-data-intensiv...)
I think the SSE coverage needs expanding for HTTP/3.
I've also found that AIMD works better than a circuit breaker in a lot of circumstances.
Go library implementing the above: https://github.com/webriots/rate
In the field of router buffer management, there are algorithms like Stochastic Fair Blue, which does the latter, but it's somewhat hard to apply to HTTP because you'd have to define a success/failure metric for each request (a latency target, for example), and clients would have to tolerate a small probability of a request being rejected even when they're not being rate limited.
In Google's paper on the Zanzibar authorization system, they briefly mention rate limiting clients based on a CPU time allocation, but don't go into any detail since it's not a paper on rate limiting.
It's something that matters less today with ubiquitous autoscaling, where the capacity of the system is whatever it needs to be to give each user what they ask for, up to their rate limit. Still, I'm surprised I can't find any detailed account of such a thing being attempted.
Furthermore, modern GPU workloads are far less elastic when it comes to capacity scaling.
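To illustrate, the idea can be framed as an ordinary token bucket whose tokens are CPU-seconds rather than requests. A minimal sketch of that framing, entirely my own construction and not anything from the Zanzibar paper:

```go
package cpulimit

import (
	"sync"
	"time"
)

// Bucket is a token bucket denominated in CPU-seconds instead of requests.
// Hypothetical sketch; none of this is from the Zanzibar paper.
type Bucket struct {
	mu       sync.Mutex
	tokens   float64   // remaining CPU-second budget
	capacity float64   // burst allowance, in CPU-seconds
	rate     float64   // allocated CPU-seconds per wall-clock second
	last     time.Time // last refill time
}

// Allow reports whether the client still has CPU budget.
func (b *Bucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens = min(b.capacity, b.tokens+b.rate*now.Sub(b.last).Seconds())
	b.last = now
	return b.tokens > 0
}

// Charge debits the CPU time a request actually consumed, measured after
// the fact from runtime or OS accounting.
func (b *Bucket) Charge(cpu time.Duration) {
	b.mu.Lock()
	b.tokens -= cpu.Seconds()
	b.mu.Unlock()
}
```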
https://aws.amazon.com/builders-library/workload-isolation-u...
Which isn't exactly what you're talking about, but between that and other things in the "Builders' Library" series, you can see that people are doing this, and writing about it.
Netflix has a blog post for their implementation: https://netflixtechblog.medium.com/performance-under-load-3e...
Pretty easy to pair AIMD with a token bucket (e.g. https://github.com/webriots/rate).
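Something like this sketch, built on golang.org/x/time/rate rather than that library (so none of the names below are webriots/rate's API): successes grow the bucket's refill rate additively, failures cut it multiplicatively.

```go
package aimd

import "golang.org/x/time/rate"

// Limiter wraps a token bucket whose refill rate is adjusted by AIMD.
type Limiter struct {
	bucket   *rate.Limiter
	min, max rate.Limit // bounds on the refill rate
	step     rate.Limit // additive increase per success
	factor   float64    // multiplicative decrease per failure, e.g. 0.5
}

func (l *Limiter) Allow() bool { return l.bucket.Allow() }

// OnSuccess nudges the refill rate up additively.
func (l *Limiter) OnSuccess() {
	l.bucket.SetLimit(min(l.bucket.Limit()+l.step, l.max))
}

// OnFailure cuts the refill rate multiplicatively.
func (l *Limiter) OnFailure() {
	l.bucket.SetLimit(max(rate.Limit(float64(l.bucket.Limit())*l.factor), l.min))
}
```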
Stochastic Fair Blue shares that problem, so I think you might find its solution interesting: it rotates hashes on fixed intervals by re-seeding, to ensure that if a responsive flow collides with a non-responsive flow that's being rate-limited, the collision at least can't last very long.
Re-seeding leads to a problem similar to the one with Fixed Window Counter rate limiting, where the rate limit history would basically be forgotten every interval. To solve it, two queues are fed input data at a time: the router enforces one of them while just feeding the other data and ignoring its output, in order to warm up its buckets.
I imagine if I tried to make use of the library you linked and didn't absolutely need per-user granularity, I'd do something similar by concatenating the user's identifier with a seed value for each of the two queues and rotating at regular intervals.
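Roughly what I have in mind, with the per-key limiter left abstract since I haven't checked the linked library's actual interface:

```go
package sfb

import (
	"fmt"
	"hash/fnv"
)

// keyedLimiter stands in for whatever per-key limiter you're using;
// it's a placeholder, not the linked library's interface.
type keyedLimiter interface {
	Allow(key uint64) bool
}

type rotator struct {
	active, warming         keyedLimiter
	activeSeed, warmingSeed string
}

func (r *rotator) key(user, seed string) uint64 {
	h := fnv.New64a()
	fmt.Fprint(h, user, seed) // user identifier concatenated with seed
	return h.Sum64()
}

// Allow enforces the active limiter while also feeding the warming one,
// so its counters are already populated when it takes over.
func (r *rotator) Allow(user string) bool {
	r.warming.Allow(r.key(user, r.warmingSeed)) // feed, ignore output
	return r.active.Allow(r.key(user, r.activeSeed))
}

// Rotate runs on a fixed interval: the warmed limiter becomes the
// enforced one, and a fresh limiter starts warming under a new seed.
func (r *rotator) Rotate(fresh keyedLimiter, freshSeed string) {
	r.active, r.activeSeed = r.warming, r.warmingSeed
	r.warming, r.warmingSeed = fresh, freshSeed
}
```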
For situations where eventual consistency is good enough, you can run a task in a loop that tries every n seconds to update a quantity. But as you say that can also saturate, so what you really want is for the task to update, then wait m seconds before going again, where m is more than the time you expect the task to complete in (<<50% duty cycle). As the cost of the operation climbs, the time lag increases, but the load on the system increases more slowly. If you want, collect telemetry on how often it completes and set up alarms that fire if it doesn't complete for a duration several times longer than your spikiest load periods.
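A sketch of that loop, where updateQuantity stands in for whatever refresh you're running:

```go
package refresh

import (
	"context"
	"log"
	"time"
)

// Loop waits m seconds *after* each completion rather than ticking on a
// fixed schedule, so a slowing task lowers the duty cycle instead of
// piling up. updateQuantity is a stand-in for your actual refresh.
func Loop(ctx context.Context, m time.Duration, updateQuantity func(context.Context) error) {
	for {
		start := time.Now()
		if err := updateQuantity(ctx); err != nil {
			log.Printf("update failed: %v", err)
		}
		// Telemetry hook: alarm if completions stop for a duration several
		// times longer than your spikiest load periods.
		log.Printf("update took %s", time.Since(start))
		select {
		case <-time.After(m): // choose m >> typical run time (<50% duty cycle)
		case <-ctx.Done():
			return
		}
	}
}
```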
I don't think voluntary rate limiting on the client side gets enough column inches. Peer to peer, you end up footgunning yourself if you bite off more than you can chew, and if you start stalling on responses then you gum up the server as well.
The site defaults to dark mode for me (I'm assuming based on my system preferences), but all the examples are unusable with dark mode. The examples all seem to be within fixed-width divs that cut off part of the content unless you scroll within the example.
- "Working" I wanted to keep things consistent.
- Content getting cut off was a limitation of the iframe. Most blogging platforms don't allow you to embed another page; this was the best I could do given that limitation.
- I do use AI to bounce ideas, but a lot of effort went into getting the apps working as intended.
Is it supposed to say, "How it works"?
> By following, you'll have instant access to our new posts in your feed.
> Continue with Google
> More options
As soon as I see this in a blog, I quit the tab. Why do authors do this to themselves?
- Medium paywalls the article and forces you to sign up just to read.
- Substack has the same problem: it's great for funneling people to your paid newsletter, but a sign-up banner appears as soon as the page loads.
- Build your own and you miss out on the social aspect, and there's no proof the numbers are real.
Great visualization tools. Next time I have to explain it, I'll reach for these.
Instead of storing the current number of tokens, you store when the bucket will be full. When you take a token from the bucket, you increment that timestamp by 1/rps. The only complication is that if the filled timestamp is in the past, you have to first update it to the current timestamp to avoid overfilling.
What's even nicer is that it doubles as a throttle implementation rather than just a rate limiter. You know the bucket is empty if empty_at = filled_at - (max_tokens/rps) is still in the future, and that calculation also tells you when it will have capacity again, so you can sleep accordingly. If you put a queue in front of the GCRA, it then starts slowing down new connections rather than just dropping them.
You should still have a limit on the queue, but it's nice in that it can gracefully turn from token bucket into leaky bucket.
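A sketch of the timestamp-only state described above; my own rendering, so treat the details as approximate:

```go
package gcra

import (
	"sync"
	"time"
)

// Limiter stores only the time at which the bucket would be full,
// rather than a token count.
type Limiter struct {
	mu        sync.Mutex
	filledAt  time.Time     // when the bucket will be full again
	perToken  time.Duration // 1/rps: time to regenerate one token
	maxTokens int64
}

// Take spends one token. It returns zero if the request may proceed now,
// or how long to wait, which makes it usable as a throttle as well.
func (l *Limiter) Take() time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if l.filledAt.Before(now) {
		l.filledAt = now // clamp: a past fill time would overfill the bucket
	}
	l.filledAt = l.filledAt.Add(l.perToken) // taking a token pushes the fill time out by 1/rps
	// empty_at = filled_at - max_tokens/rps; if it's still in the future,
	// the bucket was empty and capacity returns at empty_at.
	emptyAt := l.filledAt.Add(-time.Duration(l.maxTokens) * l.perToken)
	if wait := emptyAt.Sub(now); wait > 0 {
		return wait
	}
	return 0
}
```

A caller drops the request when the returned wait is nonzero (pure rate limiter) or sleeps for it (throttle); put a bounded queue in front and you get the graceful shift from token bucket to leaky bucket.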
Redis/valkey? Postgres? DynamoDB?
Sliding window might work well with large intervals. If you have something like a 24h window, fixed window will abruptly cut things off for hours.
I mostly work with 1-minute windows, so it's fixed all the way.
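For the large-interval case, the usual compromise is a sliding window counter: estimate the sliding count as the current fixed window's count plus the previous window's count weighted by its remaining overlap, so a 24h limit tapers off instead of resetting abruptly. A rough single-threaded sketch:

```go
package slidingwindow

import "time"

type Counter struct {
	window    time.Duration
	limit     float64
	prevCount float64 // requests in the previous fixed window
	currCount float64 // requests in the current fixed window
	currStart time.Time
}

// Allow estimates the sliding-window count as the current window's count
// plus the previous window's count weighted by its remaining overlap.
func (c *Counter) Allow(now time.Time) bool {
	start := now.Truncate(c.window)
	switch {
	case start.Equal(c.currStart):
		// still inside the current fixed window
	case start.Equal(c.currStart.Add(c.window)):
		c.prevCount, c.currCount = c.currCount, 0 // rolled into the next window
		c.currStart = start
	default:
		c.prevCount, c.currCount = 0, 0 // more than a full window elapsed
		c.currStart = start
	}
	overlap := 1 - float64(now.Sub(c.currStart))/float64(c.window)
	if c.prevCount*overlap+c.currCount >= c.limit {
		return false
	}
	c.currCount++
	return true
}
```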