Simpler Backoff

https://commaok.xyz/post/simple-backoff/

123•todsacerdoti•1d ago

Comments

charcircuit•1d ago

Switching away from an implementation that everyone is familiar with is the opposite of making something more readable.

larodi•1d ago

indeed, back in the day i'd love to write perl code such as

this() and that() or otherr() and fail();

so this all seems very reasonable, and clear, and people who read the code hated it. I've tried many times to do it in JS sadly it does not work:

a && b || do { console.log (err) }

but this all does not make it easier to read, but makes it only very likely to outsmart yourself.

some code is algorithmically more beautiful than other, more declarative perhaps. but going away from the common understanding of trivial stuff, does not always benefit the group or the project.

spauldo•1d ago

It's down to what's idiomatic for the language. I'd argue that the Perl example is idiomatic Perl (just as the Lisp equivalent would be idiomatic Lisp), but that's certainly not idiomatic JavaScript.

ayhanfuat•1d ago

Aren’t both implementations fairly common? Precomputed / hardcoded set of values vs adjusting it inside the loop?

anonzzzies•1d ago

But how many people need to read/write this code? There are tons of open source libraries, gateways, proxies etc etc.

unsnap_biceps•1d ago

While I like the simplicity of the lookup table, I find myself in larger projects to have different backoffs for different upstream services. I'll have shorter ones for services that can degrade without a large impact to service results and longer ones for services that are critical for results. I would prefer to have one algorithmically generated call that is flexible vs multiple individual functions that have different lookup tables, or generating tables and passing them in each time.

fmbb•1d ago

You could have two lookup tables.

VWWHFSfQ•1d ago

what if you have 100 lookup tables

barbazoo•22h ago

Careful, that’s awfully close to N. /s

mceachen•22h ago

If you don’t have lookup tables for your lookup tables you aren’t doing it right

avinassh•22h ago

you could use SQLite, store all of them

kstrauser•21h ago

MongoDB is web scale.

harpiaharpyja•22h ago

Generating tables is the way to go in that case. The flexibility you need, with the advantages of being data driven.

kiitos•17h ago

Backoff is a client-side behavior, so if you want to give different backoff behaviors to different classes of upstream services, you just define those backoff tables somewhere where the client can pick from them based on the target service it's trying to talk to.

OutOfHere•1d ago

Using a lookup table doesn't allow parameter values to easily be changed. Also, some of us do not get paid by the line. Fwiw, I do often note the values in a comment, and this is generally sufficient.

hsbauauvhabzb•1d ago

Change of the sake of change?

axeljohnsson•1d ago

I recently listened to Marc Brooker's talk [1] on retries. My key takeaway: You don't want to put more load on your system during stress. At AWS, they do "adaptive retries", which I found interesting.

[1] https://www.youtube.com/watch?v=rvHd4Y76-fs

zoover2020•1d ago

I second this, retry mechanisms can cause retry storms in vast enough, distributed systems. In Amazon code bases I found the same adaptive retry strategy after a Christmas where once we played whack-a-mole for a service to get back up as its clients kept retrying.

jpillora•1d ago

Simple backoff ftw https://github.com/jpillora/backoff

ricardobeat•1d ago

That’s the exact same logic, just written more verbosely and excluding the actual retry mechanism.

blensor•1d ago

Is this meant as a joke or a serious argument?

I personally find the original code just as if not more readable than the static lookup table and I don't need to count out the elements if I want to know how often it will retry.

But more importantly, changing the max retries is trivial in the original code and tedious in the static lookup table, especially for bigger changes.

Also, this is something you most likely want to make configurable to adapt to certain scenarios which is not possible with a static table

There are more reasons against it but those are the the main ones that jump at me right away

masklinn•1d ago

Not only that, but now that Go has iterators if you need this in several locations the logic of the original can easily be encapsulated in an iterator with a few relevant tunable knobs, and then you just write something like:

  for range ExponentialBackoff(ctx) {
      err := request(ctx)
      if err == nil {
          return nil
      }
  }

and if one of the callsites needs to configure the backoff then you've got something like:

  for range ExponentialBackoff(ctx, MaxAttempts(20), BaseDelay(5)) {
      err := request(ctx)
      if err == nil {
          return nil
      }
  }

efitz•1d ago

Why a lookup table? Why not just have the client remember the last backoff, and multiply it by a constant and apply a ceiling?

def getNewBackoff( oldBackoff: int): int {

newBackoff = oldBackoff * BACKOFF_CONSTANT

if (newBackoff > BACKOFF_CEILING) { newBackoff = BACKOFF_CEILING }

return newBackoff }

if you bit align the ceiling value you can replace the conditional with a bitwise and mask. If you use 2 as your exponent you can replace the multiplication with a bit shift left.

It’s also a good idea to introduce jitter, e.g. don’t set the new backoff to the old backoff * a constant, but add a random value such that the old value gets randomized around the desired new value- this helps prevent pile-ons when a bunch of clients noticed about the same time that the server became unresponsive; if they all have the same backoff algorithm then they’re all going to hit the server again at about the same time. It’s good to spread that out to try to avoid server overloading during restarts/cold starts/whatever.

The best results I ever got were when I used server side hinting to avoid pile-ons when the retries from the client were all clustered around the same time.

When my server got in an overloaded state, the auth path for clients short circuited and delivered an overloaded message with a randomized suggested retry value. The server can’t make the clients honor it (we could in my case because we wrote the clients) but if the clients do honor it you can spread out reconnects over a period instead of make them all point-in-time and dampen or prevent oscillating overloads.

mariusor•1d ago

Similar to your suggestion, I implemented a wrapper around retries in Go before and despite liking the lookup table version quite a lot, I think it's more important to be able to parametrize the initial duration of the back-off and also the rate at which it grows, so that's what I focused on.

I'm not sure how my example fares against on a simplicity scale, but it feels a little more readable to me:

    Retry(5, BackOff(time.Second, Linear(1.4), func(_ context.Context) { // do stuff } )

Example cribbed from the documentation: https://pkg.go.dev/git.sr.ht/~mariusor/ssm#StrategyFn

0x696C6961•1d ago

Backoff APIs often take a retry count as the input. Your suggestion would require changing the API which isn't always practical.

gdiamos•23h ago

explicit is better than implicit

efitz•18h ago

A server cannot make a client behave a certain way. Explicit doesn’t matter in this case, and as many of the comments said, most of the time the best thing to do for the server is to spread retry load randomly over an interval. Explicit is bad for the server.

The only thing (IMO) explicit retries are good for is to make client state more deterministic on the client, but there are many ways of doing that other than hard coding retry intervals. For example, use asynchronous patterns for networking APIs and allow querying state of API-calls-in-progress.

akshayshah•20h ago

Full agreement from me that jitter is essential. Here’s AWS on good retry policies: https://aws.amazon.com/builders-library/timeouts-retries-and...

ctxc•1d ago

If the author is here - please turn word wrap off for code. It's unreadable on mobile.

jitl•1d ago

Code wraps for me in Safari on iOS 18.5

ChrisMarshallNY•1d ago

I do this, in a couple of my Watch apps.

Might be worth looking at. The current implementation is slightly awkward, but it also uses a random delay.

Watch/iPhone communication isn't very robust. It's much better, in more recent Watches.

bravesoul2•1d ago

Another approach is to make this a cross-cutting concern using a sidecar like Envoy.

jelder•1d ago

This really just an argument for using iterator generators. Does Go offer those yet?

jitl•1d ago

Yes

kassner•1d ago

Offtopic, but huge respect for this[1]:

> Lots of people don’t want to read about AI.

> I respect that.

> But I’m currently steeped in the world of AI, for better or for worse, and I want to blog about it. So I’ve split this blog in half.

> The normal blog, which you are reading, is now AI-free.

> There’s another, at /ai that contains only AI posts. It has its own RSS feed.

Thank you. It (non-AI) definitely goes to my RSS reader.

1: https://commaok.xyz/post/blog-schism/

rixed•1d ago

Am I the only one who wonders how such trivial posts could end up on HN first page?

qoez•1d ago

It's so easy to make bot accounts I wouldn't be surprised if a decent amount of them are manipulated.

Retr0id•1d ago

Something doesn't have to be long or complex to be interesting.

handsclean•23h ago

Here’s a version that’s IMO much simpler still, about half as long as both (after fixing v2’s line count cheating), and fixes a bug: both article versions wait an extra 60 seconds between the last attempt and returning an error.

    func do(ctx context.Context) error {
        for attempt := 1; ; attempt++ {
            if request(ctx) == nil {
                return nil
            }

            if attempt == 10 {
                return fmt.Errorf("failed after %d attempts", attempt)
            }

            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(time.Second *
                min(60, math.Pow(2, attempt-1)*(0.75+rand.Float64()*0.5))):
            }
        }
    }

I think the problem with the original isn’t using a formula, it’s over-parameterization. Declaring constants right next to where they’re used once isn’t giving you anything but the opportunity for naming issues and rigidity in how you combine them. If they aren’t passed in or repeated, then I wouldn’t pull them out; if they are, then I’d try to look for minimal and natural boundary points, like just min and max, or maybe an attempt → delay function.

Waterluvian•23h ago

I sometimes agree, sometimes disagree. I don’t like magic numbers like retry(3, 1_000, 2_500)

(Though sometimes your text editor and docs are good enough to help, but I don’t like making those feel required).

I like how Python (and probably many other languages) allows for using the name of a positional arg. So you could call the exact same function but write it as retry(count=3, min_ms=1_000, max_ms=2_500) if you feel like it needs more clarity. Now you’ve got documented values that are tied to the function signature, their names have been decided for you, and they’re inline!

Mawr•18h ago

> Declaring constants right next to where they’re used once isn’t giving you anything but the opportunity for naming issues and rigidity in how you combine them. If they aren’t passed in or repeated, then I wouldn’t pull them out;

Using magic numbers all over the place is the mark of a novice who lacks understanding that the meaning behind the numbers he knows in his mind will not magically transfer over to other people reading the code. Advising this approach is an interesting call.

Specifically here, we've got these at the beginning of the function:

    const (
        maxAttempts = 10
        baseDelay   = 1 * time.Second
        maxDelay    = 60 * time.Second
    )

If I went out of my way to look at this function, these are probably the values I'm interested in changing, and look, they're right here, at the top. Neat. It's obvious how to adjust the values to get what I want. I can stop reading the function right there.

Without these declarations, how would I know that "10" midway through the function is the number of attempts? I'd have to read and understand the entire logic first. Great. Now multiply this effort by the amount of programmers on the team and the amount of magic numbers.

harpiaharpyja•22h ago

So, this is just about declarative programming style?

I thought based on the title it would have been about different approaches to backoff.

grg0•21h ago

Or a function backoff(attempt) that abstracts away the details and can be re-used in places, perhaps a higher-order function that takes as input the function that does whatever.

"This code, or something like it, probably looks really familiar:"

Better leave that company, to be honest.

yawaramin•20h ago

On a slightly different note, I am using Fibonacci backoff for a couple of things here and there and (subjectively) finding it a bit nicer than the standard 2x exponential backoff. I have it on a system that waits for a few days to raise an alert on missing data and I find the alerts on a cadence of 1, 2, 3, 5, 8... days are a bit tighter but still not overwhelming.

mukiblejlok•6h ago

For Python, I would recommend the stamina package. It provides all mentioned best practices about when and how to retry, in a form of easy to use function decorator. Plus, there is a great video about the problem by the author: https://youtu.be/BxikFuvaT1Y

The Perfect Commit (2022)

Show HN: LLM Loop Plugin

What No One Tells You About Early Retirement

Announcing PromptIDE (2023)

Largest Wildlife Crossing Is Almost Ready over Ten Lanes of Highway 101

0.999... (2001)

Elegant Configuration APIs and Command Line Flags

Jq Release 1.8.0

Gigabyte G383-R80-AAP1 AMD Instinct MI300A Server Review – ServeTheHome

Staring into the abyss as a core life skill (2022)

Cloudflare's Extortion – A Cautionary Tale

Trump pulls Isaacman nomination for space. Source: "NASA is f***ed"

Doom Will Teach You What CS Degrees Miss [video]

Advanced Orbital Constellations for Solar Storm Defense

Embedding Godot games in iOS apps is easy now

Structured 3D Latents for Scalable and Versatile 3D Generation

How I got a Root Shell on a Credit Card Terminal

'Russian bombers are burning en masse'

Creating a React and Flask Project in 2025

Imane Khelif, the IOC, World Boxing and mandatory sex testing

The other technology adoption curve

Computers of the Future

Why can't we put metal objects in a microwave?

Fast Character Classification with Z3

The Other Two

Boop Newsletter

Show HN: Tarzan – Turn any email into a live blog post (Go and Vue.js)

What Web Browsing Data Tells Us About How AI Appears Online

Easily Understand Rdma Technology

Living to Die Well