Stop Forwarding Errors, Start Designing Them

https://fast.github.io/blog/stop-forwarding-errors-start-designing-them/

100•andylokandy•1mo ago

Comments

bheadmaster•1mo ago

Many Rust programmers despise Go's "if err != nil" pattern, but that pattern actually forces you to think about errors and "design" them to give meaningful messages, either by wrapping them (if the underlying error is expected to provide useful information), or by creating a new one from scratch.

It may be easier to just add the "?" operator everywhere (and we are lazy and will mostly do what is easier), but it often leads to the problems explained in the article.

jayknight•1mo ago

>that pattern actually forces you to think about errors and "design" them to give meaningful messages

Doesn't Rust's Result type(s) force you to do the same? Sure, you can pass them on with the ? operator, but it's still a choice you have to make.

alembic_fumes•1mo ago

Hard disagree. Most of the Go code that I've ever worked with has been littered with one or another variant of the following:

  value, err := doFallibleOperation()
  if err != nil {
    return nil, fmt.Errorf("fallible operation failed - %w", err)
  }

That error construct exclusively works for the poor human who has to debug the system, looking at its logs. No call stacks and, crucially, no automatic handling.

At least with Rust's enums it is possible to make errors automatically actionable. If one skips that part and opts for anyhow because it's too much work, that's really a user problem.

I like the author's idea of "designing" errors by exposing their actionability in the interface a lot. I'm not overall sold on whether that should be the primary categorization, but at least including a docstring to each enum variant about what can be done about the matter sounds like a nice way to improve most code a little bit.
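
A minimal sketch of what per-variant docstrings plus an explicit "what can be done" mapping could look like. `FetchError` and `Action` are hypothetical names made up for illustration; they are not from the article or from any crate.

```rust
use std::time::Duration;

/// What a caller can do about a failure (hypothetical categories).
#[derive(Debug, PartialEq)]
pub enum Action {
    RetryAfter(Duration),
    FixInput,
    GiveUp,
}

/// Hypothetical error type; each variant documents its remedy.
#[derive(Debug)]
pub enum FetchError {
    /// The backend timed out. Safe to retry after a short delay.
    Timeout,
    /// The key does not exist. Retrying will not help; the caller
    /// has to create the object first or pick another key.
    NotFound { key: String },
    /// The stored bytes failed validation. Nothing the caller can do.
    Corrupted,
}

impl FetchError {
    /// Exhaustive match: adding a variant forces a decision here.
    pub fn action(&self) -> Action {
        match self {
            FetchError::Timeout => Action::RetryAfter(Duration::from_millis(200)),
            FetchError::NotFound { .. } => Action::FixInput,
            FetchError::Corrupted => Action::GiveUp,
        }
    }
}
```

The 200 ms delay is an arbitrary placeholder. The point is only that the compiler rejects a new variant until someone decides what a caller should do about it.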

Fizzadar•1mo ago

As a primarily Go dev - 100% agree. The endless check and wrap error results in long chains of messages you have to grep for to understand the call stack. For what benefit? Might as well just panic and recover/log the stack in many cases.

formerly_proven•1mo ago

Artisanal callstacks

morshu9001•1mo ago

The error handling is by far my least favorite aspect of Go. It's tedious and dangerous. It should either be like Rust or like JS, there isn't a good third option.

tcfhgj•1mo ago

what about checked exceptions (Java)?

morshu9001•1mo ago

Isn't JS the same? But seems like people tend to make a lot of exception types in Java with inheritance, which I think is overkill.

Typically I'll only have a couple of exception types that my own code throws, like user error vs system error. If I want more detail than that, it goes into the exception payload rather than defining many different types of exceptions.

bheadmaster•1mo ago

> If one skips that part and opts for anyhow because it's too much work, that's really a user problem.

If a language makes this more convenient than doing it right, one could argue that the language design is at fault.

Thaxll•1mo ago

In many codebases you have custom errors that implement the error interface (for HTTP codes and the like); it's very common.

akdor1154•1mo ago

I think that was the intent of Go's design, but in practice I think it normally devolves into an overly verbose '?' with a poorly typed Result<_, String>.

As a Go dev, I'm looking at this article with great interest. I would very much like to apply this approach to Go as well, I think the author has got a very strong design there.

tison•1mo ago

FWIW, here is a general discussion about error handling in Rust and my comment to compare it with Go's/Java's flavor: https://github.com/apache/datasketches-rust/issues/27#issuec...

That said, I can live with "if err != nil", but the fact that every type has a zero value is quite a headache to handle: you have to fight with nil, typed nil, and zero values.

For example, you need something like:

  type NullString struct {
   String string
   Valid  bool // Valid is true if String is not NULL
  }

.. to handle a nullable value, while `Valid = false && String = something` is by definition invalid but .. quite hard to explain. (Go has no sum types to express this.)

vaylian•1mo ago

I've been thinking about Rust errors as well. We see all these nice tutorials that explain how you can match on an Err and then handle it. But I haven't seen this being done in practice. Most errors are reported directly to the user. There don't seem to be any attempts to automatically handle them.

The cause for an error can be upstream or downstream. If a function fails, because the network is down, then this is a downstream error. The user has not done anything wrong (unless they also are responsible for the network infrastructure). In that case a retry after a few moments might be the right approach. However, if the user provides bad function arguments, then the user needs to be informed, that it's them who need to make corrections. However, it is not always clear if that is the case. If a user requests a non-existing file, then there might be different reasons why the file does not exist (yet).

rileymat2•1mo ago

I am a bit confused by the network example, even when I don't control the network at the moment I need to do something about it and know about it to act.

vaylian•1mo ago

The software needs to report back to the end user eventually. But if there is a temporary network failure, then the software should automatically retry the request without informing the user (assuming idempotency).
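
A rough sketch of that policy, assuming a hypothetical `RequestError` that already separates temporary (downstream) from permanent (upstream) failures, and an operation that is safe to repeat:

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum RequestError {
    /// Downstream problem (e.g. network down); retrying may help.
    Temporary(String),
    /// Upstream problem (e.g. bad arguments); the user has to fix it.
    Permanent(String),
}

/// Runs `op` at least once and at most `max_attempts` times,
/// retrying only temporary failures. Assumes `op` is idempotent.
pub fn with_retry<T>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, RequestError>,
) -> Result<T, RequestError> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Temporary failure with attempts left: wait and try again.
            Err(RequestError::Temporary(_)) if attempt < max_attempts => {
                attempt += 1;
                sleep(delay);
            }
            // Permanent failure, or out of attempts: report to the user.
            Err(err) => return Err(err),
        }
    }
}
```

Only the last error reaches the user; a permanent error is returned on the first attempt without any retry.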

fozem•1mo ago

Good overview on Rust error handling.

I like errors that are unique and trivially greppable in a codebase. They should be stack efficient and word sized. Maybe a new calling convention where a register is reserved for error code and another register is a pointer to the source location string that is stored in a data segment.

The FP fanboy side of me likes the idea of algebraic effects and ADTs but not at the expense of stack efficiency.
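
This is not a new calling convention, but something close can be approximated in today's Rust: a small copyable value holding a numeric code plus a pointer to static source-location data. `CodeError`, `E_NOT_FOUND` and `lookup` are hypothetical names for the sketch.

```rust
use std::panic::Location;

/// A small, copyable error: a numeric code plus a pointer to
/// static source-location data. No heap allocation involved.
#[derive(Debug, Clone, Copy)]
pub struct CodeError {
    pub code: u16,
    pub at: &'static Location<'static>,
}

impl CodeError {
    /// Records the file and line of the caller of `new`.
    #[track_caller]
    pub fn new(code: u16) -> Self {
        CodeError { code, at: Location::caller() }
    }
}

/// Hypothetical error code, in the spirit of errno values.
pub const E_NOT_FOUND: u16 = 2;

pub fn lookup(key: &str) -> Result<u32, CodeError> {
    if key == "answer" {
        Ok(42)
    } else {
        Err(CodeError::new(E_NOT_FOUND))
    }
}
```

The value is a code plus one pointer, so it fits in two machine words and can be matched with a plain `match err.code`. The file and line in `at` give the greppable part.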

EPWN3D•1mo ago

You basically want a modern errno. I don't mean that as a dig at you -- I've found POSIX error codes to still be the best way to design errors in C. If it can't be evaluated by switch, then it's too complicated.

Rygian•1mo ago

Sorry for the small digression. It's on topic.

Just a few minutes ago, while copying 63 GB worth of pics and videos from my phone to my laptop, KDE forwarded me the error "File <hard to retain name.jpg> could not be opened. Retry, Ignore, Ignore all, Cancel".

This was around file 7000 out of 15000. The file transfer stopped until I made a choice.

As a user, what am I supposed to do with such a popup?

It seems like a very good example of "Error Handling Without Purpose" as the article describes, but at user level.

Except that here, the audience is "a plain user who just dragged a folder to make a copy" and none of the four options (or even the act of stopping the file transfer until an answer is chosen) is actually meaningful for the user.

The "Putting It Together" for this scenario should look like: a non-modal section populates with "file <hard to retain name.jpg> failed due to reason; at the end of the file transfer you'll get a list with all the files that failed, and you'll have an option to retry them, navigate to their source position to double-check, and/or ignore".
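
The core of that behaviour is small. Here is a sketch with hypothetical names (`CopyReport`, `copy_all`); the per-file copy is passed in as a closure so the policy can be exercised without touching a real filesystem, and in real use it might wrap `std::fs::copy`.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Outcome of a bulk copy: what succeeded and what failed, with reasons.
#[derive(Debug, Default)]
pub struct CopyReport {
    pub copied: Vec<PathBuf>,
    pub failed: Vec<(PathBuf, io::Error)>,
}

/// Copies every file it can and records failures instead of stopping.
pub fn copy_all(
    files: &[PathBuf],
    mut copy_one: impl FnMut(&Path) -> io::Result<()>,
) -> CopyReport {
    let mut report = CopyReport::default();
    for file in files {
        match copy_one(file.as_path()) {
            Ok(()) => report.copied.push(file.clone()),
            // Keep the reason next to the path so the final list is useful.
            Err(err) => report.failed.push((file.clone(), err)),
        }
    }
    report
}
```

The UI can then render `failed` as the end-of-transfer list, and "retry" is just another `copy_all` call over those paths.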

XorNot•1mo ago

This design still doesn't work: what if the user walks away and the computer is powered off in the meantime?

I.e. you need to write the report of this to a file itself. In fact you should allocate a decently large file upfront to make sure you can write the report and the error message (out of disk space for example).

throw-the-towel•1mo ago

And what if the computer is kidnapped by the US Army while it's copying the files?

You just can't defend against everything, but an imperfect solution can still be an improvement over the status quo.

XorNot•1mo ago

No, but imagine doing all the work to collect up a list of files that failed only to say, pop a modal at the end of the process that coincides with the user hitting Enter because they were multitasking and it auto-accepts the dialog. Information gone, context lost, in fact your entire design has failed to change the experience at all! All because of one UI overlap that's actually very common.

We have shared workstations for example where this would be a typical use case for non-technical users across multiple user logins: ensuring you can check that the big data transfer was complete a few hours later would be very useful, but if you only do a fraction of the work for completeness then again, it's of no benefit.

marcosdumay•1mo ago

Yes. The entire reason DEs expect people to dismiss those dialogs is because they are modal. And there's no reason at all for them to be modal.

KDE even got an entire notifications application, and discovered that it's bad to make them modal. But didn't move away from the idea of dismissing them on any interaction, it still acts like it's a modal.

vineyardmike•1mo ago

> kidnapped by the US Army.... You just can't defend against everything

Of course not.

The litmus test IMO should be "what would a normal intelligent human do in this situation?"

A human would copy every file it could, maintaining a list of issues. When you were available to address concerns, it'd present the options to you. The human would give up if the US Army showed up, but a human would restart a TCP connection automatically without asking for permission again (or more analogously, redial a phone call). A human would save their work automatically, and when you showed back up, would find that work for you.

(In 2026, things like "retry" should be automatic outside some very specific limitations too, because of course a human would try again if they failed).

yetihehe•1mo ago

> what would a normal intelligent human do in this situation?

Problem is that this requires testing what actual "normal intelligent human" would do, because very often programmer has other ideas and UI/UX people have other ideas.

> A human would copy every file it could, maintaining a list of issues.

How do you know? From your idea what should be done instead of current version? I would not do it like you said.

Also, there are many reasons for a transfer not succeeding, and depending on the reason you should make different decisions. Sometimes reasons are not predictable by a program (a new file transfer method over pigeons was transparently added to the system and "carrier attacked by predator" was not included in "how to handle this reason").

1718627440•1mo ago

> A human would copy every file it could, maintaining a list of issues.

Please no. I want my computer to be a dumb tool that really only does what I told it to. I do not want it to have its own agenda.

> In 2026, things like "retry" should be automatic outside some very specific limitations too

No. I can tell the computer to retry; when I didn't, it is because I didn't want it to.

Rygian•1mo ago

It goes quite far, actually.

A file transfer should remain active even if both devices (source, destination) are physically disconnected, or in network partitions, or when devices are full, need media change, etc.

The only valid states for a file transfer are: ongoing, fully completed with 100% success, or explicitly cancelled by the user with a full usable report of what got copied, fully or partially, and what did not get copied.

The file transfer dialogs and tooling of today's mainstream computing are stuck in the nineties.

yetihehe•1mo ago

Then you will have another control panel or log of ongoing file transfers, which will accumulate waiting transfers over the years a device was used.

grumbel•1mo ago

> As a user, what am I supposed to do with such a popup?

Change the floppy disk. In the MSDOS days those messages were useful, as read errors might be caused by having the wrong floppy in the drive. The OS had no way to know when the floppy was changed and "Retry" allowed you to swap the disks back and try again. In modern days it is less useful, the behavior just got carried over.

Windows addresses this issue somewhat by scanning the directory tree before the actual copying starts, this can catch some errors before they happen and gives you better progress reporting on top.

But a single dialog that keeps track of the whole copy/move operation, not a modal dialog attached to individual read/write calls, would be the way to go here. This is a case of the GUI sticking too close to what the OS is doing instead of what the user intended to do.

1718627440•1mo ago

> Windows addresses this issue somewhat by scanning the directory tree before the actual copying starts

Which really sucks because now you need to wait for minutes before it actually starts moving or deleting. I generally just abort, start the midnight commander or just invoke mv/del directly.

> But a single dialog that keeps track of the whole copy/move operations

Which is what is the case here? The question and buttons appear in that dialog.

grumbel•1mo ago

> The question and buttons appear in that dialog.

The error/retry dialog is for the failure of moving an individual file, not for a failure of the move operation as a whole. Those individual error dialogs provide no means to deal with cascading errors. All you can do is "Skip All", but that means you get no further information on errors anymore.

The error reporting should be part of the Moving dialog itself and provide a list of everything that failed in the move, along with potential ways to resolve it. More detailed reporting than "Could not read" would also be welcome (io, permission, ...).

Sytten•1mo ago

Exn looks very interesting, but to be actionable we need a compatibility layer with thiserror and anyhow, since most people are using them right now. Moving the goalposts a little, we mostly need a core Rust solution; otherwise your error handling stops at the first library you use that doesn't use exn.

tison•1mo ago

I think they are almost compatible.

`thiserror` helps you define the error type. That error type can then be used with `anyhow` or `exn`. Actually, we used thiserror + exn for a long time, and it worked well. Later we realized that `struct ModuleError(String)` can easily implement Error without thiserror, so we removed the thiserror dependency for conciseness.

`exn` can use `anyhow::Error` as its inner Error. However, one may use `Exn::as_error` to retrieve the outermost error layer to populate anyhow.

I once considered implementing `std::error::Error` for `exn::Exn`, but it would lose some information, especially if the error has multiple children.

`error-stack` did that at the cost of no more source:

* https://docs.rs/error-stack/0.6.0/src/error_stack/report.rs....

* https://docs.rs/error-stack/0.6.0/src/error_stack/error.rs.h...

dvogel•1mo ago

> But as a standard library abstraction, it’s too opinionated. It categorically excludes cases where sources form a tree: a validation error with multiple field failures, a timeout with partial results. These scenarios exist, and the standard trait offers no way to represent them.

This seems akin to complaining that the CPU core has only one instruction pointer. There is nothing preventing a struct implementing `Error` from aggregating other errors (such as validation results) and still exposing them via the `Error` trait. The fact of the matter is that the call stack is linear, so the interior node in the tree the author wants still needs to provide the aggregate error reporting that reflects the call stack that was lost with the various returns. Nothing about that error type implementing `Error` prevents it from also implementing another error reporting trait that reflects the aggregate errors in all of the underlying richness with which they were collected.
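
To make that concrete, here is a sketch of an aggregate error that implements the standard trait and additionally exposes all of its children. `FieldError`, `ValidationError` and `children` are hypothetical names; only `std::error::Error` and `std::fmt` are real APIs here.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub struct FieldError {
    pub field: &'static str,
    pub message: String,
}

impl fmt::Display for FieldError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.field, self.message)
    }
}

impl Error for FieldError {}

/// Aggregates several field failures behind one `Error` value.
#[derive(Debug)]
pub struct ValidationError {
    pub failures: Vec<FieldError>,
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "validation failed for {} field(s)", self.failures.len())
    }
}

impl Error for ValidationError {
    /// The standard trait exposes a single source; report the first one.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.failures.first().map(|e| e as &(dyn Error + 'static))
    }
}

impl ValidationError {
    /// The richer view: every collected failure, not just the first.
    pub fn children(&self) -> &[FieldError] {
        &self.failures
    }
}
```

Generic reporters that only know `Error` still get a usable chain, while callers that know the concrete type can walk `children()`.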

oncallthrow•1mo ago

This is interestingly somewhere where Go really shines, in my experience. Go has no requirement to wrap (or, indeed, even handle at all) errors; yet, despite this, Go codebases I've worked in almost always perform error handling properly (wrapping at each layer of the call stack, so it's easy to identify where an error occurred).

morshu9001•1mo ago

I'd rather have exceptions so this is done for you. Not really an option in Rust due to overhead ofc.

jiehong•1mo ago

For the flat structure part, it’s much less shiny, though.

Weirdly, the last time I saw an error in production I couldn’t investigate was because of a go service with no error wrapping… funny coincidence

hu3•1mo ago

It's about incentives. Go makes it explicit.

And because it's standardised, it's easy to create tooling to flag mishandled errors.

spion•1mo ago

I don't think there is anything in Go (the language) that helps achieve this - it's mostly cultural. (Go creators and community being very outspoken about handling errors).

In fact, the easiest thing to do in Go is to ignore the error; the next easiest is to early-return the same error with no additional context.

Technically speaking, Rust has way better tools for adding context to errors. See for example https://docs.rs/color-eyre/latest/color_eyre/

It does expect you to use `wrap_err` to get the benefits, though. Which is easier to do than what Go requires you to do for good contextual errors, and even easier if you want reasonable-looking formatting from the Go version.

spion•1mo ago

IMO you need both things: culture to make it happen, and technology to make it easy and reasonable looking. Rust lacks the former to some degree; Go lacks the latter to some degree (see e.g. kustomize error formatting - everything ends up on a single line)

Thaxll•1mo ago

Looks very similar to what Upspin ( Go ) errors look like:

https://github.com/upspin/upspin/blob/master/errors/errors.g...

    type Error struct {
        // Path is the Upspin path name of the item being accessed.
        Path upspin.PathName
        // User is the Upspin name of the user attempting the operation.
        User upspin.UserName
        // Op is the operation being performed, usually the name of the method
        // being invoked (Get, Put, etc.). It should not contain an at sign @.
        Op Op
        // Kind is the class of error, such as permission failure,
        // or "Other" if its class is unknown or irrelevant.
        Kind Kind
        // The underlying error that triggered this one, if any.
        Err error

        // Stack information; used only when the 'debug' build tag is set.
        stack
    }

croemer•1mo ago

Be warned: LLM writing. Lots of negative parallelisms.

amelius•1mo ago

Speaking of which, why aren't the LLMs solving these low level plumbing problems for us yet?

croemer•1mo ago

Because LLMs mostly follow historical practice. And examples for bad error handling are more common (and easier) than good error handling.

amelius•1mo ago

I'm pretty sure an LLM will be able to handle an instruction such as:

"Wherever exceptions are thrown, add as much contextual information to the exceptions as possible. Use class RichException<Exception> to store the extra information". Etc. etc.

croemer•1mo ago

Sure, but writing and maintaining such instructions is also work. And not something one thinks about usually until the debugging session with insufficient errors.

Lvl999Noob•1mo ago

Yeah. Certainly felt like that. On the other hand, the content does seem good. It definitely wasn't slop, even if I can't judge how useful it really was (in terms of giving a solution).

alienbaby•1mo ago

What is it you are actually warning me of?

croemer•1mo ago

That it is mostly LLM words which some of us here don't really like to read as it can be low entropy in language, structure, ideas.

tison•1mo ago

This is the pull request of this post: https://github.com/fast/fast.github.io/pull/12

See comments like https://github.com/fast/fast.github.io/pull/12#discussion_r2...

Quote my comment in the other thread:

> That said, exn benefits something from anyhow: https://github.com/fast/exn/pull/18, and we feed back our practices to error-stack where we come from: https://github.com/hashintel/hash/issues/667#issuecomment-33...

> While I have my opinions on existing crates, I believe we can share experiences and finally converge on a common good solution, no matter who made it.

nchagnet•1mo ago

I really like the pattern presented in the article. I find myself guilty of designing errors which are useful to me, but maybe not to my user (which tbh in my area is always a bit of a nebulous entity). I really like the idea of separating those two intents, and to make explicit the possible action.

jiehong•1mo ago

I suppose Java exceptions have the same issues, albeit with automatic stack traces, obviously:

- the ? operator is replaced either by runtime exceptions, which every function propagates transparently if you don't catch them, or by simply declaring the raised exception in the signature

- message can be overloaded for humans

- the exception type itself is the structured data, but in practice it seldom contains structured data and most logic depends on the exception type.

Make of this what you will, but I didn’t say it’s great.

imtringued•1mo ago

Java has nested exceptions, which significantly reduces the problem, since there is going to be at least one relevant exception that will help you figure it out. In the worst case you can just paste the stack trace into your GitHub issue and call it a day.

With Rust, having a generic error bubble up without nesting means you don't even know where it went wrong. The error could be from any generic error source.

bccdee•1mo ago

I'm not sure I like how they're trying to dynamically cast to an error type.

  Err(report) => {
      // For machines: find and handle the structured error
      if let Some(err) = find_error::<StorageError>(&report) {
          if err.status == ErrorStatus::Temporary {
              return queue_for_retry(report);
          }
          return Err(map_to_http_status(err.kind));
      }

They get it right elsewhere when they describe errors for machines as being "flat and actionable." `StorageError` is that, but the outer `Err(report)` is not. You shouldn't be guessing which types of error you might run into; you should be exhaustively enumerating them.

I'd rather have something like this:

  struct Exn<T> {
      trace: Trace,
      err: T,
  }
  
  impl<T> Exn<T> {
      #[track_caller]
      fn wrap<U: From<T>>(self, msg: String) -> Exn<U> {
          Exn {
              trace: self.trace.add_context(Location::caller(), msg),
              err: self.err.into(),
          }
      }
  }

That way your `err` field is always a structured error, but you still get a context trace. With a bit more tweaking, you can make the trace tree-shaped rather than linear, too, if you want.

I think actionable error types need to be exhaustively matchable, at least for any Rust error that you expect a machine to be handling. Details a human is interested in can be preserved at each layer by the trace, while details the machine cares about will be pruned and reinterpreted at every layer, so the machine-readable info is kept flat, relevant, and matchable.
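
For anyone who wants to play with it, a compilable version of the sketch above might look like this. `Trace`, `DiskError` and this `StorageError` are stand-ins I made up to fill the gaps, and this `Exn` is my hypothetical type, not the `exn` crate's API.

```rust
use std::panic::Location;

/// Human-facing context: one frame per layer the error passed through.
#[derive(Debug, Default)]
pub struct Trace {
    frames: Vec<(&'static Location<'static>, String)>,
}

impl Trace {
    fn add_context(mut self, at: &'static Location<'static>, msg: String) -> Self {
        self.frames.push((at, msg));
        self
    }

    pub fn messages(&self) -> Vec<&str> {
        self.frames.iter().map(|(_, m)| m.as_str()).collect()
    }
}

pub struct Exn<T> {
    pub trace: Trace,
    pub err: T,
}

impl<T> Exn<T> {
    pub fn new(err: T) -> Self {
        Exn { trace: Trace::default(), err }
    }

    /// Converts the machine-facing error and appends a human-facing frame.
    #[track_caller]
    pub fn wrap<U: From<T>>(self, msg: &str) -> Exn<U> {
        Exn {
            trace: self.trace.add_context(Location::caller(), msg.to_string()),
            err: self.err.into(),
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum DiskError {
    Full,
}

#[derive(Debug, PartialEq)]
pub enum StorageError {
    Temporary,
    Permanent,
}

/// Each layer decides how the lower error maps onto its own flat type.
impl From<DiskError> for StorageError {
    fn from(e: DiskError) -> Self {
        match e {
            DiskError::Full => StorageError::Permanent,
        }
    }
}
```

After `Exn::new(DiskError::Full).wrap::<StorageError>("writing snapshot failed")`, the `err` field can be matched exhaustively as a `StorageError`, and the message lives only in the trace.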

andylokandy•1mo ago

`Exn<T>` preserves the outermost error type and `Exn::<T>::as_error()` will give you the error just the way you want.

Traversing through the error tree is the worst case, where the structured error has bubbled up through layers until it reaches one that is able to recover from it.

bccdee•1mo ago

That worst case shouldn't happen. When you do something like this:

  let data = serialize(&doc)
      .or_raise(|| StorageError::permanent("serialization failed"))?;

you're burying the error thrown by `serialize(&doc)` in such a way that you have to dig for it dynamically to recover it.

The closure in `or_raise` should take the error from `serialize(&doc)` as an argument and save the actionable details. It makes sense to have an "outermost error" when you're talking about a context trace that provides information to humans, but an error which a machine responds to should be flat & statically matchable.

Something like this:

  let data = serialize(&doc)
      .or_raise(|e| Exn::with_message(
          "serialization failed",
          StorageError::fromInner(e)))?;

where `StorageError::fromInner` decides whether the error should be permanent or not based on the contents of `e`, and saves any details that would be relevant for automatically recovering from the error.
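
The same idea can be sketched with nothing but the standard library, using `Result::map_err` in place of `or_raise`. The `StorageError` and `Status` types below are hypothetical stand-ins, and the choice of which `io::ErrorKind` values count as temporary is only an assumption for the example.

```rust
use std::io;

#[derive(Debug, PartialEq)]
pub enum Status {
    Temporary,
    Permanent,
}

/// Flat, machine-facing error: only what a caller can act on.
#[derive(Debug, PartialEq)]
pub struct StorageError {
    pub status: Status,
    pub kind: io::ErrorKind,
}

impl StorageError {
    /// Inspects the inner error once and keeps the actionable details.
    pub fn from_inner(e: &io::Error) -> Self {
        let status = match e.kind() {
            io::ErrorKind::TimedOut
            | io::ErrorKind::Interrupted
            | io::ErrorKind::WouldBlock => Status::Temporary,
            _ => Status::Permanent,
        };
        StorageError { status, kind: e.kind() }
    }
}

/// `write` stands in for whatever fallible I/O the layer performs.
pub fn save(write: impl FnOnce() -> io::Result<()>) -> Result<(), StorageError> {
    write().map_err(|e| StorageError::from_inner(&e))
}
```

The caller of `save` never needs to dig for the `io::Error`: the retry decision was made at the layer that still had it in hand.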

larusso•1mo ago

Error handling in Rust is my number one frustration. I have rewritten my errors multiple times. I used error_chain, which looked good on paper but was just as broken as thiserror and anyhow. The missing piece is that no one really defines how to write good and meaningful error types for the different audiences. Even the article describes some cases that are highly implementation-specific. I will take a look at the other crate the author showed, though. The thiserror crate makes it too easy to just forward errors with the #[from] / #[source] attributes. I played around with a helper crate that tries to add a context method to each generated error type, but this is optional as well and also adds tons of overhead.

spion•1mo ago

Great article. Really advances the thinking on error handling. Rust already has a head start compared to most other languages with Result, expect and anyhow (well, color_eyre and tracing), but there was indeed a missing piece tying together error handling "actionability" with "better than stack trace" context for the programmer.

With regards to context for the programmer, I still think ultimately tracing and color_eyre (see https://docs.rs/color-eyre/latest/color_eyre/) form a good-enough pair for service style applications, with tracing providing the missing additional context. But it's nice to see a simpler approach to actionability.

atrooo•1mo ago

As good as the argument is, and the crate may be, I feel like I’ve been lied to when I realize I’m reading an AI generated blog post as is obvious by the end of this one.

yxhuvud•1mo ago

Unreadable due to lag when scrolling. How do you even manage that? Stutters happen on other pages but this was just a delay that was extremely annoying.

8note•1mo ago

on those notes, i don't think those errors are categorized by caller action, those are internal state.

NotFound should instead have an instruction "create this object first using that SOP" or "stop the transaction from going through"

Ratelimited has an instruction "try again in x ms" or "raise your rate limit following this SOP"

PermissionDenied has an instruction "request permissions here" or "complete this oauth"
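
a rough sketch of variants that carry the data the instruction needs, with a hypothetical `ApiError` type (the field names are made up for illustration):

```rust
use std::time::Duration;

/// Each variant carries what the caller needs to follow the instruction.
#[derive(Debug)]
pub enum ApiError {
    NotFound { resource: String },
    RateLimited { retry_after: Duration },
    PermissionDenied { request_url: String },
}

impl ApiError {
    /// Renders the caller-facing instruction for this error.
    pub fn instruction(&self) -> String {
        match self {
            ApiError::NotFound { resource } => {
                format!("create {resource} first")
            }
            ApiError::RateLimited { retry_after } => {
                format!("try again in {} ms", retry_after.as_millis())
            }
            ApiError::PermissionDenied { request_url } => {
                format!("request permissions at {request_url}")
            }
        }
    }
}
```

a machine can read `retry_after` directly and a human gets the sentence, both from the same value.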

as far as the flat error definition i think that rather than simple, it's easy. it's simpler to have each module define its own errors and have dedicated translation code to the library's errors, rather than putting the translation and equivalencies between different modules' errors within the library in the programmer's head and code comments on the big error definition file.
