Too Much Go Misdirection

https://flak.tedunangst.com/post/too-much-go-misdirection

186•todsacerdoti•2mo ago

Comments

jchw•2mo ago

The biggest issue here IMO is the interaction between two things:

- "Upcasting" either to a concrete type or to an interface that implements a specific additional function; e.g. in this case Bytes() would probably be useful

- Wrapper types, like bufio.Reader, that wrap an underlying type.

In isolation, either practice works great and I think they're nice ideas. However, over and over, they're proving to work together poorly. A wrapper type can't easily forward the type it is wrapping for the sake of accessing upcasts, and even if it did, depending on the type of wrapper it might be bad to expose the underlying type, so it has to be done carefully.

So instead this winds up needing to be handled basically for each type hierarchy that needs it, leading to awkward constructions like the Unwrap function for error types (which is very effective but weirder than it sounds, especially because there are two Unwraps) and the ResponseController for ResponseWriter wrappers.

Seems like the language or standard library needs a way to express this situation so that a wrapper can choose to be opaque or transparent and there can be an idiomatic way of exposing this.

movpasd•2mo ago

I'm not sure I fully understand the issue as I don't know Go, but is this something that a language-level delegation feature could help with?

hello_computer•2mo ago

It is the same struggle you can find in any language with private/public props. The stream he wants to read from is actually just a buffer that has been wrapped as a stream, and he’s having a hard time directly accessing the buffer through its wrapper. He could stream it into a new temporary buffer, but he’s trying to avoid that since it’s wasteful. I’ve had the same problem in C++.

XorNot•2mo ago

But the other side of this is that there's a contract violation going on: []byte can be mutated, but io.Reader cannot.

When I pass io.Reader I don't expect anything underneath it to be changed except the position. When I pass []byte it might be mutated.

So really solving the requires a whole new type - []constbyte or something (in general Go really needs stronger immutability guarantees - I've taken to putting Copy methods on all my struts just so I can get guaranteed independent copies, but I have to do it manually).

hello_computer•2mo ago

there is also the specific vs general trade-off: the general (io.Reader) being more flexible, while providing fewer opportunities for optimization. vice versa with the specific—be it []byte, or even []constbyte. i think it is just an inherent struggle with all abstractions.

jchw•1mo ago

> But the other side of this is that there's a contract violation going on: []byte can be mutated, but io.Reader cannot.

> When I pass io.Reader I don't expect anything underneath it to be changed except the position. When I pass []byte it might be mutated.

Where is it written in the contract that the underlying data source of io.Reader is immutable? I do not believe that is true. Even when io.Reader is backed by os.File, the OS may allow the file to be modified while you are reading it, possibly by the same process, using the io.Writer interface.

So while it's true that you certainly wouldn't expect an io.Reader itself to be mutable, an object that implements io.Reader certainly can expose interfaces that allow the underlying data source to be mutated. After all, bytes.Buffer is explicitly one such io.Reader; it has a Bytes() method:

> "The slice aliases the buffer content at least until the next buffer modification, so immediate changes to the slice will affect the result of future reads."

It's documented and totally allowed. Doesn't mean you should do it, but if an io.Reader that's backed by a buffer wants to expose its underlying buffer it isn't an issue of breaking contracts at the very least.

Another alternative to exposing an interface that returns []byte is getting clever with interfaces. Go sometimes will try to turn an io.Reader into an io.WriterTo: this can also avoid unnecessary copies.

> So really solving the requires a whole new type - []constbyte or something (in general Go really needs stronger immutability guarantees - I've taken to putting Copy methods on all my struts just so I can get guaranteed independent copies, but I have to do it manually).

A concept of constness would be nice, but it's trickier than it seems. An immutable type like string is fairly straight-forward. Immutable constant values of primitive types, also straightforward. Of course, Go has both. Constness as in const references to mutable values is weirder, because "const" sounds like it means something is immutable, since it does mean that in many contexts, but actually what it really means is that you can't mutate it, it still might get mutated behind your back, depending on if the actual value is const or not. I think that's at least a little unfortunate.

What I think I want is multiple concepts:

1. Variables that can't be re-assigned by default, akin to JavaScript `const` declarations.

2. Immutable const values of composite types, expanding the const primitives Go has today.

3. References that are immutable by default, mirroring Rust. Ideally you could take an immutable reference of constants that are typed, including composite const values, pushing them into rodata like you'd expect. To me this is better than C/C++'s concept of constness because it feels more intuitive when mutability is the exception; Having references marked const explicitly makes it look too similar to defining constants, making it easy to confuse immutability of the values to simply having an immutable reference to a mutable value. Having a `mut` case instead makes it clearer.

Can this be done in a theoretical Go-like programming language? Maybe... Will any of it ever be done? Probably not. The most feasible is probably 2., but even that would be pretty awkward due to the lack of immutable references; you'd have to always take a copy to actually use the value.

XorNot•1mo ago

If the underlying data source changes then that is something happening on my side of the contract.

But io.Reader explicitly deals with populating a caller supplied buffer with whatever the next valid set of bytes is.

The caller thus knows the buffer won't change while the handle it.

jchw•1mo ago

Not 100% sure what is meant here. io.Reader itself only exposes the ability to do read-only access, but if you upcast you are not limited to only io.Reader's functionality. Now of course if you had a function that upcasted e.g. io.Reader to io.Writer and wrote to the provided stream, that would be weird. On the other hand, there is no such issue with merely grabbing an interface { Bytes() []byte } out of an io.Reader, calling the Bytes() method, and then not modifying the returned buffer. I don't think that introduces any hazards or contract violations and as best as I can tell is idiomatic Go and not discouraged.

msteffen•2mo ago

> The bytes.Reader should really implement Peek. I’m pretty sure the reason it doesn’t is because this is the only way of creating read only views of slices. And a naughty user could peek at the bytes and then modify them. Sigh. People hate const poisoning, but I hate this more.

When I was a Google, a team adjacent to ours was onboarding a new client with performance demands that they could not realistically meet with anything resembling their current hardware footprint. Their service was a stateless Java service, so they elected to rewrite in C++. Now, Java has some overhead because of garbage collection and the JVM, and they hoped that this might move the needle, but what happened was they went from 300qps/core to 1200, with lower tail latency. Literally 3x improvement.

Why? Probably a lot of reasons, but the general consensus was: Java has no const, so many of Google’s internal libraries make defensive copies in many places, to guarantee immutability (which is valuable in a highly concurrent service, which everything there is). This generates a huge amount of garbage that, in theory, is short-lived, rarely escapes its GC generation, and can all be cleaned up after the request is finished. But their experience was that it’s just much faster to not copy and delete things all over the place. Which you can often avoid by using const effectively. I came to believe that this was Java’s biggest performance bottleneck, and when I saw that Go had GC with no const, I figured it would have the exact same problem

hinkley•2mo ago

Java has a little const. Strings are immutable. You can make objects with no mutations, so you can make read only collections fairly easily which is usually where const becomes a problem.

But then you have for instance Elixir, where all functions are pure, so mutating inputs to outputs takes a ton of copying, and any data structure that is not a DAG is a gigantic pain in the ass to modify. I lost count of how many tries it took me to implement Norvig’s sudoku solver. I kept having to go back and redesign my data structures every time I added more of the logic.

[edit to add]: DTOs exist in Java because some jackass used the term “Value Object” to include mutation despite large swaths of the CS world considering VOs to be intrinsically const. So then they had to make up a new term that meant Value Object without using the concept they already broke.

throwawaymaths•2mo ago

There are so many purity escape hatches in elixir!!

hinkley•2mo ago

The only ones I know about are the ones in functions and closures, where SSA basically just creates a new variable that hides the original (until the end of the block at which point you discover the reassignment didn't stick). What did you have in mind?

throwawaymaths•2mo ago

ets tables are the goto for data structures (for example the :digraph module that ships with elixir is built on an ets table, presumably because A* needs a mutable datatype)

Process dictionary is also an option.

Can always use a genserver as a data store (but only do that if lifetime shenanigans make it make sense)

Postgres is also an option. Sqlite if you don't want to stand up a service.

hinkley•1mo ago

Ah yes. I’m using a genserver to persist a countdown clock across page loads.

ETS gives me a memory leak vibe but I need to bite the bullet and learn to use it properly.

throwawaymaths•1mo ago

Think of ets as an arena. You can clear the whole table by yanking its owning process. This could even be made automatic by linking it to whatever process needs the ets table.

hinkley•1mo ago

Hmmm. Maybe I’m misremembering some Erlang release notes then.

hnlmorg•2mo ago

Are you able to explain the problem a little more because “const” does exist as a keyword, so I assume it’s doing something different to what you’re referring to with regards to C++ constants. Is Go not substituting constants like a macro? Or are we discussing something entirely different and I’m misunderstanding the context here?

kentm•2mo ago

I’m assuming they mean const function/method parameters. Being able to mark inputs to functions as const to guarantee that they aren’t mutated in C++ which often means you can just pass in the value by reference safely.

jerf•2mo ago

Java, Go, and C++ all have enough differences here that at this level of detail you shouldn't assume any other conversion will have exactly the same result that msteffen lays out. Java generally has more sophisticated compilation and a lot more engineering effort poured into it, but Go often ends up with less indirection in the first place and started life with what Java calls records [1] so they are more pervasive throughout the ecosystem. Which effect "wins" can be difficult to guess in advance without either a deep analysis of the code, or just trying it.

What msteffen talks about is a general principle that you can expect even small differences between languages to sometimes have significant impact on code.

I think this is also one of the reasons Rust libraries tend to come out so fast. They're very good at not copying things, but doing it safely without having to make "just in case" copies. It's hard to ever see a benchmark in which this particular effect makes Rust come out faster than any other language, because in the natural process of optimizing any particular benchmark for a non-Rust language, the benchmark will naturally not involve taking random copies "just in case", but this can have a significant impact on all that real code out in the real world not written for benchmarking. Casually written Rust can safely not make lots of copies, casually written code in almost anything else will probably have a lot more copying than the programmer realizes.

[1]: https://blogs.oracle.com/javamagazine/post/records-come-to-j...

hnlmorg•2mo ago

Ahhh right that makes a lot more sense.

Thanks for the explanation

wolfspaw•1mo ago

great comment! And you're right, one of the biggest Strengths of Rust is Move done right. With const, borrow, move, ref rules... the default way usually does not make unnecessary copies.

And when you want to make a copy, to escape lifetime annoyances for example, you do a .clone() that very explicit marks a point of a Copy.

masklinn•2mo ago

They mean const in the sense of readonly guarantees.

In java types are generally shared and mutable so let's say you want a list input, you generally don't store it as is because the caller could modify it at any point, so if you accept a `List`, you defensively copy it into an inner type for safety, which has a cost (even more so if you also need to defensively copy the list contents).

And likewise on output otherwise the caller could downcast and modify (in that specific case you could wrap it in an unmodifiableList, but not all types have an unmodifiable view available).

msteffen•1mo ago

Sorry for the slow reply here, but in Java, “final” variables can’t be updated to reference a different object, but the underlying object can be modified with method calls (e.g. you can append things to a “final” ArrayList, whereas in C++ you can’t append to a const vector. In C++, you mark methods const to indicate that they don’t mutate the underlying object, and only cost methods can be called on const variables).

The Guava docs give an example of the defensive copying: https://github.com/google/guava/wiki/ImmutableCollectionsExp.... Any time an Immutable collection is created from another collection, the elements are deep-copied to avoid the possibility that they’re modified elsewhere, even if there’s no possibility of that happening. Even if you’re using e.g. an ImmutableList.Builder, the elements are copied when they’re Add()ed to the Builder, so that they’re the same when you call build(). I’m not sure but I think even some mutable collections make defensive copies of their arguments. I believe that Josh Bloch (who was in charge of Java at Google when I joined) also advocates for this in his book.

On writing this, it occurs to me that some kind of move constructor into these mutable collections might also do the job? Though you’d have to make sure there were no mutating references outside the mutable collection as well, which might require some kind of ownership model, so you might just wind up with Rust.

Finally, this is dumb but it’s killing me: I meant to type 400qps to 1200qps/core.

cogman10•2mo ago

> so many of Google’s internal libraries make defensive copies in many places,

This, IMO, is a sign of poor design.

What are you trying to protect? That the google library isn't modifying something or that the caller of the google library isn't concurrently modifying something?

Or are your storing off the value for later use?

In any case, it's acceptable in the Javadoc and API to specify "If you give me this, you cannot further modify it". This already happens and is expected in common JDK data-structures. For example, if you put an element into a HashSet and then change the hash, you won't be able to find it again in the HashSet. Nobody complains that's the case because it's a "Well duh, you shouldn't have done that". Similarly, if you mutate a map while accessing it you'll get a "ConcurrentModificationException" or even bad results. Again, completely expected behavior.

If you are worried about your code doing the wrong thing with something, then one defense that is easy to deploy is wrapping that object with one the is unmodifiable. That's why the JDK has the likes of `Collections.unmodifiableSet`. That doesn't do a defensive copy and is just a quick wrapper on the incoming set.

Defensive programming has it's place. However, I think it gets over-deployed.

lmm•1mo ago

Rewriting in any language usually makes things much faster and better. That's an interesting anecdote but without a comparison with a C++ -> Java rewrite you can't really generalise.

kccqzy•1mo ago

I had a similar experience rewriting a Go service in C++. Even when using a somewhat heavyweight Google framework (Scaffolding without Boq) even a simplistic rewrite can easily reach 1200qps per core. Over time I gradually optimized it into 8000qps per core through things like ditching fibers for async, tuning event managers, etc.

90s_dev•2mo ago

> Now, why doesn’t bytes.Reader implement Peek? It’s just a byte slice, it’s definitely possible to peek ahead without altering stream state. But it was overlooked, and instead this workaround is applied.

When I first looked at Go, it seemed to have far too many layers of abstraction on top of one another. Which is so ironic, considering that's one of the main things it was trying to fix about Java. It ended up becoming the thing it fought against.

treyd•2mo ago

I would agree with you but not so much here specifically. It's much more true with how goroutines and channels work, in that they're too unstructured and don't compose well, which necessitates needing to make ad-hoc abstractions around them.

rester324•1mo ago

Yeah, they copied that too and the smugness of the early java community.

One example below, although I frequently bump into conversations in the golang community in the same vein just don't bother to save them: https://github.com/golang/go/issues/29461

swisniewski•2mo ago

There's a much simpler way to do this:

If you want your library to operate on bytes, then rather than taking in an io.Reader and trying to figure out how to get bytes out of it the most efficient way, why not just have the library taken in []byte rather than io.Reader?

If someone has a complex reader and needs to extract to a temporary buffer, they can do that. But if like in the author's case you already have []byte, then just pass that it rather than trying to wrap it.

I think the issue here is that the author is adding more complexity to the interface than needed.

If you need a []byte, take in a []byte. Your callers should be able to figure out how to get you that when they need to.

With go, the answer is usually "just do the simple thing and you will have a good time".

TheDong•2mo ago

The author is trying to integrate with the Go stdlib, which requires you produce images from an 'io.Reader". See https://pkg.go.dev/image#RegisterFormat

Isn't using the stdlib simpler than not for your callers?

I also often hear gophers say to take inspiration from the go stdlib. The 'net/http' package's 'http.Request.Body' also has this same UX. Should there be `Body` and `BodyBytes` for the case when your http request wants to refer to a reader, vs wants to refer to bytes you already have?

jchw•2mo ago

The BodyBytes hypothetical isn't particularly convincing because you usually don't actually have the bytes before reading them, they're queued up on a socket.

In most cases I'd argue it really is idiomatic Go to offer a []byte API if that can be done more efficiently. The Go stdlib does sometimes offer both a []byte and Reader API for input to encoding/json, for example. Internally, I don't think it actually streams incrementally.

That said I do see why this doesn't actually apply here. IMO the big problem here is that you can't just rip out the Bytes() method with an upcast and use that due to the wrapper in the way. If Go had a way to do somehow transparent wrapper types this would possilby not be an issue. Maybe it should have some way to do that.

TheDong•2mo ago

> The BodyBytes hypothetical isn't particularly convincing because you usually don't actually have the bytes before reading them, they're queued up on a socket.

Ah, sorry, we were talking about two different 'http.Request.Body's. For some weird reason both the `http.Client.Do`'s request and `http.Server`'s request are the same type.

You're right that you usually don't have the bytes for the server, but for the client, like a huge fraction of client requests are `http.NewRequestWithContext(context.TODO(), "POST", "api.foo.com", bytes.NewReader(jsonBytesForAPI))`. You clearly have the bytes in that case.

Anyway, another example of the wisdom of the stdlib, you can save on structs by re-using one struct, and then having a bunch of comments like "For server requests, this field means X, for client requests, this is ignored or means Y".

jchw•2mo ago

Thinking about that more though, http.Client.Do is going to take that io.Reader and pipe it out to a socket. What would it do differently if you handed it a []byte? I suppose you could reduce some copying. Maybe worth it but I think Go already has other ways to avoid unnecessary copies when piping readers and writers together (e.g. using `WriterTo` instead of doing Read+Write.)

TheDong•1mo ago

> If body is of type *bytes.Buffer, *bytes.Reader, or *strings.Reader, the returned request's ContentLength is set to its exact value

Servers like to know Content-Length, and the package already special-cases certain readers to effectively treat them like a `[]byte`.

Clearly it does something differently already.

Also, following redirects only works if you can send the body multiple times, so currently whether the client follows redirects or not depends on the type of the reader you pass in... if you add a logging intercepter to your reader to debug something, suddenly your code compiles but breaks because it stops following redirects, ask me how I know.

jchw•1mo ago

In this case, there is not any functionality you can't get through other means: You can set GetBody and the content length header manually, which is what you probably wound up doing if I had to guess (been there too, same hat.) I think Go does this mainly to make basic usage more convenient. Unfortunately though, it makes this unnecessarily subtle footgun in return.

Maybe Go 2 will finally do something about this. I would really like some (hopefully efficient) way to make "transparent" wrapper types that can magically forward methods.

fishywang•1mo ago

The request body on the client do a lot of other things than reading the body once (an io.Reader can only be read once).

There's Content-Length, and there's also the need to read it multiple times in case a redirect happens (so the same body need to be sent again when being redirected).

As a result, the implementation in stdlib would check a few common io.Reader implementations (bytes.Buffer, bytes.Reader, strings.Reader) and make sure it stores something that can be read multiple times (if it's none of the 3, it's read fully into memory and stored instead).

jchw•1mo ago

This is the same basic reply as the other one but my thoughts are roughly the same. The only comment I have aside from what I replied on the sibling comment (this just being another case of wrappers not being transparent biting us in the ass) is that they could've done this in a more generic way than they did, at the downside of requiring more interfaces.

fishywang•1mo ago

Yea I saw your other reply later and agree on most of it. But I'd say there's a balance between simplicity of the API and more specific cases. For example they can make an optional api to io.Reader to provide size info, and maybe another optional api to io.Reader to make it able to be read more than once, etc.. But at the same time, if you have all those info, that _usually_ means you already have either a []byte or string, and you would most likely use one of the 3 types to convert that into an io.Reader, so that special handling is enough without adding more public apis, and the go team is notoriously conservative when adding new public apis.

tptacek•2mo ago

It is, but one of the virtues of the Go ecosystem is that it's also often very easy to fork the standard library; people do it with the whole TLS stack all the time.

The tension Ted is raising at the end of the article --- either this is an illustration of how useful casting is, or a showcase of design slipups in the standard library --- well, :why-not-both:. Go is very careful about both the stability of its standard library and the coherency of its interfaces (no popen, popen2, subprocess). Something has to be traded off to get that; this is one of the things. OK!

ronsor•2mo ago

> people do it with the whole TLS stack all the time.

It's the only way to add custom TLS extensions.

wbl•1mo ago

Adding custom TLS extensions plays badly when the standard library implements them.

throwaway894345•2mo ago

How does using the stdlib internally simplify things for callers? And what does that have to do with tanking inspiration from the stdlib?

On the second point, passing a []byte to something that really does not want a streaming interface is perfectly idiomatic per the stdlib.

I don’t think it complicates things for the caller if the author used a third party deciding function unless it produced a different type besides image.Image (and even then only a very minor inconvenience).

I also don’t think it’s the fault of the stdlib that it doesn’t provide high performance implementations of every function with every conceivable interface.

I do think there’s some reasonable critique to be made about the stdlib’s use of reflection to detect unofficial interfaces, but it’s also a perfectly pragmatic solution for maintaining compatibility while also not have the perfect future knowledge to build the best possible interface from day 0. :shrug:

int_19h•2mo ago

Because it forces the reader to read data into a temporary buffer in its entirety. If the thing this function is trying to do doesn't actually require it to do its job, that introduces unnecessary overhead.

mbrumlow•2mo ago

What? Where else would it be?

It’s either in the socket(and likely not fully arrived) or … in a buffer.

Peak is not some magic, it is well a temporary buffer.

Beyond that, I keep seeing people ask for a byte interface. Has anybody looked at the IO.reader interface ???

type Reader interface { Read(p []byte) (n int, err error) }

You can read as little or as much as you would like and you can do this at any stage of a chain if readers.

nemothekid•2mo ago

You are still doing a copy, and people want to avoid the needless memory copy.

If you are decoding a 4 megabyte jpeg, and that jpeg already exists in memory, then copying that buffer by using the Reader interface is painful overhead.

mbrumlow•1mo ago

And you are going to work on all 4mb at the time? Even if you were to want to plop it on a socket you would just use IO.copy which would be no overhead, as no matter what you are still always going to copy bits out to place it in the socket to be sent.

nemothekid•1mo ago

>And you are going to work on all 4mb at the time?

Yes? Assume you were going to decode the jpeg and display it on screen. I assume the user would want to see the total jpeg at once.

Consider you are working on processing a program that has a bunch of jpegs and is running some AI inference on them.

1. You would read the jpegs from disk into memory. 2. You decode those jpegs in into RGBA buffers 3. You run inference on the RGBA buffers.

The current ImageDecode interface forces you to do a memcopy in between steps 1 and 2.

1. You would read the jpegs from disk into memory. 2. You copy the data in memory into another buffer because you are using the Reader interface 3. You decode those jpegs in into RGBA buffers 4. You run inference on the RGBA buffers.

Step two isn't needed at all, and if the images are large, that can add latency. If you are coding on something like a Raspberry Pi, depending on the size of the jpegs, the delay would be noticable.

Seb-C•1mo ago

Getting an io.Reader over a byte slice is a useful tool, but the primary use case for io.Reader is streaming stuff from the network or file system.

In this context, you can either have the io.Reader do a copy without allocating anything (take in a slice managed by the caller), or allocate and return a slice. There isn't really a middle ground here.

dgb23•2mo ago

Personally I rarely use or even implement interfaces except some other part needs them. My brain thinks in terms of plain data by default.

I appreciate how they compose, for example when I call io.Copy and how things are handled for me. But when I structure my code that way, it’s extra effort that doesn’t come naturally at all.

lanstin•2mo ago

I use them for testing, where I can have a client that is called by the code under test and can either just run a test CB, send a REST call to a remote server, send a gRPC call to a remote server, or make a function call to an in-process gRPC server object.

__turbobrew__•1mo ago

Yea, mocking is generally the most use I get out of interfaces

dgb23•1mo ago

Plain data is really convenient for testing though.

I think the reason that your example is so useful is not generally because of testing, but because the thing you're interacting with has operational semantics. It's a good use case for object orientation, so interfaces and mocking are the natural way of testing your logic.

vjerancrnjak•2mo ago

That's how leaky abstraction of many file std implementations starts.

Reading into a byte buffer, pass in a buffer to read values, pass in a buffer to write values. Then OS does the same thing, has its own buffer that accepts your buffer, then the underlying storage volume has its own buffer.

Buffers all the way down to inefficiency.

woah•2mo ago

Seems pretty crazy to force a bunch of data to be saved into memory all the time just for programming language aesthetic reasons

silverwind•2mo ago

A good API should just accept either,e.g. the union of []byte and io.Reader.

Both have pros and cons and those should be for the user to decide.

thayne•2mo ago

Ah, but go doesn't have union types.

Zambyte•2mo ago

One option would be to accept an interface{} and then switch on the type.

stouset•2mo ago

It's frightening how quickly the answer in golang becomes "downcast to interface{} and force type problems to happen at runtime".

throwaway894345•2mo ago

You don’t need to downcast to interface, io.Reader is already an interface, and a type assertion on an interface (“if this io.Reader is just a byteslice and cursor, then use the byteslice”) is strictly safer than an untagged union and equally safe with a tagged union.

I wish Go had Rust-like enums as well, but they don’t make anything safer in this case (except for the nil interface concern which isn’t the point you’re raising).

wredcoll•1mo ago

Presumably the questions that have simple and easy answers don't get long comment chains.

throwaway894345•2mo ago

An io.Reader is already an interface, so you can already switch on its type.

Zambyte•2mo ago

My comment is explaining how

> A good API should just accept either,e.g. the union of []byte and io.Reader.

could be done. Can you elaborate on how the fact that io.Reader is an interface lets you accept a []byte in the API? To my knowledge, the answer is: you can't. You have to wrap the []byte in an io.Reader, and you are at the exact problem described in the article.

throwaway894345•1mo ago

I see what you’re saying. You’re correct that you can’t pass a []byte as an io.Reader, but you can implement io.Reader on a []byte in a way that lets you get the []byte back out via type switch (the problem in the article was that the standard library type didn’t let you access the internal []byte).

ithkuil•1mo ago

An idiomatic way to approach this would be to define a new interface, let's call it Bytes with a Bytes() []byte method.

Your function would accept an io.Reader but then the function body would typeswitch and check if the argument implements the Bytes interface. If it does, then call the Bytes() method. If it doesn't then call io.ReadAll() and continue to use the []byte in the rest of the impl.

The bytes.Buffer type already implements this Bytes() method with that signature. By the rules of Go this means it will be treated as an implementation of this Bytes interface, even if nobody defined that interface yet in the stdlib.

This is an example of Go's strong duck typing.

Zambyte•1mo ago

That's really interesting, thank you for explaining that! Somehow I've never thought to implement interferences to describe types that already exist.

foldr•1mo ago

You can just expose two different functions, one of which takes a byte slice and one of which takes an io.Reader.

Given how the code works (it starts by buffering the input stream), the second function will just be a few lines of code followed by a call to the first.

Perfect example of how complex type systems can lead people to have unnecessarily complex thoughts.

throwaway894345•2mo ago

Nah, a good API doesn’t push the conditionals down. You don’t need to pass a union to let the user decide, you just need to present an API for each (including a generic implementation that monomorphizes into multiple concrete implementations) https://matklad.github.io/2023/11/15/push-ifs-up-and-fors-do...

Seb-C•1mo ago

100% this, that is the easiest and less error prone way to do it.

Even if the author still insisted on using a single interface, he could also do what he wants by relying on bytes.Buffer rather than bytes.Reader.

0points•1mo ago

When you are working with streaming data, you really should be passing around io.Readers if you want any sort of performance out of it.

A []byte require you to read ALL data in advance.

And if you still end up with []byte and need to use a interface taking io.Reader, then you wrap []byte in a bytes.Buffer which implements io.Reader.

liampulles•2mo ago

The reason interface smuggling exists as a pattern in the Go standard library and others is because the Go team (and those who agree with its philosophy) take breaking API changes really seriously.

It is no small feat that Go is still on major version 1.

treyd•2mo ago

Wouldn't you say that it's a design oversight that the interface system leads to tight constraints on what you can do without breaking APIs?

liampulles•2mo ago

I can't call it a design oversight no, because I'm not sure what reasonable alternatives were considered before Go v1 was released. I also don't have context of all the decision factors that went into Go's spec. To be honest, I'm not anywhere near an expert on programming language design - I'm probably the wrong person to ask.

I am thankful that they haven't broken the spec to change that design, but maybe others don't care about that as much as I do.

hkpack•2mo ago

It seems that go library is ok with you paying the performance price when using io.Reader/io.Writer on memory structures.

You can write clean idiomatic code, but it won’t be the fastest. So for maximum results you should always do everything manually for your use case: i.e. don’t use additional readers/writers and operate on []byte directly if that is what you are working with.

I think it is mostly a good thing - you can quickly write simple but slower code and refactor everything later when needed.

millipede•2mo ago

Type inspection is the flaw of Go's interface system. Try to make a type that delegates to another object, and the type inspection breaks. It's especially noticeable with the net/http types, which would be great to intercept, but then breaks things like Flusher or Hijacker.

38•2mo ago

> What I would like is for my image decoding function to notice that the io.Reader it has been given is in fact a bytes.Reader so we can skip the copy.

What a terrible idea. If you want bytes.reader, then use that in the function signature, or better yet just a byte slice. It should have been a red flag when your solution involves the unsafe package

Groxx•2mo ago

I think you kinda missed the point. The point is that trying to make a user-friendly API with the familiar, highly composable, and extremely common io.Reader that everything (including Go's stdlib) encourages you to use ends up putting you in this unfortunate design corner if you also care about performance.

It's frustration about getting close to a good API, but not having any reasonable way to close the final gap, forcing you do you to go stuff like you mentioned: have multiple near-identical APIs for performance, and needing your users to understand and use them correctly to get a good result.

XorNot•2mo ago

The benefit seems fictional though. When is the user going to have all the bytes in memory but only have an io.Reader? Probably never. If I have a reader it's because the bytes are coming from something which itself does not make that promise. Like the most common application would be an os.File.

If I do have all the bytes in memory, then I have a []byte array, know I have it, and can use the []byte interface you must've implemented internally to use this speed up.

Groxx•1mo ago

It's frustration about getting close to a good API, but not having any reasonable way to close the final gap, forcing you do you to [do] stuff like you mentioned: have multiple near-identical APIs for performance, and needing your users to understand and use them correctly to get a good result.

porridgeraisin•1mo ago

But if you have two modes of operation:

- Reading in memory data

- Reading from a stream

Then whats wrong with having two different APIs to use each mode?

Instead of you doing a if (detect in memory) in your code, it's just two different functions.

Groxx•1mo ago

Nothing. Or next to nothing at worst.

It would just be nice if it could automatically do the best thing.

bobbylarrybobby•2mo ago

I am not a gopher, so this may be a dumb question: when an io.Reader produces a buffer of its contents, does it not have the option of just returning the buffer it wraps if it does in fact wrap a buffer? Something like (pseudocode) `if self isa BufferedReader { self.takeBuffer() } else { let buffer = newBuffer(); self.fill(buffer); buffer }`.

masklinn•2mo ago

1. Like read(2), io.Reader.Read takes a slice parameter to which it writes, it doesn't return one. This is better for a "legitimate" read stream as the caller can amortise allocations.

2. And Read takes the slice by value, so the length, capacity, and buffer pointer are copied into the callee. This gives no way of "swapping buffers", even ignoring that the caller may have issues getting back a slice with a completely different size and capacity than they sent / expected.

nulld3v•2mo ago

I am once again begging for this to be implemented: https://github.com/golang/go/issues/4146

nottorp•2mo ago

Interesting, I don't know Go (yet) but I was messing with it the other weekend.

Can someone please summarize this []byte vs somethingReader thing for me? Assume I can program, just not familiar with Go.

I was reading off sockets and it looked to me that the example code (i randomly ran into) had too many Reader something or other.

Edit: Ok, I know what a streaming class does, they're available in many frameworks. I'm more interested in why you'd get forced to use them in the context of the Go standard library.

Are they mandatory for sockets? Or for interacting with other common functions that I'd use to process the data out of my sockets?

I just wanted to read up to a new line or a hard coded size limit from a socket... ;) Without getting accidentally quadratic in either cpu use or memory use...

mholt•2mo ago

[]byte is a buffer. (io.)Reader is any type that can read bytes, like as part of a stream. There are `bytes.Reader` and `bufio.Reader` and `strings.Reader`, and many others, depending on what you're reading. These three types allow you to read []byte, any buffered input, and string, respectively, as streams.

Streaming is more efficient for large pieces of data (files, etc; whatever you have), but the buffer is usually easier to work with and grants more flexibility.

ashishb•2mo ago

Imagine you are reading files which can be of varying size (KB to GB). A byte array will do a contiguous allocation of the file size.

A Reader can be much more thoughtful. And I say "can be" because someone can make Reader as inefficient as a byte array.

Or they can read in chunks.

For example, if you are trying you read exif data or reading up to first N bytes, Reader is a superior approach.

masklinn•2mo ago

byte[] is a bytes buffer, a sequence of bytes. io.Reader is an abstraction around a read stream. You can adapt a byte[] to a Reader by wrapping it in a bytes.Reader, that way if a function needs a reader and you have bytes, you can give them your bytes.

The problem TFA has, is that bytes.Reader implies a copy: it's going to read data into a second internal slice. So when all their library needs is the bytes, they could use the bytes themselves, and avoid a potentially expensive copy.

Obviously you could just have a second entry point which takes straight byte[] instead of a reader, but in this case they're trying to conform to the standard library's image module[1] which does not expose a bytes interface and conditionally adds further indirection layers.

[1]: https://pkg.go.dev/image

nottorp•1mo ago

> is that bytes.Reader implies a copy

Yes, that's what "smelled" wrong to me too. Is it avoidable or does Go's stdlib force you to go through xxx.Reader reading from yyy.Reader that is made from a zzz.Reader and a ttt.Reader?

masklinn•1mo ago

Unless you know the concrete reader types and they expose their underlying reader or concrete type, short of unsafe yes. And as the essay points out, neither bytes.Reader nor bufio.Reader expose the underlying type.

mholt•1mo ago

There's io.ReaderFrom and io.WriterTo, which can avoid internal buffering.

konart•2mo ago

Go's source code (comment actually) says it all tbh: https://github.com/golang/go/blob/master/src/io/io.go#L55

(very much) tldr: anything that implements `Read(p []byte) (n int, err error)` - is a Reader.

binary132•2mo ago

Breaking virtual types is bad.

eximius•2mo ago

This really feels like trying to use Go for a purpose that it is inherently not designed for: absolute performance.

Go is a fantastic case study in "good enough" practical engineering. And sometimes that means you can't wring out the absolute max performance and that's okay.

It's frustrating, but it is fulfilling it's goals. It's goals just aren't yours.

callc•1mo ago

After using Go for large and medium projects, with hundreds and < 10 people respectively, I’m left with a desire for a better language.

I enjoy the simplicity of Go, but the tradeoff of Go’s imposed limitations (concurrency model, no subtype polymorphism in place of parametric polymorphism, no methods on external types, etc) feel like strict limitations without much benefit. (Minus labels and gotos, that’s great design)

The speed I don’t care about

The model of loose global functions for fundamental operations on built in data types is poor design. append(), delete(), close(), etc. How about make these methods on their respective types?

Overall coding in Go feels like coding in a pre-alpha language. Syntax-wise, not the top notch tooling and packaging.

ncruces•2mo ago

Instead of focusing on bytes.Reader, the author could maybe propose that bytes.Buffer (which already has a Bytes method) implements Peek as well.

bytes.Buffer already has many methods exposing the internal buffer, so one more is not insurmountable.

hinkley•1mo ago

The title is a bit tortured.

Too much indirection I would agree is misdirection. “Too much misdirection” sounds like you’ve gone past bad design into clown car.

Asynchrony is not concurrency

How to write Rust in the Linux kernel: part 3

Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files

Shutting Down Clear Linux OS

Silence Is a Commons by Ivan Illich (1983)

Broadcom to discontinue free Bitnami Helm charts

Wii U SDBoot1 Exploit “paid the beak”

EPA says it will eliminate its scientific reseach arm

Multiplatform Matrix Multiplication Kernels

lsr: ls with io_uring

Valve confirms credit card companies pressured it to delist certain adult games

Meta says it wont sign Europe AI agreement, calling it growth stunting overreach

Trying Guix: A Nixer's impressions

Replication of Quantum Factorisation Records with a VIC-20, an Abacus, and a Dog

AI capex is so big that it's affecting economic statistics

Show HN: Molab, a cloud-hosted Marimo notebook workspace

Mango Health (YC W24) Is Hiring

CP/M creator Gary Kildall's memoirs released as free download

The year of peak might and magic

Sage: An atomic bomb kicked off the biggest computing project in history

Show HN: I built library management app for those who outgrew spreadsheets

A New Geometry for Einstein's Theory of Relativity

Cancer DNA is detectable in blood years before diagnosis

Show HN: Simulating autonomous drone formations

How I keep up with AI progress

Benben: An audio player for the terminal, written in Common Lisp

Making a StringBuffer in C, and questioning my sanity

Hundred Rabbits – Low-tech living while sailing the world

How to Get Foreign Keys Horribly Wrong

When root meets immutable: OpenBSD chflags vs. log tampering

Asynchrony is not concurrency

How to write Rust in the Linux kernel: part 3

Ccusage: A CLI tool for analyzing Claude Code usage from local JSONL files

Shutting Down Clear Linux OS

Silence Is a Commons by Ivan Illich (1983)

Broadcom to discontinue free Bitnami Helm charts

Wii U SDBoot1 Exploit “paid the beak”

EPA says it will eliminate its scientific reseach arm

Multiplatform Matrix Multiplication Kernels

lsr: ls with io_uring

Valve confirms credit card companies pressured it to delist certain adult games

Meta says it wont sign Europe AI agreement, calling it growth stunting overreach

Trying Guix: A Nixer's impressions

Replication of Quantum Factorisation Records with a VIC-20, an Abacus, and a Dog

AI capex is so big that it's affecting economic statistics

Show HN: Molab, a cloud-hosted Marimo notebook workspace

Mango Health (YC W24) Is Hiring

CP/M creator Gary Kildall's memoirs released as free download

The year of peak might and magic

Sage: An atomic bomb kicked off the biggest computing project in history

Show HN: I built library management app for those who outgrew spreadsheets

A New Geometry for Einstein's Theory of Relativity

Cancer DNA is detectable in blood years before diagnosis

Show HN: Simulating autonomous drone formations

How I keep up with AI progress

Benben: An audio player for the terminal, written in Common Lisp

Making a StringBuffer in C, and questioning my sanity

Hundred Rabbits – Low-tech living while sailing the world

How to Get Foreign Keys Horribly Wrong

When root meets immutable: OpenBSD chflags vs. log tampering

Too Much Go Misdirection

Comments