In an ideal world you could just ask the host to stream 100MB of stuff into a byte array or slice of the wasm heap. Alas.
for await (const chunk of stream) {
// process the chunk
stream.returnChunk(chunk);
}
This would be entirely optional. If you don’t return the chunk and instead let GC free it, you get the normal behavior. If you do return it, then the stream is permitted to return it again later.(Lately I’ve been thinking that a really nice stream or receive API would return an object with a linear type so that you must consume it and possibly even return it. This would make it impossible to write code where task cancellation causes you to lose received data. Sadly, mainstream languages can’t do this directly.)
I may be naive in asking this, but what leads someone to building high perf data tools in JS? JS doesn't seem to me like it would be the tool of choice for such things
Performance-wise, I get about half the throughput I had with the same processsing done it rust, which doesn't change anything for my use-case.
However that's not really relevant to the context of the post as I'm using node.js streams which are both saner and fast. I'm guessing that the post is relevant to people using server-side runtimes that only implement web streams.
right now when i need to wrangle bytes, i switch languages to Golang. it’s easy gc language, and all its IO is built around BYOB api:
interface Reader { read(b: Uint8Array): [number, Error?] }
you pass in your own Uint8Array allocation (in go terms, []byte), the reader fills at most the entire thing, and returns (bytes filled, error). it’s a fully pull stream API with one method at its core. now, the api gets to be that simple because it’s always sync, and blocks until the reader can fill data into the buffer or returns an error indicating no data available right now.
go has a TeeReader with no buffering - it too just blocks until it can write to the forked stream.
https://pkg.go.dev/io#TeeReader
we can’t do the same api in JS, because go gets to insert `await` wherever it wants with its coroutine/goroutine runtime. but we can dream of such simplicity combined with zero allocation performance.
> I'm not here to disparage the work that came before — I'm here to start a conversation about what can potentially come next.
Terrible LLM-slop style. Is Mr Snell letting an LLM write the article for him or has he just appropriated the style?
Possibly the LLM vendors could bias the models more toward nonprofessional content, but then the quality and utility of the output would suffer. Skip the scientific articles and books, focus on rando internet comments, and you’ll end up with a lot more crap than you already get.
You might say well, it's on the Cloudflare blog so obviously it must have some merit, but after the Matrix incident...
It doesn't really matter what tools are used if the result is good
Just ctrl-f'ing through previous public posts, I think there were a total of 7 used across about that many posts. This one for example had 57. I'm not good enough in proper English to know what the normal number is supposed to be, just pointing that out.
I’ve read my fair share of LLM slop. This doesn’t qualify.
Sadly it will never happen. WebAssembly failed to keep some of its promises here.
They propose just using an async iterator of UInt8Array. I almost like this idea, but it's not quite all the way there.
They propose this:
type Stream<T> = {
next(): Promise<{ done, value: UInt8Array<T> }>
}
I propose this, which I call a stream iterator! type Stream<T> = {
next(): { done, value: T } | Promise<{ done, value: T }>
}
Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:- I can easily make mine from theirs
- In theirs the conceptual "stream" is defined by an iterator of iterators, meaning you need a for loop of for loops to step through it. In mine it's just one iterator and it can be consumed with one for loop.
- I'm not limited to having only streams of integers, they are
- My way, if I define a sync transform over a sync input, the whole iteration can be sync making it possible to get and use the result in sync functions. This is huge as otherwise you have to write all the code twice: once with sync iterator and for loops and once with async iterators and for await loops.
- The problem with thrashing Promises when splitting input up into words goes away. With async iterators, creating two words means creating two promises. With stream iterators if you have the data available there's no need for promises at all, you just yield it.
- Stream iterators can help you manage concurrency, which is a huge thing that async iterators cannot do. Async iterators can't do this because if they see a promise they will always wait for it. That's the same as saying "if there is any concurrency, it will always be eliminated."
To see the problem let's create a stream with feedback. Lets say we have an assembly line that produces muffins from ingredients, and the recipe says that every third muffin we produce must be mushed up and used as an ingredient for further muffins. This works OK until someone adds a final stage to the assembly line, which puts muffins in boxes of 12. Now the line gets completely stuck! It can't get a muffin to use on the start of the line because it hasn't made a full box of muffins yet, and it can't make a full box of muffins because it's starved for ingredients after 3.
If we're mandated to clump the items together we're implicitly assuming that there's no feedback, yet there's also no reason that feedback shouldn't be a first-class ability of streams.
> - I can easily make mine from theirs
That... doesn't make it superior? On the contrary, theirs can't be easily made out of yours, except by either returning trivial 1-byte chunks, or by arbitrary buffering. So their proposal is a superior primitive.
On the whole, I/O-oriented iterators probably should return chunks of T, otherwise you get buffer bloat for free. The readv/writev were introduced for a reason, you know.
Plus theirs involves the very concrete definition of an array, which might have 100 prototype methods in JS, each part of their API surface. I have one function in my API surface.
Adding types on top of that isn't a protocol concern but an application-level one.
While I understand the logic, that's a terrible idea.
* The overhead is massive. Now every 1KiB turns into 1024 objects. And terrible locality.
* Raw byte APIs...network, fs, etc fundamentally operate on byte arrays anyway.
In the most respectful way possible...this idea would only be appealing to someone who's not used to optimizing systems for efficiency.
Small, short-lived objects with known key ordering (monomorphism) are not a major cost in JS because the GC design is generational. The smallest, youngest generation of objects can be quickly collected with an incremental GC because the perf assumption is that most of the items in the youngest generation will be garbage. This allows collection to be optimized by first finding the live objects in the gen0 pool, copying them out, then throwing away the old gen0 pool memory and replacing it with a new chunk.
From what it looks like, they want their streams to be compatible with AsyncIterator so it'd fit into existing ecosystem of iterators.
And I believe the Uint8Array is there for matching OS streams as they tend to move batches of bytes without having knowledge about the data inside. It's probably not intended as an entirely new concept of a stream, but something that C/C++ or other language that can provide functionality for JS, can do underneath.
For example my personal pet project of a graph database written in C has observers/observables that are similar to the AsyncIterator streams (except one observable can be listened to by more than one observer) moving about batches of Uint8Array (or rather uint8_t* buffer with capacity/count), because it's one of the fastest and easiest thing to do in C.
It'd be a lot more work to use anything other than uint8_t* batches for streaming data. What I mean by that, is that any other protocol that is aware of the type information would be built on top of the streams, rather than being part of the stream protocol itself for this reason.
https://github.com/ralusek/streamie
allows you to do things like
infiniteRecords
.map(item => doSomeAsyncThing(item), { concurrency: 5 });
And then because I found that I often want to switch between batching items vs dealing with single items: infiniteRecords
.map(item => doSomeAsyncSingularThing(item), { concurrency: 5 })
.map(groupOf10 => doSomeBatchThing(groupsOf10), { batchSize: 10 })
// Can flatten back to single items
.map(item => backToSingleItem(item), { flatten: true }); import { Repeater } from "@repeaterjs/repeater";
const keys = new Repeater(async (push, stop) => {
const listener = (ev) => {
if (ev.key === "Escape") {
stop();
} else {
push(ev.key);
}
};
window.addEventListener("keyup", listener);
await stop;
window.removeEventListener("keyup", listener);
});
const konami = ["ArrowUp", "ArrowUp", "ArrowDown", "ArrowDown", "ArrowLeft", "ArrowRight", "ArrowLeft", "ArrowRight", "b", "a"];
(async function() {
let i = 0;
for await (const key of keys) {
if (key === konami[i]) {
i++;
} else {
i = 0;
}
if (i >= konami.length) {
console.log("KONAMI!!!");
break; // removes the keyup listener
}
}
})();
https://github.com/repeaterjs/repeaterIt’s one of those abstractions that’s feature complete and stable, and looking at NPM it’s apparently getting 6.5mil+ downloads a week for some reason.
Lately I’ve just taken the opposite view of the author, which is that we should just use streams, especially with how embedded they are in the `fetch` proposals and whatever. But the tee critique is devastating, so maybe the author is right. It’s exciting to see people are still thinking about this. I do think async iterables as the default abstraction is the way to go.
The objection is
> The Web streams spec requires promise creation at numerous points — often in hot paths and often invisible to users. Each read() call doesn't just return a promise; internally, the implementation creates additional promises for queue management, pull() coordination, and backpressure signaling.
But that's 95% manageable by altering buffer sizes.
And as for that last 5%....what are you doing with JS to begin with?
I'm working on a db driver that uses it by convention as part of connection/pool usage cleanup.
user3939382•1h ago
This is what UDP is for. Everything actually has to be async all the way down and since it’s not, we’ll just completely reimplement the OS and network on top of itself and hey maybe when we’re done with that we can do it a third time to have the cloud of clouds.
The entire stack we’re using right down to the hardware is not fit for purpose and we’re burning our talent and money building these ever more brittle towering abstractions.
afavour•1h ago
delaminator•1h ago
user3939382•20m ago
delaminator•7m ago