In an ideal world you could just ask the host to stream 100MB of stuff into a byte array or slice of the wasm heap. Alas.
for await (const chunk of stream) {
// process the chunk
stream.returnChunk(chunk);
}
This would be entirely optional. If you don’t return the chunk and instead let GC free it, you get the normal behavior. If you do return it, then the stream is permitted to return it again later.(Lately I’ve been thinking that a really nice stream or receive API would return an object with a linear type so that you must consume it. This would make it impossible to write code where task cancellation causes you to lose received data. Sadly, mainstream languages can’t do this directly.)
I may be naive in asking this, but what leads someone to building high perf data tools in JS? JS doesn't seem to me like it would be the tool of choice for such things
> I'm not here to disparage the work that came before — I'm here to start a conversation about what can potentially come next.
Terrible LLM-slop style. Is Mr Snell letting an LLM write the article for him or has he just appropriated the style?
Sadly it will never happen. WebAssembly failed to keep some of its promises here.
The propose just using an async iterator of UInt8Array. I almost like this idea, but it's not quite all the way there.
They propose this:
type Stream<T> = {
next(): Promise<{ done, value: UInt8Array<T> }>
}
I propose this, which I call a stream iterator! type Stream<T> = {
next(): { done, value: T } | Promise<{ done, value: T }>
}
Obviously I'm gonna be biased, but I'm pretty sure my version is also objectively superior:- I can easily make mine from theirs
- In theirs the conceptual "stream" is defined by an iterator of iterators, meaning you need a for loop of for loops to step through it. In mine it's just one iterator and it can be consumed with one for loop.
- I'm not limited to having only streams of integers, they are
- My way, if I define a sync transform over a sync input, the whole iteration can be sync making it possible to get and use the result in sync functions. This is huge as otherwise you have to write all the code twice: once with sync iterator and for loops and once with async iterators and for await loops.
- The problem with thrashing Promises when splitting input up into words goes away. With async iterators, creating two words means creating two promises. With stream iterators if you have the data available there's no need for promises at all, you just yield it.
- Stream iterators can help you manage concurrency, which is a huge thing that async iterators cannot do. Async iterators can't do this because if they see a promise they will always wait for it. That's the same as saying "if there is any concurrency, it will always be eliminated."
To see the problem let's create a stream with feedback. Lets say we have an assembly line that produces muffins from ingredients, and the recipe says that every third muffin we produce must be mushed up and used as an ingredient for further muffins. This works OK until someone adds a final stage to the assembly line, which puts muffins in boxes of 12. Now the line gets completely stuck! It can't get a muffin to use on the start of the line because it hasn't made a full box of muffins yet, and it can't make a full box of muffins because it's starved for ingredients after 3.
If we're mandated to clump the items together we're implicitly assuming that there's no feedback, yet there's also no reason that feedback shouldn't be a first-class ability of streams.
https://github.com/ralusek/streamie
allows you to do things like
infiniteRecords
.map(item => doSomeAsyncThing(item), { concurrency: 5 });
And then because I found that I often want to switch between batching items vs dealing with single items: infiniteRecords
.map(item => doSomeAsyncSingularThing(item), { concurrency: 5 })
.map(groupOf10 => doSomeBatchThing(groupsOf10), { batchSize: 10 })
// Can flatten back to single items
.map(item => backToSingleItem(item), { flatten: true });
user3939382•59m ago
This is what UDP is for. Everything actually has to be async all the way down and since it’s not, we’ll just completely reimplement the OS and network on top of itself and hey maybe when we’re done with that we can do it a third time to have the cloud of clouds.
The entire stack we’re using right down to the hardware is not fit for purpose and we’re burning our talent and money building these ever more brittle towering abstractions.
afavour•54m ago
delaminator•19m ago