I'm excited to start experimenting with Vortex to see how it can improve our products.
Great stuff, congrats to Will and team!
Application error: a client-side exception has occurred while loading vortex.dev (see the browser console for more information).
Console: unable to create webgl context
You may be interested in https://github.com/vortex-data/vortex which of course has an overview and links to their docs and benchmark pages.
EDIT> Maybe it's how some people call the 4th dimension time when there is in fact a 4th spatial dimension. So I guess if this is the 3rd data dimension, what is the 4th one?
Who knows, maybe a Web 3.1 will deliver us from Enshittification.
... I'm gonna make revolutionary claims and grandiose statements like "built for the AI era".
So it's "optimized for machines to consume" meaning the GPU.
Their use case was training ML models, where you need to feed the GPU massive datasets.
They seem to claim that training is now bottlenecked by how quickly you can feed the GPU: otherwise the GPU is basically "waiting on IO" most of the time rather than actually computing, because the time goes into grabbing the next piece of data, transforming it for GPU consumption, and then feeding it into the GPU.
But I'm not an expert, this is just my take from the article.
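For what it's worth, a toy version of that serial fetch → transform → feed loop (made-up latencies and placeholder functions, nothing from the article or Vortex itself) shows the shape of the problem: the "GPU" is idle whenever the loop is off fetching or decoding.

```python
# Toy illustration only: a serial training loop where the accelerator idles
# while each chunk is fetched and transformed. All latencies are invented.
import time

def fetch_next_chunk():   # stand-in for an object-store read
    time.sleep(0.08)

def transform_for_gpu():  # stand-in for decode + tensor conversion
    time.sleep(0.04)

def gpu_step():           # stand-in for the forward/backward pass
    time.sleep(0.03)

io_time = compute_time = 0.0
for _ in range(20):
    t0 = time.perf_counter(); fetch_next_chunk(); transform_for_gpu()
    t1 = time.perf_counter(); gpu_step()
    t2 = time.perf_counter()
    io_time += t1 - t0
    compute_time += t2 - t1

print(f"GPU busy {compute_time / (io_time + compute_time):.0%} of the time")
# With these made-up numbers the GPU is busy only ~20% of the time, which is
# the shape of the bottleneck the article seems to describe.
```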
I would think that a GPU isn't just sitting there waiting on a process that's in turn waiting for one query to finish before starting the next. Instead, a bunch of parallel queries and scans would be running, fed from many DB and object-store servers, keeping the GPUs as utilized as possible. Given how expensive GPUs are, it would seem like a good trade to buy more servers to keep them fed, even if you also want to make the servers and DB/object-store reads faster.
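A rough sketch of that argument (again with invented latencies and placeholder loaders, no particular stack assumed): a handful of parallel workers feeding a bounded queue already keeps the consumer mostly busy.

```python
# Sketch only: parallel loaders keep a queue full so the consumer ("GPU")
# rarely waits on IO. Latencies are invented; load_batch is a placeholder.
import time
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

def load_batch(i):        # stand-in for a DB/object-store read + transform
    time.sleep(0.12)
    return f"batch-{i}"

def gpu_step(batch):      # stand-in for compute
    time.sleep(0.03)

batches = Queue(maxsize=8)
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(40):
        pool.submit(lambda i=i: batches.put(load_batch(i)))
    start = time.perf_counter()
    for _ in range(40):
        gpu_step(batches.get())
    elapsed = time.perf_counter() - start

print(f"{elapsed:.2f}s with prefetching vs ~{40 * (0.12 + 0.03):.1f}s serially")
```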
Seems that they are targeting a low-to-no-overhead path from S3 bucket to GPU: comparable compression with faster random access, decoding the encoded stream from S3 while it's in flight, and zero-copy transfer to the GPU.
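If I had to guess what that path looks like on the consumer side, it's roughly the standard pinned-memory / side-stream overlap pattern sketched below. `fetch_and_decode` is a hypothetical placeholder, not a real Vortex API; the PyTorch pinned-memory and stream calls are real, but this is my assumption about the technique, not their implementation.

```python
# Hypothetical sketch: overlap host-side fetch/decode with async host-to-GPU
# copies via pinned memory and a dedicated copy stream.
import torch

def fetch_and_decode(i: int) -> torch.Tensor:
    # Placeholder: pretend this streamed and decoded a chunk from S3.
    return torch.randn(1_000_000)

device = "cuda" if torch.cuda.is_available() else "cpu"
copy_stream = torch.cuda.Stream() if device == "cuda" else None

for i in range(10):
    host = fetch_and_decode(i)
    if copy_stream is not None:
        host = host.pin_memory()                 # page-locked for async DMA
        with torch.cuda.stream(copy_stream):
            dev = host.to(device, non_blocking=True)
        # Make the compute stream wait for the copy before using the data.
        torch.cuda.current_stream().wait_stream(copy_stream)
    else:
        dev = host.to(device)
    _ = dev.sum()  # stand-in for GPU work; the CPU moves on to chunk i+1
```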
Not 100% clear on the details, but I doubt that they can actually saturate the CPU/GPU bus; more likely they just saturate GPU utilization, which itself depends on multiple possible bottlenecks but generally not on bus bandwidth.
That's not criticism: it literally means you can't do better unless you improve the GPU utilization of your AI model.
What's unanswered in the blog post is how a new storage format eliminates the bottleneck. Once you eliminate storage bottlenecks, the remaining bottleneck is usually the PCIe bus that sits between host memory and the GPU, and they can't solve that themselves. It might be that their storage format is more space-efficient, which makes bus transfers cheaper and makes better use of the GPU's onboard memory.
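Back-of-envelope, with my own assumed numbers (roughly PCIe 4.0 x16 bandwidth, a guessed 4x compression ratio, a made-up working-set size), shipping the data compressed across the bus and decoding on the GPU would look like this:

```python
# My numbers, not Spiral's: how much PCIe time a compressed-over-the-bus
# format could save, ignoring the cost of decoding on the GPU.
pcie_gbs = 32        # ~PCIe 4.0 x16 throughput in GB/s (rounded)
dataset_gb = 512     # assumed working-set size
ratio = 4            # assumed compression ratio; real ratios vary by data

print(f"uncompressed transfer: {dataset_gb / pcie_gbs:.1f} s")
print(f"compressed transfer:   {dataset_gb / ratio / pcie_gbs:.1f} s (+ on-GPU decode)")
# -> 16.0 s vs 4.0 s here; the win only materializes if on-GPU decode is cheap
#    relative to the transfer time it saves.
```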
They've also left unanswered how they're going to commercialize it, but my guess is that they're going to use a proprietary fork of Vortex that provides extra performance or features. The open-source release gives its customers a Reason to Believe, in marketing parlance.
Basically I'm not sure where the product is hiding under all of this bluster, but this doesn't feel very "hacker"-y.
all2•1h ago
> P.S. If you're still managing data in spreadsheets, this post isn't for you. Yet.
---
Since I discovered the ECS pattern, I've been curious about backing it with a database. One of the big issues seems to be IO on the database side. I wonder if Spiral might solve this issue.
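As a toy sketch of what "ECS backed by a database" could mean (SQLite here, nothing Spiral- or Vortex-specific): each component type is a table keyed by entity id, and a system is just a query over entities that have the right components.

```python
# Toy ECS-on-a-database sketch: component tables keyed by entity id,
# with a "movement system" expressed as a single UPDATE.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE position (entity INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE velocity (entity INTEGER PRIMARY KEY, dx REAL, dy REAL);
""")
db.executemany("INSERT INTO position VALUES (?, ?, ?)",
               [(1, 0.0, 0.0), (2, 5.0, 5.0)])
db.executemany("INSERT INTO velocity VALUES (?, ?, ?)",
               [(1, 1.0, 0.5)])   # entity 2 has no velocity component

# Movement system: only entities with BOTH components are updated.
db.execute("""
    UPDATE position
    SET x = x + (SELECT dx FROM velocity WHERE velocity.entity = position.entity),
        y = y + (SELECT dy FROM velocity WHERE velocity.entity = position.entity)
    WHERE entity IN (SELECT entity FROM velocity)
""")
print(db.execute("SELECT * FROM position ORDER BY entity").fetchall())
# [(1, 1.0, 0.5), (2, 5.0, 5.0)]
```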
lordnacho•1h ago
Then you could save every single state change and scroll back and forth. But I'm not sure if you were looking for that.
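If that's the appeal, a minimal version of "save every change and scroll back and forth" is just an append-only change log replayed up to a tick (hypothetical sketch, no particular engine assumed):

```python
# Sketch of state history via an append-only change log.
changes = []   # (tick, entity, component, value)

def record(tick, entity, component, value):
    changes.append((tick, entity, component, value))

def state_at(tick):
    """Replay all changes up to `tick` to reconstruct the world state."""
    state = {}
    for t, entity, component, value in changes:
        if t <= tick:
            state[(entity, component)] = value
    return state

record(1, 1, "position", (0.0, 0.0))
record(2, 1, "position", (1.0, 0.5))
record(3, 1, "position", (2.0, 1.0))
print(state_at(2))   # {(1, 'position'): (1.0, 0.5)} -- "scrolled back" to tick 2
```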
harwoodr•58m ago
https://github.com/ClockworkLabs/SpacetimeDB