PS: there are actually other, faster and more secure options than io_uring, but I won't spoil ;)
Is that not accurate?
* In terms of a high-performance AI-focused S3 competitor, how does this compare to NVIDIA's AIstore? https://aistore.nvidia.com/
* What's the clustering story? Is it complex like Ceph, does it require K8s like AIstore for full functionality, or is it more flexible like Garage, Minio, etc.?
* You spend a lot of time talking about performance; do you have any benchmarks?
* Obviously most of the page was written by ChatGPT: what percentage of the code was written by AI, and has it been reviewed by a human?
* How does the object storage itself work? How is it architected? Do you DHT, for example? What tradeoffs are there (CAP, for example) vs the 1.4 gazillion alternatives?
* Are there any front-end or admin tools (and screenshots)?
* Can a cluster scale horizontally, or only vertically (i.e. like Minio)?
* Why not instead just fork a previous version of Minio and then put a high-speed metadata layer on top?
* Is there any telemetry?
* Although it doesn't matter as much for my use case as for others, what is the specific jurisdiction of origin?
* Is there a CLA and does that CLA involve assigning rights like copyright (helps prevent the 'rug-pull' closing-source scenario)?
* Is there a non-profit foundation, a goal of CNCF sponsorship, or some other trusted third party to ensure that the software remains open source (although forks of prior versions mostly mitigate that concern)?
Thanks!
I wonder if that's why it's all over the place. Meta engine written in Zig, okay, do I need to care? Gateway in Rust... probably a smart choice, but why do I need to be able to pick between web frameworks?
> Most object stores use LSM-trees (good for writes, variable read latency) or B+ trees (predictable reads, write amplification). We chose a radix tree because it naturally mirrors a filesystem hierarchy
Okay, so are radix trees good for writes, good for reads, bad for both, or somewhere in between?
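My rough mental model, which could be wrong: point reads and writes are both O(key depth) walks (no compaction levels to merge, no page splits on the hot path, at least in memory), and prefix operations like LIST map onto a subtree. A minimal sketch of that idea, assuming a simplified per-component trie rather than whatever they actually built (a real radix tree would also collapse single-child chains):

```python
# Sketch of a path-keyed prefix tree for object metadata (illustrative only).

class Node:
    __slots__ = ("children", "meta")
    def __init__(self):
        self.children = {}  # path component -> Node
        self.meta = None    # object metadata if a key terminates here

class PathIndex:
    def __init__(self):
        self.root = Node()

    def put(self, key, meta):
        # Point write: walk/create one node per path component.
        node = self.root
        for part in key.split("/"):
            node = node.children.setdefault(part, Node())
        node.meta = meta

    def get(self, key):
        # Point read: O(depth) hops, no levels to merge or pages to split.
        node = self.root
        for part in key.split("/"):
            node = node.children.get(part)
            if node is None:
                return None
        return node.meta

    def list_prefix(self, prefix):
        # LIST (and rename) act on a subtree, which is why prefix ops are cheap.
        node = self.root
        parts = [p for p in prefix.split("/") if p]
        for part in parts:
            node = node.children.get(part)
            if node is None:
                return
        stack = [("/".join(parts), node)]
        while stack:
            path, n = stack.pop()
            if n.meta is not None:
                yield path
            for name, child in n.children.items():
                stack.append((f"{path}/{name}" if path else name, child))
```

The on-disk layout, logging, and concurrency story are presumably where the real engineering is.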
What is "physiological logging"?
I could only find references to this in database systems course notes, which may indicate something.
How does that compare to something like JuiceFS?
And in "Why Not Just Use a Filesystem?", the answer they gave is "the line is already blurring" and "industry is converging".
The line may be blurring, but as mentioned there is still a clear-cut use case for a filesystem - or, if higher access speed is warranted, just slap more RAM on the system and cache the data. It will still cost less, even at the current cost of RAM.
Why not use any of the great KV stores out there? Or a traditional database even.
People use object storage for the low cost, not because it is a convenient abstraction. I suspect some people use the faster, more expensive S3 simply as a stopgap: they started with object storage, the requirements changed, it is no longer the right tool for the job, but it is a hassle to switch, and AWS is taking advantage of their situation. I suppose that offering those people an alternative at a non-extortionate price is a decent business model, but I am not sure how big that market is or how long it will last. And it's not really a question of better tech; I'm sure AWS could make it a lot cheaper if they wanted to.
But object storage at the price of a database, with the performance of a database, is just a database, and I doubt that quickly reinventing that wheel yielded anything too competitive.
The blog argues that AI workloads are bottlenecked by latency because of 'millions of small files.' But if you are training on millions of loose 4KB objects directly from network storage, your data pipeline is the problem, not the storage layer.
Data Formats: Standard practice is to use formats like WebDataset, Parquet, or TFRecord to chunk small files into large, sequential blobs. This negates the need for high-IOPS metadata operations and makes standard S3 throughput the only metric that matters (which is already plentiful).
Caching: Most high-performance training jobs hydrate local NVMe scratch space on the GPU nodes. S3 is just the cold source of truth. We don't need sub-millisecond access to the source of truth, we need it at the edge (local disk/RAM), which is handled by the data loader pre-fetching.
It seems like they are building a complex distributed system to solve a problem that is better solved by tar -cvf
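More or less literally. A toy version of the pack-into-shards step (the paths, shard size, and naming here are made up; WebDataset/TFRecord tooling does this properly):

```python
# Toy "pack millions of small files into large sequential shards" script.
import os
import tarfile

SRC_DIR = "dataset/raw"       # millions of small sample files (hypothetical path)
OUT_DIR = "dataset/shards"    # a few hundred large sequential tars
SHARD_BYTES = 1 << 30         # ~1 GiB per shard

def pack_shards():
    os.makedirs(OUT_DIR, exist_ok=True)
    shard_idx, shard_size, shard = 0, 0, None
    for root, _, files in os.walk(SRC_DIR):
        for name in sorted(files):
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if shard is None or shard_size + size > SHARD_BYTES:
                if shard is not None:
                    shard.close()
                shard = tarfile.open(
                    os.path.join(OUT_DIR, f"shard-{shard_idx:06d}.tar"), "w")
                shard_idx, shard_size = shard_idx + 1, 0
            # Keep the relative key so the data loader can recover labels later.
            shard.add(path, arcname=os.path.relpath(path, SRC_DIR))
            shard_size += size
    if shard is not None:
        shard.close()

if __name__ == "__main__":
    pack_shards()
```

After that, each epoch is a handful of large sequential GETs per worker, and metadata IOPS stop being the bottleneck.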
Every generation seems to have to learn the lesson about batching small inputs together to keep throughput up.
That doesn't work on Parquet or anything compressed. In real-time analytics you want to load small files quickly into a central location where they can be both queried and compacted (different workloads) at the same time. This is hard to do in existing table formats like Iceberg. Granted not everyone shares this requirement but it's increasingly important for a wide range of use cases like log management.
There are solutions for this, but the added complexity is significant. In any case, your training code and data storage become tightly coupled. If a faster storage solution lets you avoid that, I, for one, would appreciate it.
I'm curious about one aspect though. The price comparison says storage is "included," but that hides the fact that you only have 2TB on the suggested instance type, bringing the storage cost to $180/TB/mo if you pay each year up-front for savings, $540/TB/mo when you consider that the durability solution is vanilla replication.
I know that's "double counting" or whatever, but the read/write workloads being suggested here are strange to me. If you only have 1875GB of data (achieved with 3 of those instances because of replication) and sustain 10k small-object (4KiB) QPS as per the other part of the cost comparison, you're describing a world where you read and/or write 50x your entire storage capacity every month.
I know there can be hot vs cold objects or workloads where most data is transient, but even then that still feels like a lot higher access amplification than I would expect from most workloads (or have ever observed in any job I'm allowed to write about publicly). With that in mind, the storage costs themselves actually dominate, and you're at the mercy of AWS not providing any solution even as cheap as 6x the cost of a 2-year amortized SSD (and only S3 comes close -- it's worse when you rent actual "disks," doubly so when they're high-performance).
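For reference, the back-of-envelope behind that ~50x figure (my numbers, assuming 4 KiB per request sustained over a 30-day month):

```python
# Back-of-envelope for the access amplification described above.
qps = 10_000                  # sustained small-object requests per second
obj_bytes = 4 * 1024          # 4 KiB per object
secs_per_month = 30 * 24 * 3600

moved_tb = qps * obj_bytes * secs_per_month / 1e12
stored_tb = 1.875             # 1875 GB of usable capacity
print(f"{moved_tb:.0f} TB moved/month, {moved_tb / stored_tb:.0f}x the stored data")
# -> roughly 106 TB/month, ~57x -- the same ballpark as the ~50x above
```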
We eliminated MinIO on vSAN in favor of ObjectScale for on-prem.
A lot of the high performance S3 alternatives trumpet crazy IOPS numbers, but the devil is in how they handle metadata and consistency. FractalBits says it offers strong consistency and atomic rename ([Why We Built Another Object Storage (And Why It's Different)](https://fractalbits.com/blog/why-we-built-another-object-sto...)), which makes it different from most eventual consistency S3 clones. That implies a full‑path indexing metadata engine (something they mention in a LinkedIn post). That’s a really interesting direction because it potentially avoids some of the inode bottlenecks you see in Ceph and MinIO.
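If I understand the idea correctly (illustrative sketch only, not how FractalBits or Ceph actually lay things out): a filesystem-style design resolves a key one directory hop at a time, each hop a potential round trip or lock on a hot directory object, while a full-path index treats the entire key as a single lookup.

```python
# Illustrative contrast: resolving "a/b/c/obj" two ways.

# Filesystem-style: one metadata hop per path component; in a distributed
# store each hop can mean another round trip and a contended parent directory.
def resolve_per_directory(root_dir, key):
    node = root_dir
    for part in key.split("/"):
        node = node[part]      # hop per component
    return node

# Full-path indexing: the whole key is the index key, so resolution is one
# probe regardless of depth; the tree layout mainly affects locality and LIST.
def resolve_full_path(index, key):
    return index[key]

root = {"a": {"b": {"c": {"obj": "meta"}}}}
flat = {"a/b/c/obj": "meta"}
assert resolve_per_directory(root, "a/b/c/obj") == resolve_full_path(flat, "a/b/c/obj")
```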
BUT the real question for me is long‑term sustainability. Running your own object store is a commitment. Who's maintaining it when the original team moves on? It's great to see new entrants with ideas, ALSO it would be reassuring if there were clear governance and a non‑profit steward at some point.
I don't mind if something uses AI to draft marketing copy... as long as the code is readable, reviewed, and licensed in a way that keeps it open. The space is crowded, and differentiation often comes down to the less flashy stuff: operational tooling, monitoring, easy deployment across zones, and how it fails. I'm curious to see where this one goes.