For example, it doesn't really make sense that "92% of data modification operations" would fail on JuiceFS, which makes me question a lot of the methodology in these tests.
The benchmark suite is trivial and open source [1].
Is performing benchmarks “putting down” these days?
If you believe the benchmarks are unfair to JuiceFS for one reason or another, please put up a PR with a better methodology or corrected numbers. I’d happily merge it.
EDIT: From your profile, it seems you are running a VC-backed competitor; it would be fair to mention that…
I don't want to see the cloud storage sector turn as bitter as the cloud database sector.
I've previously looked through the benchmarking code, and I still have some serious concerns about the way that you're presenting things on your page.
The actual code being benchmarked is trivial and open-source, but I don't see the actual JuiceFS setup anywhere in the ZeroFS repository. This means the self-published results don't seem to be reproducible by anyone looking to externally validate the stated claims in more detail. Given the very large performance differences, I have a hard time believing it's an actual apples-to-apples production-quality setup. It seems much more likely that some simple tuning is needed to make them more comparable, in which case the takeaway may be that JuiceFS may have more fiddly configuration without well-rounded defaults, not that it's actually hundreds of times slower.
JuiceFS scales out horizontally because each client reads from and writes to S3 directly; as long as the metadata engine keeps up, it has essentially unlimited bandwidth across many compute nodes.
But as the benchmark shows, it is fiddly, especially for workloads with many small files, and it is pretty wasteful in terms of S3 operations, which carries a meaningful cost for the largest workloads.
I think both have their place at the moment. But the space of "advanced S3-backed filesystems" is... advancing these days.
> The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is kept durable with Litestream. Nothing depends on local storage.
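To make the data/metadata split in that quote concrete, here is a toy sketch in Python. It is not JuiceFS's or Sprite's actual code: the `ChunkedStore` class, the 4-byte `CHUNK_SIZE`, and the `chunks/<n>` key scheme are all made up for illustration, and two in-memory dicts stand in for the object store and the metadata database.

```python
CHUNK_SIZE = 4  # hypothetical, absurdly small so the split is visible


class ChunkedStore:
    """Toy model: file data lives in an 'object store', the chunk map in 'metadata'."""

    def __init__(self):
        self.objects = {}   # stand-in for S3: object key -> bytes
        self.metadata = {}  # stand-in for the metadata DB: path -> ordered list of keys
        self._next_id = 0   # hypothetical key allocator

    def write(self, path, data):
        keys = []
        for offset in range(0, len(data), CHUNK_SIZE):
            key = f"chunks/{self._next_id}"
            self._next_id += 1
            # Data goes to the object store ...
            self.objects[key] = data[offset:offset + CHUNK_SIZE]
            keys.append(key)
        # ... and only the map of where the chunks are goes to metadata.
        self.metadata[path] = keys

    def read(self, path):
        # Reads consult metadata first, then fetch each chunk by key.
        return b"".join(self.objects[key] for key in self.metadata[path])


store = ChunkedStore()
store.write("/a.txt", b"hello world")
print(store.metadata["/a.txt"])  # the chunk map
print(store.read("/a.txt"))      # the reassembled file
```

The point of the split is that the metadata map is small and latency-sensitive while the chunks are large and bandwidth-bound, so the two can live on very different storage.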
* poor locking support (this sounds like it works better)
* it's slow
* no manual fence support; a bad but common way of distributing workloads is e.g. to compile a test on one machine (on an NFS mount), and then use SLURM or SGE to run the test on other machines. You use NFS to let the other machines access the data... and this works... except that you either have to disable write caches or have horrible hacks to make the output of the first machine visible to the others. What you really want is a manual fence: "make all changes to this directory visible on the server"
* The bloody .nfs000000 files. I think this might be fixed by NFSv4 but it seems like nobody actually uses that. (Not helped by the fact that CentOS 7 is considered "modern" to EDA people.)
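For what it's worth, the closest thing I know of to the writer-side half of a manual fence is to fsync everything under the directory before handing off to the scheduler. A rough sketch, assuming a POSIX system; `flush_tree` is a made-up helper name, and this does nothing about attribute or data caches on the machines that read the output afterwards:

```python
import os


def flush_tree(root):
    """fsync every regular file and every directory under root.

    Returns the number of regular files flushed.
    """
    flushed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets, dangling symlinks
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)  # ask the OS to push pending writes for this file
            finally:
                os.close(fd)
            flushed += 1
        # Also fsync the directory itself so new entries are pushed out.
        dir_fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return flushed
```

You would call something like `flush_tree("/path/to/build/output")` on the compile machine before submitting the job. It is still a hack compared to a real "make this directory visible on the server" primitive, which is rather the point.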
Unfortunately, NFSv4 also has the silly rename semantics...
When we tried it at Krea we ended up moving on because we couldn't get sufficient performance to train on, and having to choose which datacenter to deploy our metadata store in essentially forced us to only use it in one location at a time.
Plasmoid•1h ago