Good lord.
So... call me old and crotchety, but I'm not sure I trust someone to write a DFS like this who once thought NFS was a good idea. I'm sure it's fine, I just have bad memories.
Historically NFS has had many flaws on different OSes. Many of these issues appear to have been resolved over time, and I haven't seen it referred to as "Nightmare File System" in decades.
However, depending on many factors, NFS may still be a bad choice. In our setup, for example, using a large SQLite database over NFS turns out to be up to 10 times as slow as using a "real" disk.
The SQLite FAQs warn about bigger problems than slowness: https://www.sqlite.org/faq.html#q5
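Setting the locking issues aside, the slowness part is easy to sanity-check yourself. A minimal sketch, assuming you have two copies of the same database, one on an NFS mount and one on local disk (the paths below are hypothetical):

    import sqlite3, time

    def time_query(path, query="SELECT COUNT(*) FROM sqlite_master"):
        # open the database and time a single query end to end
        con = sqlite3.connect(path)
        try:
            start = time.perf_counter()
            con.execute(query).fetchall()
            return time.perf_counter() - start
        finally:
            con.close()

    # hypothetical paths: same database, one copy on NFS, one on local disk
    print("nfs:  ", time_query("/mnt/nfs/data.db"))
    print("local:", time_query("/var/tmp/data.db"))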
It was a long, long time ago that we were only using NFS; it ran on a Solaris machine running ZFS. It did its job at the very beginning, but you don't build up hundreds of petabytes of data on an NFS server.
We did try various solutions in between NFS and developing TernFS, both open source and proprietary. However, we didn't name them specifically in the blog post because there's little point in bad-mouthing what didn't work out for us.
For local filesystems, the average PC user shouldn't really care though. Just use whatever your installer defaults to. But this story is about a distributed filesystem.
I don't have great hopes for one capable of such massive scale being good and usable (low overhead, low complexity, low admin cost) in very small configurations, but we can always hope.
TernFS – An exabyte scale, multi-region distributed filesystem, 247 points, 4 days ago, https://news.ycombinator.com/item?id=45290245
Gluster was OK. We never pushed it very hard but it mostly just worked. Performance wasn't great but we encouraged users to use scratch space that was local to the node where their job was running anyway.
jauntywundrkind•4mo ago
Some notable constraints: files are immutable (write-once, never updated). Designed for files at least 2MB in size. Slow at directory creation/deletion. No permissions/access control.
jleahy•4mo ago
These limits aren't quite as strict as they first seem.
Our median file size is 2MB, which means 50% of our files are <2MB. Realistically, if you've got an exabyte of data with an average file size of a few kilobytes, then this is the wrong tool for the job (you need something more like a database), but otherwise it should be just fine. We actually have a nice little optimisation where very small files are stored inline in the metadata.
It works out of the box with "normal" tools like rsync, python, etc. despite the immutability. The reality is that most things don't actually modify files; even text editors tend to save a new version and rename it over the top. We had to update relatively little of our massive code base when switching over. For us that was a big win: moving to an S3-like interface would have required updating a lot of code.
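For anyone unfamiliar with that pattern, here's a minimal sketch of the usual write-new-then-rename approach (not TernFS-specific; the file name is just an example). The target is never modified in place, which is why it fits a write-once filesystem:

    import os, tempfile

    def atomic_write(path, data: bytes):
        # write to a temp file in the same directory, then rename over the target;
        # the original file is never modified in place
        dirname = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=dirname)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # atomic rename on POSIX
        except BaseException:
            os.unlink(tmp)
            raise

    atomic_write("settings.json", b'{"version": 2}\n')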
Directory creation/deletion is "slow", currently limited to about 10,000 operations per second. We don't currently need to create more than 10,000 directories per second, so we just haven't prioritised improving that. There is an issue open, #28, which would get this up to 100,000 per second. This is the sort of thing that, like access control, I would love to have had in the initial open source release, but we prioritised open sourcing what we have over getting it perfect.
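If you want to check that kind of number on your own mount, a rough micro-benchmark looks something like this (the mount point is hypothetical):

    import os, shutil, time

    ROOT = "/mnt/ternfs/mkdir-bench"  # hypothetical path on the filesystem under test
    N = 10_000

    os.makedirs(ROOT, exist_ok=True)
    start = time.perf_counter()
    for i in range(N):
        os.mkdir(os.path.join(ROOT, f"d{i}"))
    elapsed = time.perf_counter() - start
    print(f"{N / elapsed:.0f} directory creations per second")
    shutil.rmtree(ROOT)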
em-bee•4mo ago
it is essentially copy-on-write exposed at the user level. the only issue is that this breaks hard links, so tools that rely on them are going to break. but yes, custom code should be easy to adapt.
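to make the hard-link point concrete, a small sketch (file names are just examples): after a write-new-then-rename "edit", the other link still points at the old inode:

    import os

    with open("a.txt", "w") as f:
        f.write("v1\n")
    os.link("a.txt", "b.txt")          # b.txt is a hard link to the same inode

    # "edit" a.txt the write-new-then-rename way
    with open("a.txt.tmp", "w") as f:
        f.write("v2\n")
    os.replace("a.txt.tmp", "a.txt")   # a.txt now names a brand new inode

    print(open("a.txt").read())        # v2
    print(open("b.txt").read())        # still v1: the old inode lives on under b.txt
    print(os.stat("a.txt").st_ino == os.stat("b.txt").st_ino)  # False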