ZFS does not need or benefit from ECC memory any more than any other FS. The bitflip corrupted the data, regardless of ZFS. Any other FS is just oblivious; ZFS will at least tell you your data is corrupt, but it will happily keep operating.
> ZFS' RAM-hungry nature
ZFS is not really RAM-hungry, unless one uses deduplication (which is not enabled by default, nor generally recommended). It can often seem RAM-hungry on Linux because the ARC is not counted as “cache” the way the page cache is.
---
ZFS docs say as much as well: https://openzfs.github.io/openzfs-docs/Project%20and%20Commu...
I don't think it is. I've never heard of that happening, or seen any evidence ZFS is more likely to break than any random filesystem. I've only seen people spreading paranoid rumors based on a couple pages saying ECC memory is important to fully get the benefits of ZFS.
Some of the things they say aren't credible, even if they're said often.
You don't need an enormous amount of ram to run zfs unless you have dedupe enabled. A lot of people thought they wanted dedupe enabled though. (2024's fast dedupe may help, but probably the right answer for most people is not to use dedupe)
It's the same thing with the "need" for ECC. If your RAM is bad, you're going to end up with bad data in your filesystem. With ZFS, you're likely to find out your filesystem is corrupt (although, if the data is corrupted before the checksum is calculated, then the checksum doesn't help); with a non-checksumming filesystem, you may get lucky: the metadata doesn't get corrupted and the OS keeps going, just with some of your files wrong. Having ECC would be better, but there are tradeoffs, so it never made sense for me to use it at home; ZFS still works and is protecting me from disk contents changing, even if what was written could be wrong.
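To make the two cases concrete, here is a minimal sketch of a checksumming store. It is not ZFS code: the function names are made up for illustration, and SHA-256 is used only as a convenient stand-in for whatever checksum the filesystem is configured with. It shows a flip that happens after the checksum is computed (detected) and a flip that happens before it (silently accepted).

```python
import hashlib


class CorruptBlockError(Exception):
    """Raised when a block no longer matches its write-time checksum."""


def write_block(data: bytes) -> tuple[bytes, str]:
    # Hypothetical "write": keep the data plus a checksum computed right now.
    return data, hashlib.sha256(data).hexdigest()


def read_block(stored: bytes, checksum: str) -> bytes:
    # Hypothetical "read": verify the data against the stored checksum.
    if hashlib.sha256(stored).hexdigest() != checksum:
        raise CorruptBlockError("checksum mismatch")
    return stored


def flip_bit(data: bytes, bit: int) -> bytes:
    # Return a copy of data with a single bit inverted.
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


original = b"family photo bytes"

# Case 1: the bit flips on disk, after the checksum was computed.
stored, checksum = write_block(original)
try:
    read_block(flip_bit(stored, 3), checksum)
    detected_on_disk = False
except CorruptBlockError:
    detected_on_disk = True

# Case 2: bad RAM flips the bit before the checksum is computed, so the
# checksum faithfully covers the wrong data and the read succeeds.
stored, checksum = write_block(flip_bit(original, 3))
silently_wrong = read_block(stored, checksum) != original
```

In the second case nothing on the read path can tell the difference, which is the one spot where ECC helps and a filesystem checksum cannot.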
What's a bit flip?
Usually attributed to "cosmic rays", but really can happen for any number of less exciting sounding reasons.
Basically, there is zero double-checking in your computer for almost everything except stuff that goes across the network. Memory and disks are not checked for correctness, basically ever on any machine anywhere. Many servers (but certainly not all) are the rare exception when it comes to memory integrity. They usually have ECC (Error Correction Code) memory, basically a checksum on the memory to ensure that if memory is corrupted, it's noticed and fixed.
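For a feel of how "noticed and fixed" can work, here is a toy Hamming(7,4) code in Python. This is only an illustration of the single-error-correction idea; real ECC DIMMs use wider codes over 64-bit words, and the function names here are my own.

```python
def hamming74_encode(data_bits: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits, with parity at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> list[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if clean.
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1  # simulate a single bit flip in "memory"
recovered = hamming74_decode(stored)
```

Three extra bits per four data bits is a lot of overhead; wider codes amortize the parity bits much better, which is why the toy version is not what ships in hardware.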
Essentially every filesystem everywhere does zero data integrity checking:
MacOS APFS: Nope
Windows NTFS: Nope
Linux EXT4: Nope
BSD's UFS: Nope
Your mobile phone: Nope
ZFS is the rare exception among file systems: it actually double-checks that the data you save to it is the data you get back from it. Every other filesystem is just a big ball of unknown data. You probably get back what you put in, but there are zero promises or guarantees.
I'm not sure that's really accurate -- all modern hard drives and SSDs use error-correcting codes, as far as I know.
That's different from implementing additional integrity checking at the filesystem level. But it's definitely there to begin with.
But there is ABSOLUTELY NO checksum for the bits stored on an SSD. So bit rot in the cells of the SSD goes undetected.
It has been years since I was familiar enough with the insides of SSDs to tell you exactly what they are doing now, but even ~10-15 years ago it was normal for each raw 2k block to actually be ~2176+ bytes and use at least 128 bytes for LDPC codes. Since then the block sizes have gone up (which reduces the number of bytes you need to achieve equivalent protection) and the lithography has shrunk (which increases the raw error rate).
Where exactly the error correction is implemented (individual dies, SSD controller, etc) and how it is reported can vary depending on the application, but I can say with assurance that there is no chance your OS sees uncorrected bits from your flash dies.
My point was that, on most consumer compute, there are no promises or guarantees that what you see on day 1 will be there on day 2. It mostly works, and the chances are better than even that your data will be mostly safe on day 2, but there are zero promises or guarantees, even though we know how to provide them. Some systems do, those with ECC memory and ZFS for example. Other filesystems also support checksumming, BTRFS being the most common other example besides ZFS, even though parts of BTRFS are still completely broken (see their status page for details).
That is a notorious myth.
https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...
Neither here nor there, but DTrace was ported to iPhone--it was shown to me in hushed tones in the back of an auditorium once...
[1]: https://arstechnica.com/gadgets/2016/06/a-zfs-developers-ana...
[2]: https://ahl.dtrace.org/2016/06/19/apfs-part5/#checksums
A very long time ago someone named cyberjock was a prolific and opinionated proponent of ZFS, who wrote many things about ZFS during a time when the hobbyist community was tiny and not very familiar with how to use it and how it worked. Unfortunately, some of their most misguided and/or outdated thoughts still haunt modern consciousness like an egregore.
What you are probably thinking of is the proposed doomsday scenario where bad ram could theoretically kill a ZFS pool during a scrub.
This article does a good job of explaining how that might happen, and why being concerned about it is tilting at windmills: https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...
I have never once heard of this happening in real life.
Hell, I’ve never even had bad ram. I have had bad sata/sas cables, and a bad disk though. ZFS faithfully informed me there was a problem, which no other file system would have done. I’ve seen other people that start getting corruption when sata/sas controllers go bad or overheat, which again is detected by ZFS.
What actually destroys pools is user error, followed very distantly by plain old fashioned ZFS bugs that someone with an unlucky edge case ran into.
To what degree can you separate this claim from "I've never noticed RAM failures"?
I got into overclocking both regular and ECC DDR4 RAM for a while when AMD’s 1st-gen Ryzen stuff came out, thanks to ASRock’s X399 motherboard, which unofficially supported ECC, allowing both its function and the reporting of errors (produced when overclocking).
Based on my own testing and issues seen from others, regular memory has quite a bit of leeway before it becomes unstable, and memory that’s generating errors tends to constantly crash the system, or do so under certain workloads.
Of course, without ECC you can’t prove every single operation has been fault-free, but at some point you call it close enough.
I am of the opinion that ECC memory is the best memory to overclock, precisely because you can prove stability simply by using the system.
All that said, as things become smaller with tighter specifications to squeeze out faster performance, I do grow more leery of intermittent single errors that occur on the order of weeks or months in newer generations of hardware. I was once able to overclock my memory to the edge of what I thought was stability, as it passed all tests for days, but about every month or two there’d be a few corrected errors showing up in my logs. Typically, any sort of instability is caught by manual tests within minutes or the hour.
This is the correct person: https://github.com/don-brady
Also can confirm Don is one of the kindest, nicest principal engineer level people I’ve worked with in my career. Always had time to mentor and assist.
Is it fair to say ZFS made most sense on Solaris using Solaris Containers on SPARC?
[1]: https://www.theregister.com/2005/11/16/sun_thumper/
[2]: https://ubuntu.com/blog/zfs-is-the-fs-for-containers-in-ubun...
And there is also the Stratis project Red Hat is involved in: https://stratis-storage.github.io/
Still no checksumming though...
Sun salespeople tried to sell us the idea of "zfs filesystems are very cheap, you can create many of them, you don't need quota" (which ZFS didn't have at the time), which we tried out. It was abysmally slow. It was even slow with just one filesystem on it. We scrapped the whole idea, just put Linux on them and suddenly fileserver performance doubled. Which is something we weren't used to with older Solaris/Sparc/UFS or /VXFS systems.
We never tried another generation of those, and soon after Sun was bought by Oracle anyways.
Although it does not change the answer to the original question, I have long been under the impression that part of the design of ZFS had been influenced by the Niagara processor. The heavily threaded ZIO pipeline had been so forward thinking that it is difficult to imagine anyone devising it unless they were thinking of the future that the Niagara processor represented.
Am I correct to think that or did knowledge of the upcoming Niagara processor not shape design decisions at all?
By the way, why did Thumper use an AMD Opteron over the UltraSPARC T1 (Niagara)? That decision seems contrary to the idea of putting all of the wood behind one arrow.
As for Thumper using Opteron over Niagara: that was due to many reasons, both technological (Niagara was interesting but not world-beating) and organizational (Thumper was a result of the acquisition of Kealia, which was independently developing on AMD).
And for the thin-provisioned snapshotted subvolume usecase, btrfs is currently eating ZFS's lunch due to far better Linux integration. Think snapshots at every update, and having a/b boot to get back to a known-working config after an update. So widespread adoption through the distro route is out of the question.
Also, ZFS has a bad name within the Linux community due to some licensing stuff. I find that most BSD users don't really care about such legalese and most people I know that run FreeBSD are running ZFS on root. Which works amazingly well I might add.
Especially with something like sanoid added to it, it basically does the same as timemachine on mac, a feature that users love. Albeit stored on the same drive (but with syncoid or just manually rolled zfs send/recv scripts you can do that on another location too).
This is out of an abundance of caution. Canonical bundles ZFS in the Ubuntu kernel and no one has sued them (yet).
But really, this is a concern for distros. Not for end users. Yet many of the Linux users I speak to are somehow worried about this. Most can't even describe the provisions of the GPL so I don't really know what that's about. Just something they picked up, I guess.
None of this is a worry about being sued as an end user. But all of those are worries that your life will be harder with ZFS, and a lot harder as soon as the first lawsuits hit anyone, because all the current (small) efforts to keep it working will cease immediately.
I don't think it's that they don't care, it's that the CDDL and BSD-ish licenses are generally believed to just not have the conflict that CDDL and GPL might. (IANAL, make your own conclusions about whether either of those are true)
What a weird take. BSD's license is compatible with ZFS, that's why. "Don't really care?" Really? Come on.
The business case for providing a robust desktop filesystem simply doesn’t exist anymore.
20 years ago, (regular) people stored their data on computers and those needed to be dependable. Phones existed, but not to the extent they do today.
Fast forward 20 years, and many people don’t even own a computer (in the traditional sense, many have consoles). People now have their entire life on their phones, backed up and/or stored in the cloud.
SSDs also became “large enough” that HDDs are mostly a thing of the past in consumer computers.
Instead, today you have high-reliability hardware and software in the cloud, which arguably is much more resilient than anything you could reasonably cook up at home. Besides the hardware (power, internet, fire suppression, physical security, etc.), you’re also typically looking at multi-geographical redundancy across multiple data centers using Reed-Solomon erasure coding, but that’s nothing the ordinary user needs to know about.
Most cloud services also offer some kind of snapshot functionality as malware protection (e.g. OneDrive offers unlimited snapshots for 30 days rolling).
Truth is that most people are way better off just storing their data in the cloud and making a backup at home, though many people seem to ignore the latter, and Apple makes it exceptionally hard to automate.
You would have early warning with ZFS. You have data loss with your plan.
Apple was already working to integrate ZFS when Oracle bought Sun.
From TFA:
> ZFS was featured in the keynotes, it was on the developer disc handed out to attendees, and it was even mentioned on the Mac OS X Server website. Apple had been working on its port since 2006 and now it was functional enough to be put on full display.
However, once Oracle bought Sun, the deal was off.
Again from TFA:
> The Apple-ZFS deal was brought for Larry Ellison's approval, the first-born child of the conquered land brought to be blessed by the new king. "I'll tell you about doing business with my best friend Steve Jobs," he apparently said, "I don't do business with my best friend Steve Jobs."
And that was the end.
[1]: https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-ap...
[2]: https://ahl.dtrace.org/2016/06/19/apfs-part1/
[3]: https://arstechnica.com/gadgets/2016/06/a-zfs-developers-ana...
>> Apple can currently just take the ZFS CDDL code and incorporate it (like they did with DTrace), but it may be that they wanted a "private license" from Sun (with appropriate technical support and indemnification), and the two entities couldn't come to mutually agreeable terms.
> I cannot disclose details, but that is the essence of it.
* https://archive.is/http://mail.opensolaris.org/pipermail/zfs...
Apple took DTrace, licensed via CDDL—just like ZFS—and put it into the kernel without issue. Of course a file system is much more central to an operating system, so they wanted much more of a CYA for that.
However, I can say, every time I've tried ZFS on my iMac, it was simply a disaster.
Just trying to set it up on a single USB drive, or setting it up to mirror a pair. The net effect was that it CRUSHED the performance on my machine. It became unusable. We're talking "move the mouse, watch the pointer crawl behind" unusable. "Let's type at 300 baud" unusable. Interactive performance was shot.
After I remove it, all is right again.
It depends on the format. A BMP image format would limit the damage to 1 pixel, while a JPEG could propagate the damage to potentially the entire image. There is an example of a bitflip damaging a picture here:
https://arstechnica.com/information-technology/2014/01/bitro...
That single bit flip ruined about half of the image.
As for video, that depends on how far apart I frames are. Any damage from a bit flip would likely be isolated to the section of video from the bitflip until the next I-frame occurs. As for how bad it could be, it depends on how the encoding works.
> On the one hand, potentially, "very" robust.
Only in uncompressed files.
> But on the other, I would think that there are some very special bits that if toggled can potentially "ruin" the entire file. But I don't know.
The way that image compression works means that a single bit flip prior to decompression can affect a great many pixels, as shown at Ars Technica.
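A quick way to see the difference yourself is to flip one bit in a raw buffer and one bit in a compressed copy of it. This sketch uses zlib as a stand-in for a compressed image format (it is not JPEG, and the synthetic "pixels" are just made-up bytes). One caveat: zlib carries its own Adler-32 check, so it will usually refuse to decode the damaged stream at all, whereas an image decoder typically keeps going and renders the garbage.

```python
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    # Return a copy of data with a single bit inverted.
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def damaged_bytes(a: bytes, b: bytes) -> int:
    # Count differing positions (compares only up to the shorter length).
    return sum(x != y for x, y in zip(a, b))


# Stand-in for uncompressed pixel data, like the payload of a BMP.
pixels = bytes((x * y) % 251 for y in range(64) for x in range(64))

# Raw: one flipped bit damages exactly one byte, i.e. one "pixel".
raw_damage = damaged_bytes(pixels, flip_bit(pixels, 8 * 1000))

# Compressed: flip one bit in the middle of the stream, then try to decode.
packed = zlib.compress(pixels)
try:
    unpacked = zlib.decompress(flip_bit(packed, 8 * (len(packed) // 2)))
    compressed_outcome = f"{damaged_bytes(pixels, unpacked)} bytes differ"
except zlib.error as exc:
    compressed_outcome = f"stream unreadable: {exc}"
```

Printing `raw_damage` and `compressed_outcome` side by side makes the point: in the raw buffer the damage stays where the flip happened, while in the compressed stream everything decoded after the flip is suspect.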
> However, I can say, every time I've tried ZFS on my iMac, it was simply a disaster.
Did you file an issue? I am not sure what the current status of the macOS driver’s production readiness is, but it will be difficult to see it improve if people do not report issues that they have.
That's the fault of macOS. I also experienced 100% CPU and load off the charts, and it was kernel_task jammed up by USB. Once I used a Thunderbolt enclosure it started to be sane. This experience was the same across multiple non-Apple filesystems, as I was trying a bunch to see which one was the best at cross-OS compatibility.
Also, separately, ZFS says "don't run ZFS on USB". I didn't have problems with it, but I knew I was rolling the dice
Anyway only bringing it up to reinforce that it is probably a macOS problem.
jitl•8h ago
zoky•6h ago
klodolph•6h ago
karlgkk•5h ago
Now, old does not necessarily mean bad, but in this case….
twoodfin•5h ago
The rollout of APFS a decade later validated this concern. There’s just no way that flawless transition happens so rapidly without a filesystem fit to order for Apple’s needs from Day 0.
TheNewsIsHere•5h ago
What you describe hits my ear as more NIH syndrome than technical reality.
Apple’s transition to APFS was managed like you’d manage any kind of mass-scale filesystem migration. I can’t imagine they’d have done anything differently if they had adopted ZFS.
Which isn’t to say they wouldn’t have modified ZFS.
But with proper driver support and testing it wouldn’t have made much difference whether they wrote their own file system or adopted an existing one. They have done a fantastic job of compartmentalizing and rationalizing their OS and user data partitions and structures. It’s not like every iPhone model has a production run that has different filesystem needs that they’d have to sort out.
There was an interesting talk given at WWDC a few years ago on this. The roll out of APFS came after they’d already tested the filesystem conversion for randomized groups of devices and then eventually every single device that upgraded to one of the point releases prior to iOS 10.3. The way they did this was to basically run the conversion in memory as a logic test against real data. At the end they’d have the super block for the new APFS volume, and on a successful exit they simply discarded it instead of writing it to persistent storage. If it errored it would send a trace back to Apple.
Huge amounts of testing and consistency in OS and user data partitioning and directory structures are a huge part of why that migration worked so flawlessly.
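The dry-run pattern described above is easy to sketch. To be clear, this is not Apple's code and every name in it is hypothetical; it only shows the shape of the idea: build the new-format metadata in memory, validate it, then throw it away on success or return an error report on failure.

```python
from dataclasses import dataclass, field


@dataclass
class DryRunReport:
    ok: bool
    errors: list[str] = field(default_factory=list)


def convert_metadata(old_catalog: dict[str, int]) -> dict[str, dict]:
    # Hypothetical converter: builds new-format metadata purely in memory.
    new_volume = {}
    for path, size in old_catalog.items():
        if size < 0:
            raise ValueError(f"negative size for {path}")
        new_volume[path] = {"size": size, "format": "new"}
    return new_volume


def dry_run_migration(old_catalog: dict[str, int]) -> DryRunReport:
    # Run the real conversion logic against real data, but never persist it.
    try:
        new_volume = convert_metadata(old_catalog)
        if set(new_volume) != set(old_catalog):
            raise ValueError("file set changed during conversion")
    except ValueError as exc:
        # In the scheme described above, this is what would be reported back.
        return DryRunReport(ok=False, errors=[str(exc)])
    # Success: the converted metadata is simply discarded, nothing is written.
    del new_volume
    return DryRunReport(ok=True)


good = dry_run_migration({"/a": 10, "/b": 0})
bad = dry_run_migration({"/a": -1})
```

The appeal is that the conversion code gets exercised on real-world data at scale before it is ever allowed to write anything.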
jeroenhd•4h ago
There are probably good reasons for Apple to reinvent ZFS as APFS a decade later, but none of them technical.
I also wouldn't call the rollout of APFS flawless, per se. It's still a terrible fit for (external) hard drives and their own products don't auto convert to APFS in some cases. There was also plenty of breakage when case-sensitivity flipped on people and software, but as far as I can tell Apple just never bothered to address that.
jonhohle•4h ago
kmeisthax•3h ago
I don't know for certain if they could have done it with ZFS, but I can imagine it would at least have been doable with some Apple extensions that would only have to exist during test / upgrade time.
[0] Part of why the APFS upgrade was so flawless was that Apple had done a test upgrade in a prior iOS update. They'd run the updater, log any errors, and then revert the upgrade and ship the error log back to Apple for analysis.
hs86•5h ago
This can lead to problems under sudden memory pressure. Because the ARC does not immediately release memory when the system needs it, userland pages might get swapped out instead. This behavior is more noticeable on personal computers, where memory usage patterns are highly dynamic (applications are constantly being started, used, and closed). On servers, where workloads are more static and predictable, the impact is usually less severe.
I do wonder if this is also the case on Solaris or illumos, where there is no intermediate SPL between ZFS and the kernel. If so, I don't think that a hypothetical native integration of ZFS on macOS (or even Linux) would adopt the ARC in its current form.
dizhn•3h ago
ryao•2h ago
pseudalopex•2m ago
Not fast enough always.
netbsdusers•2h ago
ryao•1h ago
fweimer•5h ago
ryao•2h ago
Minor things like the indirect blocks being missing for a regular file only affect that file. Major things like all 3 copies of the MOS (the equivalent to a superblock) being gone for all uberblock entries would require recovery from backup.
If all copies of any other filesystem’s superblock were gone too, that filesystem would be equally irrecoverable and would require restoring from backup.
alwillis•2h ago
ryao•1h ago
https://iosref.com/ram-processor
People have run operating systems using ZFS on less.