It would be great to have an in-kernel alternative to ZFS for parity RAID.
This is not the first project for which this was an issue, and said maintainer has shown no will to alter their behaviour before or since.
The underlying problem might have been importing Bcachefs into the mainline kernel too early in its life cycle.
A lot of people aren't going to keep up with Linus's personal travel plans just so they don't send a late patch.
He refused to acknowledge his place on the totem pole and thought he knew better than everyone else, and that they should change their ways to suit his whims.
I can understand the motivation. It's a PITA to support an older version of code. But that's not how Linux gets its stability.
Commit A: introduce the bug
Commit B: change architecture
Commit C: add a feature
Commit D: fix A using code present in B and C.
The issue ends up being that D needs to be reimplemented to fix A, because B and C don't exist on the tip. Since Linux has closed merge windows and long-term kernels, the fix for the same bug may need to be done in multiple different ways.
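A minimal sketch of why that hurts in practice, using a throwaway git repo (the file name and commit contents are made up for illustration): the fix D cannot be cherry-picked onto a stable branch that lacks B and C.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo

printf 'old_helper()  # bug lives here\n' > mod.txt    # Commit A: introduce the bug
git add mod.txt && git commit -qm 'A: introduce bug'
git branch stable                                      # long-term branch forks off right after A

printf 'new_helper()  # bug still here\n' > mod.txt    # Commit B: change architecture
git commit -qam 'B: change architecture'

printf 'new_helper()  # bug still here\nfeature()\n' > mod.txt   # Commit C: add a feature
git commit -qam 'C: add feature'

printf 'new_helper()  # fixed\nfeature()\n' > mod.txt  # Commit D: fix A using code from B and C
git commit -qam 'D: fix bug'
fix=$(git rev-parse HEAD)

# Backporting D onto the stable branch conflicts, because B and C are absent there,
# so the fix has to be reimplemented by hand for that branch:
git checkout -q stable
if git cherry-pick "$fix" >/dev/null 2>&1; then
  backport=clean
else
  backport=conflict
  git cherry-pick --abort >/dev/null 2>&1 || true
fi
echo "backport of D onto stable: $backport"
```

With several long-term kernels, that manual reimplementation has to happen once per stable branch.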
Multiple changes per PR is bad, but I assume it's still one change per commit.
IMHO, it may be more natural, but only during development. Trying to do a git bisect on git histories like the above is a huge pain. Trying to split things up when A is ready but B/C are not is a huge pain.
The claim was that it added new logging functionality to allow better troubleshooting and eventually address critical issues.
This should have been out of trunk for someone to test, rather than claiming it to be something that wasn't strictly true. Especially when it's the kernel.
Over the long term the number of cases where such a response is needed will decrease as expected.
Do you really want to live in a world where data loss in stable releases is considered okay?
Why do they need to be in the kernel anyway? Presumably they are running on an unmounted device?
Maintaining a piece of code that needs to run in both user space and the kernel is messy and time consuming. You end up running into issues where dependencies require porting gobs of infrastructure from the kernel into userspace. That's easy for some things, very hard for others. There's a better place to spend those resources: stabilizing bcachefs in the kernel, where it belongs.
Other people have tried and failed at this before, and I'm sure that someone will try the same thing again in the future and relearn the same lesson. I know because business requirements at a former employer resulted in such a beast. Other people thought they could just run their userspace code in the kernel, but they didn't know about limits on kernel stack size, and they didn't know about contexts where blocking vs non-blocking behaviour is required or how that interacted with softirqs. Please, just don't do this or advocate for it.
It's really not, the proper way to recover your important data is to restore from backups, not to force other people to bend longstanding rules for you.
>Do you really want to live in a world where data losses in stable releases is considered Okay?
Bcachefs is an experimental filesystem.
There is no reason to break kernel guidelines to deliver a fix.
If I'm not mistaken, Kent pushed recovery routines in the RC to handle a catastrophic bug that a user caused by loading the current metadata format into an old 6.12 kernel.
It isn't some sinister case of "sneaking features" in. This fact seems to be omitted by the clickbaity coverage of the situation.
Rule 1: don't assume malice.
I never understand why some people are unwilling to make any attempt at getting along. Some people seem to feel any level of compromise is too much.
I have a multi-device filesystem, comprised of old HDDs and one sketchy PCI-SATA expansion card. This FS was assembled in 2019 and, though it went through periods of being non-writable, is still working, and I haven't lost any[1] data. That's more than 5 years, a multitude of FS version upgrades, and multiple device replacements with corresponding data evacuation and re-replication.
[1] Technically, I did lose some, when a dying device started misbehaving and writing garbage, and I was impatient and ran a destructive fsck (with fix_errors) before waiting for a bug patch.
Don't want to compare it to other solutions but this is impressive even on its own merits.
IIRC the whole drama began because Kent was constantly pushing new features along with critical bug fixes after the proper merge window.
I meant stable in the sense where most changes are bug fixes, reducing the friction of working within the kernel schedules.
Yes, me too.
> Would be great to have an in kernel alternative to ZFS
Yes it would.
> for parity RAID.
No.
Think of the Pareto Principle here. 80% of the people only use 20% of the functionality. BUT they don't all use the same 20% so overall you need 80% of the functionality... or more.
ZFS is one of the rivals here.
But Btrfs is another. Stratis is another. HAMMER2 is another. MDRAID is another. LVM is another.
All provide some or all of that 20%, and all have pros and cons.
The point is that, yes, ZFS is good at RAID and it's much much easier than ext4 on MDRAID or something.
Btrfs can do that too.
But ZFS and Btrfs do COW snapshots. Those are important too. OpenSUSE, Garuda Linux, siduction and others depend on Btrfs COW.
OK, fine, no problem, your use case is RAID. I use that too. Good.
But COW is just as important.
Integrity is just as important and Btrfs fails at that. That is why the Bcachefs slogan is "the COW filesystem that won't eat your data."
Btrfs ate my data 2-3 times a year for 4 years.
It doesn't matter how many people praise it; what matters is the victims who have been burned when it fails. They prove that it does fail.
The point is not "I can do that with ext4 on mdraid" or "I can do that with LVM2" or "Btrfs is fine for me".
The point is something that can do _all of these_ and do it _better_ -- and here, "better" includes "in a simpler way".
Simpler here meaning "simpler to set up" and also "simpler in implementation" (compared to, say, Btrfs on LVM2, or Btrfs on mdraid, or LVM on mdraid, or ext4 on LVM on RAID).
Something that can remove entire layers of the stack and leave the same functionality is valuable.
Something that can remove 90% of the setup steps and leave identical functionality matters... Because different people do those steps in different order, or skip some, and you need to document that, and none of us document stuff enough.
The recovery steps for LVM on RAID are totally different from RAID on LVM. The recovery for Btrfs on mdraid is totally different from just Btrfs RAID.
This is why tools that eliminate this matter. Because when it matters whether you have
1 - 2 - 3 - 4 - 5
or
1 - 2 - 4 - 3 - 5
Then the sword that chops the Gordian knot here is one tool that does 1-5 in a single step.
This remains true even if you only use 1 and 5, or 2 and 3, and it still matters if you only do 4.
> ext4 on MDRAID or something
They are trivially easy to set up, expand, or replace drives in; require no upkeep; and need no setup when placed into entirely different systems. Anybody using ZFS or something ZFS-like to do some trivial standard RAID setup (unless they are used to and comfortable with ZFS, which is an entirely different story) is just begging to lose data. MDADM is fine.
Or people who want data checksums.
> Anybody using ZFS or ZFS-like to do some trivial standard RAID setup (unless they are used to and comfortable with ZFS, which is an entirely different story) is just begging to lose data.
How? You just... hand it some devices, and it makes a pool. Drive replacement is a single command.
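For illustration, the pool creation and drive replacement described above look roughly like this. Device names are placeholders, and a `run` wrapper prints the commands instead of executing them, since they need real disks and a live ZFS install:

```shell
run() { printf '+ %s\n' "$*"; cmds="${cmds}$*;"; }   # dry-run: print, don't execute

# Hand ZFS some devices and it makes a pool (here: a two-way mirror)
run zpool create tank mirror /dev/sda /dev/sdb

# Drive replacement is a single command; resilvering onto the new disk is automatic
run zpool replace tank /dev/sda /dev/sdc

# Online integrity check, no unmount needed
run zpool scrub tank
```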
> Are trivially easy to set up
Done it. Been doing it for 25+ years.
ZFS is easier. MUCH easier, and much quicker too.
> expand
As easy with ZFS.
> or replace drives;
Easier with ZFS.
> require no upkeep;
False. Ext4 requires the occasional check, and this must be done offline. ZFS doesn't, and can be scrubbed while online and actively in use.
> and no setup when placed into entirely different systems.
Same as ZFS.
> Anybody using ZFS or ZFS-like to do some trivial standard RAID setup (unless they are used to and comfortable with ZFS, which is an entirely different story) is just begging to lose data.
False.
> MDADM is fine.
I am not saying it isn't. I am saying ZFS is better.
I think you haven't tried it, because your claims betray serious ignorance of what it can do.
I built my main NAS box's RAID-Z with the drives in USB 3 caddies on a Raspberry Pi 4. I moved it to the built-in SATA controllers of an HP Microserver running TrueNAS Core.
Imported and just worked. No reconfig, no rebuild, nothing.
It moves seamlessly between Arm and x86, Linux and FreeBSD, no problem at all. Round trip if you want.
He is the BDFL. "No, these changes do not belong in this part of our release window. No pull. End of discussion." Instead, he always talked and caved and pulled. And of course the situation repeated, as they do...
Perhaps as BDFL he let it slip a few too many times, but that's generally the way you want to go - as a leader, you want to trust your subordinates are doing the right thing; which means that you'll get burned a few times until you have to take action (like this).
The only other option makes you into a micromanager, which doesn't scale.
> He is BDFL.
As far as I remember, the "B" in "BDFL" stands for "benevolent". This usually means giving a couple of warnings, giving the benefit of the doubt, extending some credit, and, if that doesn't help, invoking the "D".
https://www.phoronix.com/forums/forum/software/general-linux...
I know more about ZFS than the others. It wasn't specified here whether ZFS had ashift=9 or 12; it tries to auto-detect, but that can go wrong. ashift=9 means ZFS is doing physical I/O in 512-byte units, which will be an emulation mode for the NVMe. Maybe it was ashift=12, but you can't tell.
Secondly, ZFS defaults to a record size of 128k. Write a big file and it's written in "chunks" of 128k size. If you then run a random read/write I/O benchmark on it with a 4k block size, ZFS is going to be reading and writing 128k for every 4k of I/O. That's a huge amplification factor. If you're using ZFS for a load which resembles random block I/O, you'll want to tune the recordsize to the app I/O. And ZFS makes this easy, since child filesystem creation is trivially cheap and the recordsize can be tuned per filesystem.
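The amplification arithmetic, plus a sketch of the per-dataset tuning described above (the `zfs` lines are commented out since they need a live pool; `tank/db` and `tank/postgres` are placeholder dataset names):

```shell
# Worst case: every 4k random read touches a whole 128k record
record=$((128 * 1024))
io=$((4 * 1024))
amp=$((record / io))
echo "read amplification: ${amp}x"   # 32x

# Tune recordsize per child filesystem to match the app's I/O size:
# zfs create -o recordsize=4k tank/db
# zfs set recordsize=8k tank/postgres   # or change it on an existing dataset
```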
And then there's the stuff that ZFS does that XFS/ext4 doesn't. For example, taking snapshots every 5 minutes (they're basically free), doing streaming incremental snapshot backups, snapshot cloning, and so on, without even getting into RAID flexibility.
On the configuration stuff: these benchmarks intentionally only ever use the default configuration. They're not interested in the limits of what's possible with the filesystems, just what they do "out of the box", since that's what the overwhelming majority of users will experience.
Substitutable how? Like, I'm typing this on a laptop with a single disk with a single zpool, because I want 1. compression, 2. data checksums, 3. to not break (previous experiments with btrfs ended poorly). Obviously I could run xfs, but then I'd miss important features.
You probably don't want to do that because that'll result in massive metadata overhead, and nothing tells you that the app's I/O operations will be nicely aligned, so this cannot be given as general advice.
has the benchmarks of the DKMS module
[0]: https://www.phoronix.com/forums/forum/software/general-linux...
Bcachefs is exciting on paper, but even just playing around there are some things that are just untenable imho. Time has proven that the stability of a project stems from the stability of the team and culture behind it. As such, the numbers don't lie, and unless it can reach parity with existing filesystems I can't be bothered to forgive the missteps. I'm looking forward to the day when bcachefs matures... if ever, as it is exciting.
Also if something has changed in the last year I’d love to hear about it! I just haven’t found anything compelling enough yet to risk my time bsing around with it atm.
[1] https://youtube.com/watch?v=_RKSaY4glSc&pp=ygUZTGludXMgZmlsZ...
The dev acted out of line for kernel development, even if _kind_ of understandable (like with the recovery tool), but still in a way that would set a bad precedent for the kernel, so this appears to be good judgement from Linus.
Hope the best for Bcachefs's future
From: Kent Overstreet @ 2025-09-11 23:19 UTC
As many of you are no doubt aware, bcachefs is switching to shipping as
a DKMS module. Once the DKMS packages are in place very little should
change for end users, but we've got some work to do on the distribution
side of things to make sure things go smoothly.
Good news: ...
https://lore.kernel.org/linux-bcachefs/yokpt2d2g2lluyomtqrdv...

Doesn't that mean I now have to enroll the MOK key on all my work workstations that use Secure Boot? If so, that's a huge PITA on over 200 machines. As with the NVIDIA driver, you can't automate that step.
Is this filesystem stable enough for deploying on 200 production machines?
From a cursory look I get things like this:
https://hackaday.com/2025/06/10/the-ongoing-bcachefs-filesys...
Anyway, fair question IMO. Another point I'd like to make... migrating away from this filesystem, disabling secure boot, or leaning into key enrollment would be fine. Dealer's choice.
The 'forced interaction' for enrollment absolutely presents a hurdle. That said: this wouldn't be the first time I've used 'expect' to use the management interface at scale. 200 is a good warm up.
The easy way is to... opt out of secure boot. Get an exception if your compliance program demands it [and tell them about this module, too]. Don't forget your 'Business Continuity/Disaster Recovery' of... everything. Documents, scheduled procedures, tooling, whatever.
Again, though, stability is a fair question/point. Filesystems and storage are cursed. That would be my concern before 'how do I scale', which comparatively, is a dream.
Not going to happen. Secure Boot is a mandatory requirement in this scenario.
I can't talk further because NDA, but sure am confused by the downvotes for asking a question.
I'll hit this post positively in an attempt to counter the down-trend. edit: well, that was for squat.
However, I would like to push back on that article.
It says that bcachefs is "unstable" but provides no evidence to support that.
It says that Linus pushed back on it. Yes, but not for technical reasons but rather process ones. Think about that for a second though. Linus is brutal on technology. And I have never heard him criticize bcachefs technically except to say that case insensitivity is bad. Kind of an endorsement.
Yes, there have been a lot of patches. It is certainly under heavy development. But people are not losing their data. Kent submitted a giant list of changes for the kernel 6.17 merge window (ironically totally on time). Linus never took them. We are all using the 6.16 version of bcachefs without those patches. I imagine stories of bcachefs data loss would get lots of press right now. Have you heard any?
There are very few stories of bcachefs data loss. When I have heard of them, they seem to result in recovery. A couple I have seen were mount failures (not data loss) and were resolved. It has been rock-solid for me.
Meanwhile just scan the thread for btrfs reports...
Where did Linus call bcachefs "experimental garbage"? I've tried finding those comments before, but all I've been able to find are your comments stating that Linus said that
For sure it's a headache when you install some module on a whole bunch of headless boxes at once and then discover you need to roll a crash cart over to each and every one to get them booting again, but the secure boot guys would have it no other way.
I'm not even a bcachefs user, but I use ZFS extensively and I _really_ wanted Linux to get a native, modern COW filesystem that was unencumbered by the crappy corporate baggage that ZFS has.
In the comments on HN around any bcachefs news (including this one) there are always a couple throwaway accounts bleating the same arguments - sounding like the victim - that Kent frequently uses.
To Kent, if you're reading this:
From a long time (and now former) sponsor: if these posts are actually from you, please stop.
Also, it's time for introspection and to think how you could have handled this situation better, to avoid having disappointed those who have sponsored you financially for years. Yes, there are some difficult and flawed people maintaining the kernel, not least of which Linus himself, but you knew that when you started.
I hope bcachefs will have a bright future, but the ball is very clearly in your court. This is your problem to fix.
(I'm Daniel Wilson, subscription started 9th August 2018, last payment 1st Feb 2025)
Seems to tick all of the boxes in regard to what you're looking for, and its mature enough that major linux distros are shipping with it as the default filesystem.
Your statement is misleading. No one is using btrfs on servers. Debian and Ubuntu use ext4 by default. RHEL removed support for btrfs long ago, and it's not coming back:
> Red Hat will not be moving Btrfs to a fully supported feature. It was fully removed in Red Hat Enterprise Linux 8.
https://philip.greenspun.com/blog/2024/02/29/why-is-the-btrf...
> We had a few seconds of power loss the other day. Everything in the house, including a Windows machine using NTFS, came back to life without any issues. A Synology DS720+, however, became a useless brick, claiming to have suffered unrecoverable file system damage while the underlying two hard drives and two SSDs are in perfect condition. It’s two mirrored drives using the Btrfs file system
I am hoping we will get ZFS from Ubnt NAS via update.
The first is that they don't use btrfs's own RAID (aka btrfs-raid/volume management). They actually use hardware RAID, so they don't experience any of the stability/data integrity issues people experience with btrfs-raid. On top of this, Facebook's servers run in data centers that have 100% electricity uptime (these places have diesel generators for backup power).
Synology likewise offers btrfs on their NAS units, but it's on top of mdadm (software RAID).
The main benefit Facebook gets from btrfs is transparent compression and snapshots, and that's about it.
So yes, if you are Facebook, and put it on a rock-solid block layer, then it will probably work fine.
But outside of the world of hyperscalers, we don't have rock solid block layers. [1] Consumer drives occasionally do weird things and silently corrupt data. And on top of drives, nobody uses ECC memory and occasionally weird bit flips will corrupt data/metadata before it's even written to the disk.
At this point, I don't even trust btrfs on a single device. But the more disks you add to a btrfs array, the more likely you are to encounter a drive that's a little flaky.
And Btrfs's "best feature" really doesn't help it here, because it encourages users to throw a large number of smaller cheap/old spinning drives at it. Which is just going to increase the chance of btrfs encountering a flaky drive. The people who are willing to spend more money on a matched set of big drives are more likely to choose zfs.
The other paradox is that btrfs ends up in a weird spot where it's good enough to actually detect silent data corruption errors (unlike ext4/xfs and friends, where you never find out your data was corrupted), but then its metadata is complex and large enough that it seems to be extra vulnerable to those very issues.
---------------
[1] No, mdadm doesn't count as a rock-solid block layer; it still depends on the drives to report a data error. If there is silent corruption, mdadm just forwards it. I did look into using a Synology-style btrfs-on-mdadm setup, but I searched and found more than a few stories from people whose Synology filesystem borked itself.
In fact, you might actually be worse off with btrfs+mdadm, because now data integrity is done at a completely different layer to data redundancy, and they don't talk to each other.
Plus I needed zvols for various applications. I've used ZFS on BSD for even longer so when OpenZFS reached a decent level of maturity the choice between that and btrfs was obvious for me.
It's really difficult to get a real feel for BTRFS when people deliberately omit critical information about their experiences. Certainly I haven't had any problems (unless you count the time it detected some bitrot on a hard drive and I had to restore some files from a backup - obviously this was in "single" mode).
Some of the most catastrophic ones were 3 years ago or earlier, but the latest kernel bug (point 5) was with 6.16.3, ~1 month ago. It did recover, but I had already mentally prepared for a night of restores from backups...
I don't understand how btrfs is considered by some people to be stable enough for production use.
Keeping it healthy means paying close attention to "btrfs fi df" and/or "fi usage" for best results.
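The monitoring described above, sketched as dry-run commands (a `run` wrapper prints rather than executes, since they need a live btrfs mount; `/mnt/data` is a placeholder mount point):

```shell
run() { printf '+ %s\n' "$*"; cmds="${cmds}$*;"; }   # dry-run: print, don't execute

# Watch allocated vs. used space; "df" and "usage" show chunk-level detail
run btrfs filesystem df /mnt/data
run btrfs filesystem usage /mnt/data

# Reclaim mostly-empty data chunks before the FS runs out of unallocated
# space (only rebalance chunks that are at most 50% full):
run btrfs balance start -dusage=50 /mnt/data
```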
ZFS also does not react well to running out of space.
[0] I'm currently evaluating openSUSE as a possible W11 replacement, but not using it for anything serious atm.
I am also frustrated by this whole debacle, but I'm not going to stop funding him; Bcachefs is a solid alternative to btrfs. It's not at all clear to me what really happened to cause all the drama. A PR was made that contained something more feature-like than bugfix-like, and that resulted in a whole module being ejected from the kernel?
I really wish, though, that DKMS was not such a terrible solution. It _will_ break my boot, because it always breaks my boot. The Linux kernel really needs a stable module API so that out-of-tree modules like bcachefs are not impossible to boot with reliably.
This isn't just a one time thing, speaking as someone who follows the kernel, apparently this has been going on pretty much since bcachefs first tried to get into Linus's tree. Kent even once told another kernel maintainer to "get your head examined" and was rewarded with a temporary ban.
Edit: To be fair, the kernel is infamous for being guarded by stubborn maintainers but really I guess the lesson to be learned here is if you want your pet project to stick around in the kernel you really can't afford to be stubborn yourself.
Amen.
And to your point about it being a "pet project": I'm sure I could go look at the commit history, but is anyone other than Kent actually contributing meaningfully to bcachefs? If not, this project sorely needs more than one person involved.
I hold out some hope that somebody else will get involved in bcachefs and that the new person will be able to resubmit bcachefs to the mainline.
My impression is that many people respect the technology and would be happy to have it back--they just cannot work with Kent.
But that is the reasons this will probably not happen. It does not appear that anybody wants to work with Kent.
But bcachefs never lived in userspace even before it was merged
Those generic syscalls are (supposed to be) stable; the internal filesystem calls can and do change.
This is one reason why ZFS regularly breaks, on top of the fact that it can't use GPL-only exports.
Sometimes, they do. For instance, BTRFS_IOC_CLONE to do a copy-on-write clone of a file's contents (now promoted to other filesystems as FICLONE, but many other ioctl operation codes are still btrfs-specific; and other filesystems have their own filesystem-specific operations).
DKMS is not going to work for me, though. Some of the distros I use do not even support it. Chimera Linux uses ckms.
As for how we got here, it was not just one event. Kent repeatedly ignored the merge window, submitting changes too late. This irked Linus. When Linus complained, Kent attacked him. And Kent constantly ran down the LKML and kernel devs by name (especially the btrfs guys). This burned a lot of bridges. When Linus pushed Kent out, many people rushed to make it permanent instead of rushing to his defense. Kent lost my support in the final weeks by constantly shouting that he was a champion for his users while doing the exact opposite of what I wanted him to do. I want bcachefs in the kernel. Kent worked very hard to get it pushed out.
It really is a great file system though.
No, I was not "ignoring the merge window". Linus was trying to make and dictate calls on what is and is not a critical bugfix, and with a filesystem eating bug we needed to respond to, that was an unacceptable situation.
edit: by the time i commented it was already dark text. guess it recovered.
It seems like bcachefs would benefit from parallel development spirals, where unproven code is guarded by experimental flags and recommended only for users who are prepared to monitor and apply patches outside of the main kernel release cycle, while the default configuration of the mainline version goes through a more thorough test and validation process and more rarely sees "surprise" breakage.
It certainly appears that Linus can't tell the difference between your new feature introductions and your maintenance fixes, and that should trigger some self-reflection. He clearly couldn't let all of the thousands of different kernel components operate in the style you seem to prefer for bcachefs.
Maybe, if you're a distant observer who isn't otherwise participating in the project.
What's the saying about too many cooks in the kitchen?
These concerns aren't coming from the people who are actually using it. They're coming from people who are accustomed to the old ways of doing things and have no idea what working closely with modern test infrastructure is like.
There wasn't anything unusual about the out-of-merge-window patches I was sending Linus except for volume, which is exactly what you'd expect in a rapidly stabilizing filesystem. If anything, I've been more conservative about what I send than other subsystems.
> It certainly appears that Linus can't tell the difference between your new feature introductions and your maintenance fixes, and that should trigger some self-reflection. He clearly couldn't let all of the thousands of different kernel components operate in the style you seem to prefer for bcachefs.
If Linus can't figure out which subsystems have QA problems and which don't, that's his problem, not mine.
Linus's job is not to make the very next version of the kernel as good as it can be. It's to keep the whole system of Linux kernel maintenance going. (Maintaining the quality of the next version is almost a side-effect.) Asking him to make frequent exceptions to process is the equivalent of going "this filesystem is making poor decisions: let's hex-edit /dev/sda to allocate the blocks better". Your priority is making the next version of bcachefs as good as it can be, and you're confident that merging your patchsets won't break the kernel, but that's entirely irrelevant.
> If Linus can't figure out which subsystems have QA problems and which don't, that's his problem, not mine.
You have missed the point by a mile.
Citation needed.
> Linus's job is not to make the very next version of the kernel as good as it can be. It's to keep the whole system of Linux kernel maintenance going. (Maintaining the quality of the next version is almost a side-effect.) Asking him to make frequent exceptions to process is the equivalent of going "this filesystem is making poor decisions
You're arguing from a false premise here. No exceptions were needed or required, bcachefs was being singled out because he and the other maintainers involved had no real interest in the end goal of getting a stable, reliable, trustworthy modern filesystem.
The discussions, public and private - just like you're doing here - always managed to veer away from engineering concerns; people were more concerned with politics, "playing the game", and - I'm not joking here - undermining me as maintainer; demanding for someone else to take over.
Basic stuff like QA procedure and how we prioritize patches never entered into it, even as I repeatedly laid that stuff out.
> > If Linus can't figure out which subsystems have QA problems and which don't, that's his problem, not mine.
> You have missed the point by a mile.
No, that is very much the point here. bcachefs has always had one of the better track records at avoiding regressions and quickly handling them when they do get through, and was being singled out as if something was going horribly awry anyways. That needs an explanation, but one was never given.
Look, from the way you've been arguing things - have you been getting your background from youtube commentators? You have a pretty one sided take, and you're pushing that point of view really hard when talking to the person who's actually been in the middle of it for the past two years.
Maybe you should reevaluate that.
> citation needed
I'm sure you're familiar with "separation of concerns" in programming: it's the same principle. My experience of bureaucracies is experience, but I'm sure most good books on the topic will have a paragraph or chapter on this.
> bcachefs was being singled out because he and the other maintainers involved had no real interest in the end goal of getting a stable, reliable, trustworthy modern filesystem.
I imagine they would dispute that.
> No exceptions were needed or required,
I know that Linus Torvalds would dispute that. In this HN thread and elsewhere, you've made good arguments that treating your approach as an exception is not warranted, and that your approach is better, but you surely aren't claiming that your approach is the usual approach for kernel development?
> undermining me as maintainer
Part of the job is dealing with Linus Torvalds. You're not good at that. It would make sense for you to focus on architecture, programming, making bcachefs great, and to let someone else deal with submitting patches to the kernel, or arguing those technical arguments where you're right, but aren't getting listened to.
People are "more concerned with politics" than with engineering concerns because the problem is not with your engineering.
> bcachefs has always had one of the better track records at avoiding regressions and quickly handling them when they do get through
That's not relevant. I know that you don't see that it's not relevant: that's why I'm saying it's not relevant.
> have you been getting your background from youtube commentators?
No, but I'm used to disputes of this nature, and I'm used to dealing with unreasonable people. You believe that others are being unreasonable, but you're not following an effective strategy for dealing with unreasonable people. I am attempting to give advice, because I want bcachefs in the kernel, and I haven't a hope of changing Linus Torvalds' mind, but I have half a hope of changing yours.
Rule one of dealing with unreasonable people is to pick your battles. How many times have I said "dispute" or "argument" in this comment? How many times have these been worthy disputes? Even if I've completely mischaracterised the overall situation, surely you can't claim that every argument you've had on the LKML or in LWN comments has been worth bcachefs being removed from the kernel.
I can't ship and support a filesystem under those circumstances.
The "support" angle is one you and a lot of other people are missing. Supporting it is critical to stabilizing, and we can't close the loop with users if we can't ship bugfixes.
Given the past history with btrfs, this is of primary concern.
You've been looking for a compromise position, and that's understandable, but sometimes the only reasonable compromise is "less stupidity, or we go our separate ways".
The breaking point was, in the private maintainer thread, a page and a half rant from Linus on how he doesn't trust my judgement, and then immediately after that another rant on how - supposedly - everyone in the kernel community hates me and wants me gone.
Not joking. You can go over everything I've ever said on LKML, including the CoC incident, and nothing rises to that level. I really just want to be done with that sort of toxicity.
I know you and a lot of other people wanted bcachefs to be upstream, and it's not an ideal situation, but there are limits to what I'm willing to put up with.
Now, we just need to be looking forward. The DKMS transition has been going smoothly, more people are getting involved, everyone's fully committed to making this work and I still have my funding.
It's going to be ok, we don't need to have everything riding on making it work with the kernel community.
And now, you guys don't have to worry about me burning out on the kernel community and losing my last vestiges of sanity and going off to live in the mountains and herd goats :)
That's literally his job?
Linus has never once found a mistake in the bcachefs pull requests, but he has created massive headaches for getting bugfixes out to users that were waiting on them.
Honestly I don't think it's just you, I'm in the same boat. I find it hard to believe that the entirely predictable outcome of the current situation is what more than 1% of users wanted.
It was explicitly marked experimental.
Reiserfs sat in the kernel for years after he went to prison and didn't get removed on such short notice even though it was equally if not more unmaintained.
I don't think the kernel devs share that definition.
> Reiserfs sat in the kernel for years after he went to prison and didn't get removed on such short notice even though it was equally if not more unmaintained.
Reiserfs wasn't marked experimental.
Going forward, I'll agree with you: mainline does not care about users of experimental features. But it's disingenuous to suggest that this has always been the expectation.
From what I can tell, that was just the trigger for a clash of personalities between Linus and Kent, both of whom have a bit of a temper and refused to back down, which escalated to this.
It was marked experimental. No promises were made.
> It's not at all clear to me what really happened to make all the drama. A PR was made that contained something that was more feature-like than bugfix-like, and that resulted in a whole module being ejected from the kernel?
If you followed the whole saga, Kent had a history of being technically excellent but absolutely impossible to work with.
He did not follow the kernel development process over and over again, instead choosing to post long rants and personal attacks to the mailing lists. Instead of accepting that he has to adapt, he always had (still has) the attitude that he knows best, everybody else is an idiot and should just accept his better judgement. Of course the rules should apply… to everyone but him, because his work is special.
Moreover, this was not a one PR thing. Linus actually had a long history of defending Kent, being patient, bending the rules, and generally putting up with him.
This is unambiguously 100% Kent’s fault, which is very sad, because — as mentioned before — Kent’s work is technically excellent. He had a lot of good will that he squandered in spectacular fashion.
Doesn't btrfs fit that description? I know there are some problems with it, but it is definitely a native COW filesystem, and AFAIK it is "modern".
Btrfs also has issues with large numbers of snapshots: you have to cull them occasionally or things begin to slow down. Bcachefs does not.
Btrfs not good?
(Honest question.)
I've had a few issues, but no data loss:
* Early versions of btrfs had an issue where you'd run out of metadata space (if I recall). You had to rebalance, and sometimes add some temporary space to do that.
* One of my filesystems wasn't optimally aligned because btrfs didn't do that automatically (or something like that -- this was a long time ago.) A very very minor issue.
* Corruption (but no data loss, so I'm not sure it's corruption per se...) during a device replacement.
This last one caused no data loss, but a lot of error messages. I started a logical device removal, removed the device physically, rebooted, and then accidentally readded the physical device while it was still removing it logically. It was not happy. I physically removed the device again, finished the logical remove, and did a scrub and the fsck equivalent. No errors.
I think that's a testament to its resiliency, but also a testament to how you can shoot yourself in the foot.
I've never used RAID5/6 on btrfs and don't plan to -- partly because of the scary words around it, but I also assume the rebuild time is longer.
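For reference, the metadata rebalance mentioned in the first bullet above is done with btrfs-progs; a sketch of the usual commands, assuming a hypothetical mount point (these need root and a mounted btrfs filesystem):

```sh
# Show how much space is allocated vs. actually used,
# per block group type (Data, Metadata, System):
btrfs filesystem df /mnt/data

# Rewrite metadata block groups that are at most 50% full,
# returning the freed chunks to the unallocated pool:
btrfs balance start -musage=50 /mnt/data
```

The `-musage=N` filter limits the balance to mostly-empty metadata chunks, which is why it can relieve "out of metadata space" without touching the bulk of the data.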
Seemingly regardless of the drives, interface, or kernel, other filesystems paired with LVM or mdraid fail/recover/lie more gracefully. NVMe or SATA (spindles). Demonstrated back-to-back with replacements from different batches.
Truly disheartening, I want BTRFS. I would like to dedicate some time to this, but, well, time remains of the essence. I'm hoping it's something boring like my luck with boards/storage controllers, /shrug.
It's reproducible, the scope needs to be reduced. With work. A lot of testing and variable change/reduction. More than I care for.
The problem: R&R, work/money, etc, all compete for a limited amount of time. I'll spend it how I like, Square? Comments win over rigorous testing with my schedule, thanks.
Why don't you try to reproduce it? Better things to do, this isn't the mailing list? Exactly. Pick a reason, there's plenty.
It feels more user friendly than ZFS, but ZFS is much more feature complete. I used to use btrfs for all my personal stuff, but honestly ext4 is just easier.
And every time something like this comes up, I end up with every sort of accusation pointed at me, and no one seems to be willing to look at the wider picture - why is the kernel community still unable to figure out a plan to get a trustworthy modern filesystem?
> This is your problem to fix.
No, I've said from the start that this needs to be a community effort. A filesystem is too big for one person.
Be realistic :) If the community wants this to happen, the community will have to step up.
Please please don't forget I want you to succeed - that's why I bunged nearly $800 your way in this endeavour - but I'm not the only person who thinks you come across as completely immune to criticism, even when it's constructive and from your supporters.
>> This is your problem to fix.
> No, I've said from the start that this needs to be a community effort. A filesystem is too big for one person.
Right now that "community effort" is looking a bit unlikely, eh ?
I would hate to have to deal with these people as my primary occupation and I totally get why you don't want to continue.
That's said, nobody else has the power, skill or inclination to make bcachefs that wonderful filesystem of the future for Linux - only you. That's what I meant by "this is your problem to fix".
I wish you the best of luck with the new DKMS direction. And I'll get on board and actually try it out soon :D
You have to understand, I get an absolute _ton_ of "constructive criticism" from people who are basically assuming everything is going off the rails and expecting massive, unrealistic changes. For bandwidth reasons if nothing else, I have to stay focused on the code and getting it done.
> Right now that "community effort" is looking a bit unlikely, eh ?
Actually, the big surprise from the DKMS switch is just how much the community came together to make it happen.
This project has looked like a one man show for a long time, and the core of it probably always will be for the simple reason that there are precious few people with the skillset required to do core filesystem engineering; that requires a massive amount of dedication and investment of time to get good at, there's a hell of a learning curve.
But there's still a lot of people with the willingness and ability to help out in other areas. People have been helping since even before the DKMS switch, to be honest; it's just that a lot of it is boring, invisible, but extremely necessary QA work - and that stuff is work, and people have helped out a lot there.
You have to be involved in the community, in the IRC channel to see this stuff going on. It's really not just me.
And now with the DKMS switch, a lot more people jumped in and started helping, and that's how we were able to get every major distro supported before the 6.17 release. That happened _fast_, and only a small fraction of the work was mine, mostly I was just coordinating.
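For readers unfamiliar with the mechanism being discussed: DKMS rebuilds an out-of-tree module against each installed kernel's headers, driven by a dkms.conf. A minimal sketch of what such a file looks like (the version and make invocation here are hypothetical, not the project's actual file):

```sh
PACKAGE_NAME="bcachefs"
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="bcachefs"
DEST_MODULE_LOCATION[0]="/kernel/fs/bcachefs"
# Build against the headers of whichever kernel DKMS is targeting;
# ${kernelver} is expanded by DKMS itself.
MAKE[0]="make KDIR=/lib/modules/${kernelver}/build"
CLEAN="make clean"
# Rebuild automatically whenever a new kernel is installed.
AUTOINSTALL="yes"
```

With AUTOINSTALL set, distro kernel upgrades trigger a rebuild, which is what lets the module track new kernels without being in-tree.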
Honestly, looking back, I don't think I could have planned this better - the timing was perfect. We're nearly done with stabilization, so it was the right time to start focusing more on distro integration and building up those working relationships, and the DKMS migration was just the kick in the pants to make that happen. Now we're pretty well positioned to get bcachefs into distro installers perhaps six months out.
The community is real, and it's growing.
(I would still _fucking love_ to have more actual filesystem engineers though, heh).
And now... bcachefs (like ZFS) will likely be externalized as well. I'm just shocked. And I still hope that all goes well and bcachefs stays in the kernel.
patrakov•4mo ago
kouteiheika•4mo ago
ThatPlayer•4mo ago
For now
alphabetag675•4mo ago
matja•4mo ago
AFAIK that change didn't add functionality or fix any existing issues, other than breaking ZFS - which GKH was absolutely fine with, dismissing several requests for it to be reverted, stating the "policy": [1]
> Sorry, no, we do not keep symbols exported for no in-kernel users.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux... [1] https://lore.kernel.org/lkml/20190111054058.GA27966@kroah.co...
mnau•4mo ago
> Sun explicitly did not want their code to work on Linux, so why would we do extra work to get their code to work properly?
Why would you accommodate someone who explicitly went out of their way to not accommodate you?
It took many conflicts with the bcachefs developer to reach this state. An olive branch was extended again and again...
matja•4mo ago
mnau•4mo ago
I am not a kernel developer, but fewer exposed APIs/functions are nearly always better.
The removed function's comment even starts with: "Careful: __kernel_fpu_begin/end() must be called with
nubinetwork•4mo ago
capitol_•4mo ago
(Hard and tedious work, but not impossible).
SXX•4mo ago
bluGill•4mo ago
So of course they won't, but it isn't impossible.
habitue•4mo ago
bluGill•4mo ago
ggiesen•4mo ago
ggiesen•4mo ago
wbl•4mo ago
ggiesen•4mo ago
I agree that based on that source, it's more like "meh, we don't really care" (until they do)
Macha•4mo ago
Conan_Kudo•4mo ago
Sanzig•4mo ago
The modern OpenZFS project is not part of Oracle, it's a community fork from the last open source version. OpenZFS is what people think of when they say ZFS, it's the version with support for Linux (contributed in large part by work done at Lawrence Livermore).
The OpenZFS project still has to continue using the CDDL license that Sun originally used. The opinion of the Linux team is that the CDDL is not GPL compatible, which is what prevents it from being mainlined in Linux (it should be noted that not everyone shares this view, but obviously nobody wants to test it in court).
It's very frustrating when people ascribe malice to the OpenZFS team for having an incompatible license. I am sure they would happily change it to something GPL compatible if they could, but their hands are tied: since it's a derivative work of Sun's ZFS, the only one with the power to do that is Oracle, and good luck getting them to agree to that when they're still selling closed source ZFS for enterprise.
chasil•4mo ago
Making /home into a btrfs filesystem would be an opening salvo.
IBM now controls Oracle's premier OS. That is leverage.
sho_hn•4mo ago
chasil•4mo ago
timeon•4mo ago
remix2000•4mo ago
I'm just sorry for the guy and perhaps a little bit sorry for myself that I might have to reformat my primary box at some point…
Also unrelated, but Sun was a very open source friendly company with a wide portfolio of programs licensed under GNU licenses, without some of which Linux would still be useless to the general public.
Overall, designing a good filesystem is very hard, so perhaps don't bite the hand that feeds you…?
simlevesque•4mo ago
The maintainer kept pushing new features at a time when only bugfixes are allowed. He also acted like a child when he got asked to follow procedures. I feel sorry about his bad listening and communication abilities.
jacobgkau•4mo ago
The "new features" were recovery features for people hit by bugs. I can see where the ambiguity came from.
p_l•4mo ago
mnau•4mo ago
p_l•4mo ago
CDDL was a compromise choice that was seen as workable for inclusion, based especially on certain older views of which code would be compatible, and it was unclear - and possibly expected - that the Linux kernel would move to GPLv3 (when it was finally released), which the CDDL drafters saw as compatible with the CDDL.
Alas, the Solaris source release could not wait an unclear amount of time for GPLv3 to be finalized.
mnau•4mo ago
> it was unclear and possibly expected that Linux kernel will move to GPLv3
In what world? The kernel was always GPLv2 without the "or later" clause, and by then had tens of thousands of contributors. Linus had made it quite obvious that the kernel would not move to GPLv3 (even in 2006).
Even if I gave them the benefit of the doubt, GPLv3 was released in 2007. They had years to make the license change and didn't. They were sold to Oracle in 2010.
rleigh•4mo ago
The CDDL is actually very permissive. You can combine it with anything, including proprietary licences.
swinglock•4mo ago
darthcloud•4mo ago
[0] https://github.com/openzfs/zfs/issues/8259
[1] https://github.com/openzfs/zfs/pull/8965
AndrewDavis•4mo ago
Is moving a symbol from EXPORT_SYMBOL(some_func) to EXPORT_SYMBOL_GPL(some_func) actually changing the API? Nope, the API is exactly the same as it was before; it's just changed who is allowed to use it.
From the perspective of an out of tree module that isn't GPL you have removed stuff.
I'm honestly not sure how one outside the kernel community could construe that as not removing something.
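For anyone outside the kernel community, the change being debated is a one-line macro swap in kernel source. A minimal sketch with a hypothetical helper function (kernel-only code, shown for illustration - it won't build or run outside a kernel tree):

```c
#include <linux/export.h>

/* Hypothetical in-kernel helper. */
int demo_helper(int x)
{
	return x * 2;
}

/* Before: the symbol resolves for any loadable module,
 * whatever its MODULE_LICENSE() declares. */
EXPORT_SYMBOL(demo_helper);

/* After: swapping to the line below makes the module loader refuse
 * to resolve the symbol for modules that do not declare a
 * GPL-compatible license, so an out-of-tree CDDL module (like
 * OpenZFS) loses access even though the function is unchanged. */
/* EXPORT_SYMBOL_GPL(demo_helper); */
```

The function body and its in-kernel callers are untouched; only the set of modules allowed to link against it changes, which is exactly the "same API, different audience" point made above.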
happymellon•4mo ago
No, it was always designed to be hostile to Linux from the outset. It's a project that doesn't want interoperability with Linux, so I'm not entirely sure why you think the Linux folks should maintain an API for them.
Sanzig•4mo ago
Since the pre-fork code is from Sun, Oracle owns the copyright, and they won't re-license it.
The idea that the OpenZFS team wants CDDL out of spite for Linux is an absurd conspiracy theory. Their hands are tied - I'm sure they'd move to a compatible license if they could, but they can't.
p_l•4mo ago
So the OpenZFS team is not exactly interested in moving to GPLv2, because it would break multiple platforms.
Sanzig•4mo ago
But it's an academic exercise anyway, since it seems Oracle has no intention of allowing them to relicense.
p_l•4mo ago
The choice of creating a new license was because of two reasons:
- Internally people wanted for the code to be usable by not just Linux and Solaris (lots of BSD fans, for example)
- Sun was insisting on mutual patent protection clauses because GPLv2 didn't support them, and GPLv3 was not yet available to discuss viability at all.
toast0•4mo ago
Linux and OpenZFS are pretty much locked into their licenses, regardless of what people might want today. There are too many contributors to Linux to relicense, and while OpenZFS has fewer, I don't think there's any reason to think Oracle would relicense, given they went back to closed source with Solaris and ZFS on Solaris.
> It's a project that doesn't want interoperability with Linux.
Regardless of the original intent of Sun in picking the license, it's hard to imagine a project called ZFS on Linux (which was merged into OpenZFS) doesn't want to interoperate with Linux.
AndrewDavis•4mo ago
> They could always move to a compatible license?
> No, it was always designed to be hostile to Linux from the outset. It's a project that doesn't want interoperability with Linux
I'm not sure why you've jumped here. I didn't mention a specific project or licence.
But, nonetheless I'm going to assume you mean OpenZFS.
1. No, they can't change the license. Much like Linux contributors, OpenZFS contributors retain their own copyright, so the project can't just change the license. The only group that could hypothetically change it is Oracle, given the clause that the steward of the license can release a new version, but that's unlikely, and Oracle has absolutely nothing to do with the existing project.
2. Staying on the license and compatibility. It's really quite confusing what counts as compatible in the eyes of Linux. The very fact that they have separate EXPORT_SYMBOL and EXPORT_SYMBOL_GPL exports suggests Linux as a project sanctions non-GPL modules, and considers them compatible if they only use the non-GPL symbols, perhaps in the same vein as they consider the syscall boundary compatible with non-GPL code. If someone who is actually in the know about why there are two sets of exports is reading, I'd love to know.
3. Always designed to be hostile to Linux. Whether that's true or not is debatable; there are conflicting opinions from those who worked at Sun at the time. Also, the comment criticises a community that had no hand in whether or not it was intended to be hostile to Linux. In the end it's copyleft software, very similar in spirit to the Mozilla Public License. And by definition, copyleft licenses are inherently incompatible with each other without specific get-out-of-jail clauses to combine them (see MPLv2 for example).
4. Re interoperability. Strongly disagree. OpenZFS takes great strides to be compatible with Linux. Each release, a developer spends hours poring over Linux changes and updating a compat layer, and the module remains compatible to compile against multiple Linux versions at any one time; there are even compat patches to detect distro-specific backports, where the kernel version hasn't changed but the distro has backported things that change behaviour. That's a serious commitment to interoperability. And a large number of OpenZFS devs do their work against Linux as their primary platform, which is why FreeBSD rebased their ZFS upstream on ZFS on Linux, leading it to become the official OpenZFS upstream. I can't see how anyone could say in good faith that they don't care about Linux compatibility unless they haven't looked at the OpenZFS project in over a decade.
5. Re why I think Linux folks should maintain APIs for them.
The way you worded this strongly implies I was saying Linux should maintain an API for them. In no way did I say that. I was replying to a post that was adamant that Linux doesn't remove things; I provided a perspective that Linux does in fact remove things. I wasn't arguing for maintaining any API - Linux doesn't even guarantee internal APIs for itself. I was pointing out that changing a symbol export from export-to-everyone to GPL-only isn't changing the API: it's the exact same API, they've just removed it for some groups.
Nonetheless, I think it'd be great if Linux could maintain some APIs for out-of-tree modules. But they don't, and that's fine. I just find changing exports from open-for-everyone to GPL-only rather hostile.
Really, no one in either of these communities had any say in their license (sans Torvalds). Both are creating great stuff for us as users to run. And it'd be great if people working on free software could get along, and those in the peanut gallery didn't ascribe malcontent between them because of a difference in license they didn't pick.
aragilar•4mo ago
arghwhat•4mo ago
Changes would therefore need to be an improvement for in-tree drivers, and not merely something for an out-of-tree driver.
dev_l1x_be•4mo ago
masklinn•4mo ago
odo1242•4mo ago
charcircuit•4mo ago
sc68cal•4mo ago
rurban•4mo ago
cwillu•4mo ago
charcircuit•4mo ago
bmicraft•4mo ago
charcircuit•4mo ago
cwillu•4mo ago
bcrl•4mo ago
Bugs are a fact of life. Bug fixes are a fact of life. Sometimes those bugs will cause data loss. Adding code in an -rc to support data recovery when a bug has caused data loss is a good thing for users. Portraying it as a bad thing is the worst kind of bike shedding.
cwillu•4mo ago
If you're arguing that bcachefs is incompatible with the practiced kernel development discipline, well, I guess Linus agrees.