Effectiveness is debatable; I'd say this approach still has duplication. Absolutely an 'insignificant' amount... in this instance! Having the filesystem handle this at the block level is probably less problematic and less prone to rework.
edit: With some snark for flavor, seeing a now-dead reply. Different filesystems have different properties? You don't say! Choose one that does de-duplication, ignore the hard-link limits, and skip this endless routine (sketched below). Two birds, one decision.
edit: This is a nice preparation/consideration, I guess. I still maintain that a backup store/filesystem unaware of duplication at the block level is a mistake. If nothing else, block-level dedup would strengthen both this approach and the live data.
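To be concrete about 'this endless routine': here's a minimal Python sketch of user-space hard-link dedup, the kind of loop I'd rather skip (the temp-file suffix is hypothetical; real tools like rdfind or hardlink(1) handle races, metadata, and permissions properly):

    import hashlib
    import os

    def dedup_hardlink(root):
        """Replace byte-identical files under root with hard links to one inode."""
        seen = {}  # sha256 digest -> canonical path for that content
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                key = h.hexdigest()
                if key not in seen:
                    seen[key] = path
                    continue
                tmp = path + ".dedup-tmp"  # hypothetical scratch name
                try:
                    os.link(seen[key], tmp)  # may fail at the filesystem's per-inode link limit
                    os.replace(tmp, path)    # atomically swap the duplicate for a hard link
                except OSError:
                    seen[key] = path  # e.g. EMLINK: start a new link group here

On ZFS the whole loop disappears: 'zfs set dedup=on pool' dedups at the block level, underneath hard-link semantics entirely.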
Reading more closely, I missed the shipping of tarballs. That makes sense; I was imagining 'unpacking', basically. I absolutely would not go so far as to suggest their scheme pick up 'zfs {send,receive}', lol.
And I see above that this is a self-hosted platform, and I still don't get it. I was running terabytes of ZFS with dedup=on on cheap Supermicro gear in 2012.
Is it just me, or is everyone else just as fed up with the same AI tropes every time?
I've reached the point where I just close the tab the moment I read the heading "The problem". At least use tropes.fyi, please.