It offers similar deduplication, indexing, per-file compression, and versioning advantages
It even supports Unix metadata!
Zpaq is quite mature and also handles deduplication, versioning, etc.
Or eStargz. https://github.com/containerd/stargz-snapshotter
Or Nydus RAFS. https://github.com/dragonflyoss/nydus
Links for your mentioned zpaq and dwarFS https://www.mattmahoney.net/dc/zpaq.html https://github.com/mhx/dwarfs
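For anyone who wants to kick the tires, basic usage of both looks roughly like this (paths and archive names are made up; check each project's docs for current flags):
$ mkdwarfs -i ./mydata -o mydata.dwarfs    # build a deduplicated, compressed DwarFS image
$ dwarfs mydata.dwarfs /mnt/dwarfs         # mount it read-only via FUSE
$ zpaq add backups.zpaq ./mydata           # append a deduplicated, versioned snapshot
$ zpaq list backups.zpaq                   # list the stored versions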
https://plakar.io/posts/2025-07-07/kapsul-a-tool-to-create-a...
Zstd has been widely available for a long time. Debian, which is pretty conservative with new software, has shipped zstd since at least stretch (released 2017).
- tiny code size
- widely used standard
- fast compression and decompression
And it also beat Zstandard on compressing TXR Lisp .tlo files by a non-negligible margin. I can reproduce that today:
$ zstd -o compiler.tlo.zstd stdlib/compiler.tlo
stdlib/compiler.tlo : 25.60% (250146 => 64037 bytes, compiler.tlo.zstd)
$ gzip -c stdlib/compiler.tlo > compiler.tlo.gzip
$ ls -l compiler.tlo.*
-rw-rw-r-- 1 kaz kaz 60455 Jul 8 21:17 compiler.tlo.gzip
-rw-rw-r-- 1 kaz kaz 64037 Jul 8 17:43 compiler.tlo.zstd
The .gzip file is 0.944 times as large as the .zstd file. So for this use case, gzip is faster (only zstd's decompression is fast), compresses better, and has a much smaller code footprint.
That said, the tiny code footprint of gzip can be a real benefit. And you can usually count on gzip being available as a system library on whatever platform you're targeting, while that's often not the case for zstd (on iOS, for example).
The Zopfli gzip-compatible compressor gets the file down to 54373 bytes. But zstd with level -19 beats that:
-rw-rw-r-- 1 kaz kaz 54373 Jul 8 22:59 compiler.tlo.zopfli
-rw-rw-r-- 1 kaz kaz 50102 Jul 8 17:43 compiler.tlo.zstd.19
I have no idea which is more CPU/memory intensive. For applications in which compression speed is not important (data is prepared once to be decompressed many times), if you want the best compression while sticking with gzip, Zopfli is the ticket.
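For reference, the two files above would come from commands along these lines (a guess at the invocations, not necessarily what was actually run):
$ zopfli -c stdlib/compiler.tlo > compiler.tlo.zopfli     # gzip-compatible output, much slower to compress
$ zstd -19 -o compiler.tlo.zstd.19 stdlib/compiler.tlo    # near-maximum zstd level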
Not that it matters when the file is so small in the first place... I'm just saying you should be sure what you're 'benchmarking'
Restic has a similar featureset (deduplicated encrypted backups), but almost certainly has better incremental performance for complex use cases like storing X daily backups, Y weekly backups, etc. At the same time, it struggles with RAM usage when handling even 1TB of data, and presumably ptar has better scaling at that size.
There's also rustic, which supposedly is optimized for memory: https://rustic.cli.rs/docs/
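That kind of tiered retention maps directly onto restic's forget flags; a rough sketch (repo path and counts are placeholders):
$ restic -r /srv/restic-repo backup /home/me/work
$ restic -r /srv/restic-repo forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune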
UI: In addition to a simple Unix-style CLI, Plakar provides a web interface and API for monitoring and browsing snapshots
Data-agnostic snapshots: Plakar’s Kloset engine captures any structured data—filesystems, databases, applications—not just files, by organizing them into self-describing snapshots
Source/target decoupling: You can back up from one system (e.g. a local filesystem) and restore to another (e.g. an S3 bucket) using pluggable source and target connectors (see the sketch after this list)
Universal storage backends: Storage connectors let you persist encrypted, compressed chunks to local filesystems, SFTP servers or S3-compatible object stores (and more)—all via a unified interface
Extreme scale with low RAM: A virtual filesystem with lazy loading and backpressure-aware parallelism keeps memory use minimal, even on very large datasets
Network- and egress-optimized: Advanced client-side deduplication and compression dramatically cut storage and network transfer costs—ideal for inter-cloud or cross-provider migrations
Online maintenance: you don't need to stop your backups to free up space
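Going from the project docs, the source/target decoupling looks roughly like the following; the subcommand names and the s3:// repository form are from memory and may not match the current CLI exactly:
$ plakar at s3://objects.example.com/backups create              # repository on S3-compatible storage
$ plakar at s3://objects.example.com/backups backup /var/www     # back up a local filesystem source
$ plakar at s3://objects.example.com/backups restore -to /tmp/restore <snapshot-id>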
ptar...
nemothekid•7mo ago
This is a complete aside, but how often are people backing up data to something other than S3? What I mean is: if some piece of data is on S3, do people have a contingency for "S3 failing"?
S3 is so durable in my mind now that I really only imagine having an "S3 backup" if (1) I had an existing system (e.g. tapes), or (2) I need multi-cloud redundancy. Other than that, once something is in S3, I'm confident it's safe.
Obviously this was built over years (decades?) of reliability, and if your DRP requires alternatives, you should do them, but is anyone realistically paranoid about S3?
SteveNuts•7mo ago
burnt-resistor•7mo ago
1. Use tarsnap so there's an encryption and a management layer.
2. Use a second service so there's redundancy and no SPoF.
3. Keep cryptographic signatures (not hashes) of each backup job in something like a WORM blockchain KVS.
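Point 3 can start as simply as a detached signature per archive; a minimal sketch with GnuPG (filenames are placeholders):
$ gpg --detach-sign --armor backup-2025-07-08.tar.zst      # writes backup-2025-07-08.tar.zst.asc
$ gpg --verify backup-2025-07-08.tar.zst.asc backup-2025-07-08.tar.zst
The .asc file is what you would push to the append-only/WORM store.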
nemothekid•7mo ago
You guys should really have versioning enabled. Someone deleting your data and all the versions is possible, but it would take real effort and would likely be malicious.
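Enabling versioning is a one-liner with the AWS CLI (bucket name is a placeholder), and deleted or overwritten objects then remain retrievable as prior versions:
$ aws s3api put-bucket-versioning --bucket my-backups --versioning-configuration Status=Enabled
$ aws s3api list-object-versions --bucket my-backups --prefix db-dumps/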
imglorp•7mo ago
mrflop•7mo ago
tecleandor•7mo ago
What if somebody deletes the file? What if it got corrupted for a problem in one of your processes? What if your API key falls in the wrong hands?
nemothekid•7mo ago
I don't want to suggest that people should place all their eggs in one basket - it's obviously irresponsible. However, S3 (and versioning) has been the "final storage" for years now. I can only imagine a catastrophic situation like an entire s3 region blowing up. And I'm sure a disgruntled employee could do a lot of damage as well.
joshka•7mo ago
The context that this article suggests is that if your S3 bucket is your primary storage, then it's possible that you're not thinking about where the second copy of your data should belong.
nemothekid•7mo ago
S3 with versioning enabled provides this. I'm not being naive when I say S3 really provides everything you might need. It's my observation over the last 13 years, dealing with tons of fires, that there has never been a situation where I couldn't retrieve something from S3.
Legally you might need an alternative. Going multi-cloud doesn't hurt - after all I do it. But practically? I don't think I would lose sleep if someone told me they only back up to S3.
icedchai•7mo ago
charcircuit•7mo ago
icedchai•7mo ago
fpoling•7mo ago
icedchai•7mo ago
fpoling•7mo ago
icedchai•7mo ago
fpoling•7mo ago
If the company does not pay, then the company breaches its contract and Amazon can delete the data. But typically there would be a warning period.
tuckerman•7mo ago
deathanatos•7mo ago
Possible, perhaps, but contrived.
coredog64•7mo ago
tuckerman•7mo ago
Brian_K_White•7mo ago
fpoling•7mo ago
Now, one can argue that courts would take time and money, and a company may not be able to afford such a risk even if it is theoretical. In that case, if the data is that important, it is stupid to keep it at AWS.
But then just write the data to tapes and store them in a bank safe deposit box or whatever.
treve•7mo ago
firesteelrain•7mo ago
zzo38computer•7mo ago
kjellsbells•7mo ago
- Employee goes rogue and nukes buckets.
- Code fault quietly deletes data, or doesn't store it like you thought.
- State entity demands access to data, and you'd rather give them a tape than your S3 keys.
I agree that with eleven nines or whatever it is of durability, a write to S3 is not going to disappoint you, but most data losses are more about policy and personnel than infrastructure failures.
coredog64•7mo ago
foota•7mo ago
toomuchtodo•7mo ago
https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiF...
https://docs.aws.amazon.com/AmazonS3/latest/userguide/object...
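Roughly what those two docs boil down to, in CLI form (bucket name, MFA ARN/code and retention are placeholders; MFA Delete must be set by the root account, and Object Lock has to be enabled when the bucket is created):
$ aws s3api put-bucket-versioning --bucket my-backups \
    --versioning-configuration Status=Enabled,MFADelete=Enabled \
    --mfa "arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456"
$ aws s3api put-object-lock-configuration --bucket my-backups \
    --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":30}}}'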
xyzzy123•7mo ago
But AWS Backup is still nice, if a bit heavy. I like having a common workflow to restore everything (DynamoDB tables, managed DBs, buckets, etc.) to a common point in time. Also, one of the under-appreciated causes of massive data loss is subtly incorrect lifecycle policies. Backup can save you here even when other techniques may not.
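A concrete example of that lifecycle footgun: a rule intended for a tmp/ prefix that ships with an empty filter will quietly expire every object in the bucket after 30 days (bucket and rule names are made up; don't run this as-is):
$ aws s3api put-bucket-lifecycle-configuration --bucket my-backups --lifecycle-configuration \
    '{"Rules":[{"ID":"expire-tmp","Filter":{"Prefix":""},"Status":"Enabled","Expiration":{"Days":30}}]}'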
mrflop•7mo ago
Also, AWS Backup locks your snapshots into AWS vaults, whereas Plakar lets you push and pull backups to any backend—local disk, S3, another cloud, on-prem, etc.
xyzzy123•7mo ago
The storage needed for this depends on the data change rate in your application, more or less it works like a WAL in a DB. What is annoying is that you can't really control it (for obvious reasons), and less forgivably, AWS backup is super opaque about how much is actually being used by what.
Retention of dailies / weeklies / monthlies is a different (usually compliance) concern (NOT operational, not really, if you have to restore from a monthly your business is probably already done for) and in an enterprise context you are generally prevented from using deltas for these due to enterprise policy or regulation (yeah I know it sounds crazy, reqs are getting really specific these days).
People on AWS don't generally care that they're locked in to AWS services (else.. they wouldn't be on AWS), and while cost is often a factor it is usually not the primary concern (else.. they would not be on AWS). What often IS a primary concern is knowing that their backup solution is covered under the enterprise tier AWS support they are already paying an absolute buttload for.
Also stuff like Vault lock "compliance mode" & "automated restore testing" are helpful in box-ticking scenarios.
Plakar looks awesome but I'm not sure AWS Backup customers are the right market to go for.
fpoling•7mo ago
mrflop•7mo ago
fpoling•7mo ago
And if your company has a sales contract with AWS, the buckets cannot just vanish, nor can AWS close the account at an arbitrary moment.
FooBarWidget•7mo ago
hxtk•7mo ago
Of course, since we had the backups, restoration of individual objects would’ve been possible, but we would’ve needed to do it by hand.
Spooky23•7mo ago
jamesfinlayson•7mo ago
The backups themselves were off-limits to regular employees though - only the team that managed AWS could edit or delete the backups.