I'm consulting for a company with around €1 billion in annual turnover. They don't make their own backups. They rely on disk copies made by the datacenter operator, which happen at irregular intervals, and which they don't test themselves.
Recently a user error caused the production database to be destroyed. The most recent "backup" was four days old. Then we had to replay all transactions that happened during those four days. It's insane.
But the most insane part was, nobody was shocked or terrified about the incident. "Business as usual" it seems.
"Oh there goes Super Enterprise DB Partner again" turns into a product next fiscal year, that shuts down the following year because the scope was too big, but at least they tried to make things better.
A company where I worked had something similar. I spent a couple of months going through all the teams, figuring out how disaster recovery policies were implemented (all of them had been approved by SOC auditors).
The outcome of my analysis was that in case of a major disaster it would be easier to shut down the company and go home than to try to recover to a working state within a reasonable amount of time.
A full OS installation may not change a lot, or may change only through security updates, which are stored elsewhere anyway.
Configurations have their own lifecycle, actors, and good practices on how to keep them and back them up. Same with code.
Data is what matters once everything else is saved somewhere. And it may need different treatment: file tree backups differ from, e.g., database backups.
Logs change frequently, but you can have a proper log server, for which logs are data.
You can be this granular, or go for whole-storage backup. The granularity, while it may add complexity, can lower costs and increase how much of what matters you can keep for longer periods of time.
* Is the file userland-compressed, filesystem-or-device-compressed, or uncompressed?
* What are you going to do about secret keys?
* Is the file immutable, replace-only (most files), append-only (not limited to logs; beware the need to defrag these), or fully mutable (rare - mostly databases or dangerous archive software)?
* Can you rely on page size for (some) chunking, or do you need to rely entirely on content-based chunking?
* How exactly are you going to garbage-collect the data from no-longer-active backups?
* Does your filesystem expose an accurate "this file changed" signal, or better an actual hash? Does it support chunk sharing? Do you know how those APIs work?
* Are you crossing a kernel version that is one-way incompatible?
* Do you have control of the raw filesystem at the other side? (e.g. the most efficient backup for btrfs is only possible with this)
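On the "this file changed" question: if the filesystem gives you no trustworthy change signal, one fallback is to hash everything and diff two manifests. A minimal sketch, assuming manifests in `sha256sum` output format and paths without whitespace; `changed_paths` is a made-up name, not part of any backup tool:

```shell
# Hypothetical helper, not from any backup tool.
# Usage: changed_paths OLD_MANIFEST NEW_MANIFEST
# Each manifest line looks like "<hash>  <path>" (sha256sum output).
# Prints paths that are new, or whose hash differs, in NEW_MANIFEST.
# Assumption: paths contain no whitespace (awk splits fields on it).
changed_paths() {
    awk '
        FILENAME == ARGV[1]       { old[$2] = $1; next }  # remember old hash per path
        (old[$2] "") != ($1 "")   { print $2 }            # new or modified path
    ' "$1" "$2"
}
```

A manifest could be produced with something like `find . -type f -exec sha256sum {} + > manifest`. Note that this sketch does not report deleted files, and hashing everything is exactly the cost a real change signal would let you avoid.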
In my opinion a backup (system) is only good if it has been tested to be restorable as fast as possible and the procedure is clear (as in documented).
How often have I heard of or seen backups that "work great" and "oh, no problem, we have them", only to see them fail or take ages to restore once the disaster has happened (2 days can be an expensive amount of time in a production environment). All too often only parts could be restored.
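A restore drill can be as simple as restoring into a scratch directory and comparing content hashes against the source. A rough sketch, assuming GNU coreutils (`sha256sum`) and file names without newlines; `manifest` and `verify_restore` are names made up for this example:

```shell
# Hypothetical helpers for a restore drill; names are made up for this sketch.
# Assumes GNU coreutils (sha256sum) and file names without newlines.

# Print "<sha256>  <relative path>" for every regular file under $1, sorted by path.
manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Return 0 if the restored tree $2 has exactly the same files and contents as $1.
verify_restore() {
    src=$(manifest "$1") || return 2
    dst=$(manifest "$2") || return 2
    if [ "$src" != "$dst" ]; then
        echo "restore check FAILED: $2 differs from $1" >&2
        return 1
    fi
}
```

Usage would be along the lines of `verify_restore /data /mnt/restore-test`. It only checks regular file contents, not permissions, ownership or empty directories, and it says nothing about how long the restore took, which you would want to measure separately.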
Another missing aspect is within the snapshots section... I like restic, which provides repository based backup with deduplicated snapshots for FILES (not filesystems). It's pretty much what you want if you don't have ZFS (or other reliable snapshot based filesystems) to keep different versions of your files that have been deleted on the filesystem.
The last aspect is partly mentioned, the better PULL than PUSH part. Ransomware is really clever these days and if you PUSH your backups, it can also encrypt or delete all your backups, because it has access to everything... So you could either use readonly media (like Blurays) or PULL is mandatory. It is also helpful to have auto-snapshotting on ZFS via zfs-auto-snapshot, zrepl or sanoid to go back in time to where the ransomware has started its journey.
Or, like someone already commented, you can use a server that allows push but doesn't allow messing with older files. You can for example restrict ssh to only the scp command, and the ssh server can moreover offer a chroot'ed environment to which scp shall copy the backups. And the server can for example rotate that chroot daily.
The push can then push one thing: daily backups. It cannot log in. It cannot overwrite older backups.
Short of a serious SSH exploit where the ransomware could both re-configure the server to accept all ssh (and not just scp) and escape the chroot box, the ransomware is simply not destroying data from before the ransomware found its way on the system.
My backup procedure does that for the one backup server that I have on a dedicated server: a chroot'ed ssh server that only accepts scp and nothing else. It's of course just one part of the backup procedure, not the only thing I rely on for backups.
P.S: it's not incompatible with also using read-only media
On the face of it "append-only access (no changes)" seems sound to me
I did not see a likely reason in a quick review of their comment history.
You can view a comment directly by following the "... ago" link, and from there you can use the "vouch" link to revive the comment. I vouched for a few of TacticalCoder's recent comments.
That depends on how you have access to your backup servers configured. I'm comfortable with append-only backup enforcement for push backups[0] with Borg and Restic via SSH, although I do use offline backup drive rotation as a last line of defense for my local backup set. YMMV.
0 - https://marcusb.org/posts/2024/07/ransomware-resistant-backu...
That depends on your goal, right? If it took me six months to recover my family photo backups, that'd be fine by me.
Curious what you consider valuable data?
Edit: I should say for pictures I have around 2 TB right now (downside of being a hobby photographer)
I have a large amount of memories and "mathom" as well, in double copies, but I connect and add to this data so rarely that it absolutely does not have to be part of any ongoing backup plan.
My preferred solution is to let the client only write new backups, never delete them. Deletion is handled separately (manually or via cron on the target).
You can do this with rsync/ssh via the allowed command feature in .ssh/authorized_keys.
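For illustration, a sketch of what such a forced command could look like. The script path, the `/srv/backups/incoming/` layout and the function names are assumptions for this example; `command=` and `restrict` are standard OpenSSH `authorized_keys` options, and `SSH_ORIGINAL_COMMAND` is set by sshd:

```shell
# Sketch of a forced-command wrapper, e.g. /usr/local/bin/backup-only.sh (hypothetical path).
# Matching ~/.ssh/authorized_keys entry on the backup target (key is a placeholder):
#   command="/usr/local/bin/backup-only.sh",restrict ssh-ed25519 <client-public-key>
# sshd stores what the client asked to run in SSH_ORIGINAL_COMMAND.

# Return 0 only for an rsync push into /srv/backups/incoming/ (assumed layout).
is_allowed_push() {
    case "$1" in
        *--sender*|*--delete*|*..*)
            return 1 ;;   # no reads, no deletions, no path traversal
        "rsync --server "*" . /srv/backups/incoming/"*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

main() {
    if is_allowed_push "${SSH_ORIGINAL_COMMAND:-}"; then
        set -f                          # no glob expansion while splitting
        exec $SSH_ORIGINAL_COMMAND      # split into words, never eval'd
    fi
    echo "rejected: only rsync pushes are allowed" >&2
    exit 1
}

# The installed script would end with a call to: main
```

This is deliberately crude. A push can still overwrite files that already exist under the destination, so it only gives "never delete" semantics when combined with something on the target, such as rotating the incoming directory or taking snapshots there. rsync also ships its own restriction script, rrsync, which is probably a better starting point than a hand-rolled check.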
pacman -S arch-install-scripts # Need this package (for debian you need debootstrap)
pacstrap -c /mnt/backups/TestSpawn base # Makes chroot
systemd-nspawn -D /mnt/backups/TestSpawn # Logs in
passwd # Set the root password. Do whatever else you need then exit
sudo ln -s /mnt/backups/TestSpawn /var/lib/machines/TestSpawn
sudo machinectl start TestSpawn # Congrats, you can now control with machinectl
Configs work like normal systemd stuff. So you can limit access controls, restrict file paths, make the service boot only at certain times or activate based on listening to a port, make it accessible only via 192.168.1.0/24 (or 100.64.0.0/10), limit memory/CPU usage, or whatever you want. (I also like to use BTRFS subvolumes.) You could also go systemd-vmspawn for a full VM if you really wanted to. Extra nice: you can use importctl to then replicate.
I wish for syncoid to add this feature. I want it to only copy snapshots to the backup server. The server then deletes old snapshots. At the moment it requires delete permissions.
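Not a syncoid feature, but the server-side half of that wish can be sketched on its own: a cron job on the backup server keeps the newest N snapshots and destroys the rest, so the sending side never needs delete permissions. `expired` is a made-up helper and the dataset name in the comment is hypothetical:

```shell
# Read names from stdin, oldest first; print all but the newest $1 of them.
expired() {
    awk -v keep="$1" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}

# Possible cron usage on the backup server (dataset name is an assumption,
# and xargs -r is a GNU extension):
#   zfs list -H -d 1 -t snapshot -o name -s creation backup/clients/laptop \
#       | expired 30 \
#       | xargs -r -n 1 zfs destroy
```

Keeping the selection logic separate from the `zfs destroy` call makes it easy to dry-run: pipe the snapshot list through `expired` and read the output before wiring it to anything destructive.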
You might think this is unsuitable for your photo/music/etc. collection, but there's no technical reason you couldn't use the database as the primary storage mechanism. SQLite will take you to ~281 terabytes with a 64k page size. MSSQL supports something crazy like 500 petabytes. The blob data types will choke on your 8K Avengers rip, but you could store it in 1 gig chunks. There are probably other benefits to this anyway.
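The ~281 TB figure can be checked with plain shell arithmetic. This assumes SQLite's default maximum page count of 4294967294 (2^32 - 2) from its "Limits" documentation; the function names and the 80 GB file size are made up for the example:

```shell
# Assumption: SQLite's default maximum page count of 4294967294 (2^32 - 2),
# as given in SQLite's "Limits" documentation.
SQLITE_MAX_PAGES=4294967294

# Largest possible database in bytes for a given page size in bytes.
sqlite_max_bytes() {
    echo $(( SQLITE_MAX_PAGES * $1 ))
}

# Number of chunks needed to store a file of $1 bytes in chunks of $2 bytes.
chunks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

sqlite_max_bytes 65536                   # prints 281474976579584 (about 281 TB)
chunks_needed 80000000000 1000000000     # prints 80 (an 80 GB file in 1 GB chunks)
```

One caveat on the chunk size: as far as I know SQLite's default maximum BLOB length is 1,000,000,000 bytes, so a "1 gig" chunk sits right at the default limit and somewhat smaller chunks are probably the safer choice.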
Almost like GMail Drive back in the day but worse.
I've been working on backup and disaster recovery software for 10 years. There's a common phrase in our realm that I feel obligated to share, given the nature of this article.
> "Friends don't let friends build their own Backup and Disaster Recovery (BCDR) solution"
Building BCDR is notoriously difficult and has many gotchas. The author hinted at some of them, but maybe let me try to drive some of them home.
- Backup is not disaster recovery: In case of a disaster, you want to be up and running near-instantly. If you cannot get back up and running in a few minutes/hours, your customers will lose your trust and your business will hurt. Being able to restore a system (file server, database, domain controller) with minimal data loss (<1 hr) is vital for the survival of many businesses. See Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
- Point-in-time backups (crash consistent vs application consistent): A proper backup system should support point-in-time backups. An "rsync copy" of a file system is not a point-in-time backup (unless the system is offline), because the system changes constantly. A point-in-time backup is a backup in which each block/file/etc. maps to the same exact timestamp. We typically differentiate between "crash consistent backups", which are similar to pulling the plug on a running computer, and "application consistent backups", which involve asking all important applications to persist their state to disk and freeze operations while the backup is happening. Application consistent backups (which are provided by Microsoft's VSS, as mentioned by the author) significantly reduce the chances of corruption. You should never trust an "rsync copy" or even crash consistent backups.
- Murphy's law is really true for storage media: My parents put their backups on external hard drives, and all of r/DataHoarder seems to buy only 12T HDDs and put them in a RAID0. In my experience, hard drives of all kinds fail all the time (though NVMe SSD > other SSD > HDD), so having backups in multiple places (3-2-1 backup!) is important.
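To make the RPO point concrete, here is a tiny sketch of the check a monitoring job could run: is the newest backup younger than the RPO? The function name and the numbers are illustrative only:

```shell
# Return 0 if the newest backup is recent enough to meet the RPO.
# Usage: rpo_met LAST_BACKUP_EPOCH NOW_EPOCH RPO_SECONDS
rpo_met() {
    age=$(( $2 - $1 ))
    [ "$age" -le "$3" ]
}

# Example: a backup that is four days old against a one-hour RPO.
now=$(date +%s)
four_days_ago=$(( now - 4 * 24 * 3600 ))
if rpo_met "$four_days_ago" "$now" 3600; then
    echo "RPO met"
else
    echo "RPO violated: newest backup is too old"
fi
```

The same comparison works for RTO if you feed it the measured duration of a restore drill instead of the backup age.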
(I have more stuff I wanted to write down, but it's late and the kids will be up early.)
Re: BCDR solutions, they also sell trust among B2B companies. Collectively, these solutions protect billions, if not trillions of dollars worth of data, and no CTO in their right mind would ever allow an open-source approach to backup and recovery. This is primarily also due to the fact that backups need to be highly available. Scrolling through a snapshot list is one of the most tedious tasks I've had to do as a sysadmin. Although most of these solutions are bloated and violate userspace like nobody's business, it is ultimately the company's reputation that allows them to sell products. Although I respect Proxmox's attempt at cornering the Broadcom fallout, I could go at length about why it may not be able to permeate the B2B market, but it boils down to a simple formula (not educational, but rather from years of field experience):
> A company's IT spend grows linearly with valuation up to a threshold, then increases exponentially between a certain range, grows polynomially as the company invests in vendor-neutral and anti-lock-in strategies, though this growth may taper as thoughtful, cost-optimized spending measures are introduced.
- Ransomware Protection: Immutability and WORM (Write Once Read Many) backups are critical components of snapshot-based backup strategies. In my experience, legal issues have arisen from non-compliance in government IT systems. While "ransomware" is often used as a buzzword by BCDR vendors to drive sales, true immutability depends on the resiliency and availability of the data across multiple locations. This is where the 3-2-1 backup strategy truly proves its value.
Would like to hear your thoughts on more backup principles!
Yeah and for the vast majority of individual cybernauts, that "1" is almost unachievable without paying for a backup service. And at that point, why are you doing any of it yourself instead of just running their rolling backup + snapshot app?
There isn't a person in the world who lives in a different city from me (that "1" isn't protection when there's a tornado or flood or wildfire) that I'd ask to run a computer 24/7 and do maintenance on it when it breaks down.
What does this have to do with security? You shouldn't be backing up data in a way that's visible to the server. Use something like restic. Do not rely on the provider having good security.
Database dumps help with this, to a large extent, especially if the application itself is making the dumps at an appropriate time. But often you have to make the dump outside the application, meaning you could hit it in the middle of a sequence of queries.
Curious if anyone has useful tips for dealing with this.
But for the most part, and especially in the cloud, this shouldn't be an issue.
pg_dump / mysqldump both solve the problem of snapshotting your live database safely, but can introduce some bloat / overhead you may have to deal with somehow. All pretty well documented and understood though.
For larger postgresql databases I've sometimes adopted the other common pattern of a read-only replica dedicated for backups: you pause replication, run the dump against that backup instance (where you're less concerned about how long that takes, and what cruft it leaves behind that'll need subsequent vacuuming) and then bring replication back.
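Roughly, the pause/dump/resume dance looks like the sketch below. It assumes PostgreSQL 10 or later (where the functions are called `pg_wal_replay_pause()` and `pg_wal_replay_resume()`) and a role that is allowed to call them; the function name, host and database are placeholders:

```shell
# Hypothetical wrapper; host, database and output path are placeholders.
# Assumes PostgreSQL 10+ (pg_wal_replay_pause/resume) and a role allowed to call them.
dump_from_replica() {
    replica_host=$1
    db=$2
    out=$3

    # Stop applying WAL so the dump is not cancelled by replay conflicts.
    psql -h "$replica_host" -d "$db" -qAt -c 'SELECT pg_wal_replay_pause();' || return 1

    # Custom-format dump, restorable with pg_restore.
    pg_dump -h "$replica_host" -Fc -f "$out" "$db"
    status=$?

    # Always resume replay, even if the dump failed.
    psql -h "$replica_host" -d "$db" -qAt -c 'SELECT pg_wal_replay_resume();' || return 1

    return "$status"
}
```

Called as, say, `dump_from_replica replica.example appdb /backups/appdb.dump`. The important design choice is resuming replay regardless of the dump's exit status, so a failed backup does not leave the replica silently falling behind.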
Those terms are handy for anyone not familiar with the space to go do some further googling.
Also odd to not note the distinction between backups and archives - at least in terms of what users' expectations are around the two terms / features - or even mention archiving.
(How fast can I get back to the most recent fully-functional state, vs how can I recover a file I was working on last Tuesday but deleted last Wednesday.)
> without mentioning RPO, RTO, or even RCO
> Those terms are handy for anyone not familiar with the space to go do some further googling.
You should probably get people started:
RPO: Recovery Point Objective
RTO: Recovery Time Objective
RCO: Recovery Consistency Objective
I'm pretty sure they aren't mentioned because these aren't really necessary for doing self-hosted backups. Do we really care much about how fast we recover files? Probably not. At least not more than that they exist and we can restore them. For a business, yeah, recovery time is critical as that's dollars lost.
FWIW, I didn't know these terms until you mentioned them, so I'm not an expert. Please correct me if I'm misunderstanding or being foolishly naive (very likely considering the previous statement). But as I'm only in charge of personal backups, should I really care about this stuff? My priorities are that I have backups and that I can restore. A long running rsync is really not a big issue. At least not for me.
https://francois-encrenaz.net/what-is-cloud-backup-rto-rpo-r...
Knowing the jargon for a space makes it easier to find more topical information. Searching on those abbreviations would be sufficient, anyway.
TFA talks about the right questions to consider when planning backups (but not archives) - eg 'What downtime can I tolerate in case of data loss?' (that's your RTO, effectively).
I'd argue the concepts encapsulated in those TLAs - even if they sound a bit enterprisey - are important for planning your backups, with 'self-hosted' not being an exception per se, just having different numbers.
Sure, as you say 'Do we really care about how fast we recover files?' - perhaps you don't need things back in an hour, but you do have an opinion about how long that should take, don't you?
You also ask 'should I really care about this stuff?'
I can't answer that for you, other than turn it back to 'What losses are you happy to tolerate, and what costs / effort are you willing to incur to mitigate?'. (That'll give you a rough intersection of two lines on your graph.)
This pithy aphorism exists for a good reason : )
> There are two types of people: those who have lost data,
> and those who do backups.
For my archlinux setup, configuration and backup strategy: https://github.com/gchamon/archlinux-system-config
For the backup system, I've cooked an automation layer on top of borg: https://github.com/gchamon/borg-automated-backups
I can't share it. But if you contemplate such a thing, it is possible, and the result is extremely low cost. Borg is pretty awesome.
Backend storage for each Artifactory instance is Dell Isilon.
xandrius•7h ago
For the phones and cameras, set up Nextcloud and have it automatically sync to your own home network. Then have a nightly backup to another disk with a health check after it finishes.
After that you can pick either a cloud host which you trust, or get another drive of yours into someone else's server to have another location for your 2nd backup, and you're golden.
sandreas•7h ago
I would also distinguish between documents (like PDF and TIFF) and photos - there is also paperless-ngx.
sandreas•6h ago
https://mobiussync.com/
bravesoul2•6h ago
For me: one Win/Mac machine with Backblaze. Dump everything to that machine. A second external drive backup, just in case.
BirdieNZ•4h ago
You can also store photos/scans on desktops in the same NAS and make sure Immich is picking them up (and then the backup script will catch them if they get imported to Immich). For an HN user it's pretty straight-forward to set up.
Jedd•4h ago
As bambax noted, you do in fact need a backup system -- you just don't realise that yet.
And you want a way of sharing data between devices. Without knowing what you've explored, and constraints imposed by your vendors of choice, it's hard to be prescriptive.
FWIW I use syncthing on gnu/linux, microsoft windows, android, in a mesh arrangement, for several collections of stuff, anchored back to two dedicated archive targets (small memory / large storage debian VMs) running at two different sites, and then perform regular snapshots on those using borgbackup. This gives me backups and archives. My RPO is 24h but could easily be reduced to whatever figure I want.
I believe this method won't work if Apple phones / tablets are involved, as you are not allowed to run background tasks (for syncthing) on your devices.
(I have ~500GB of photos, and several 10-200GB collections of docs and miscellaneous files, as unique repositories - none of these experience massive changes, it's mostly incremental differences, so it is pretty frugal with diff-based backup systems.)
mhuffman•1h ago
I have used pCloud for years with no issue.
Also external "slow" storage drives are fairly inexpensive now as a third backup if your whole life's images and important documents are at stake.
Always best to keep multiple copies of photos or documents that you care about in multiple places. Houses can flood or burn, computers and storage can fail. No need to be over-paranoid about it, but two copies of important things isn't asking too much of someone.