Make Your Own Backup System – Part 1: Strategy Before Scripts

https://it-notes.dragas.net/2025/07/18/make-your-own-backup-system-part-1-strategy-before-scripts/

364•Bogdanp•6mo ago

Comments

rr808•6mo ago

I don't need a backup system. I just need a standardized way to keep 25 years of photos for a family of 4 with their own phones, cameras, downloads, scans etc. I still haven't found anything good.

bambax•6mo ago

You do need a backup. But before that, you need a family NAS. There are plenty of options. (But a NAS is not a backup.)

xandrius•6mo ago

Downloads and scans are generally trash unless deemed important.

For the phones and cameras, set up Nextcloud and have it automatically sync to your own home network. Then have a nightly backup to another disk with a health check after it finishes.

After that you can either pick a cloud host which you trust or get another drive of yours into someone else's server to have another location for your 2nd backup, and you're golden.

sandreas•6mo ago

I use syncthing... it's great for that purpose, Android is not officially supported but there is a fork, that works fine. Maybe you want to combine it with either ente.io or immich (also available for self-hosted) for photo backup.

I would also distinguish between documents (like PDF and TIFF) and photos - there is also paperless ngx.

setopt•6mo ago

I like Syncthing but it’s not a great option on iOS.

sandreas•6mo ago

What about Möbius Sync?

https://mobiussync.com/

baby_souffle•6mo ago

It's an option... But still beholden to the arbitrary restrictions Apple has on data access.

msh•6mo ago

Synctrain is better, and free

setopt•6mo ago

Now that’s interesting, thanks for sharing. Link: https://apps.apple.com/no/app/synctrain/id6553985316

I’ve tried Möbius before but wasn’t happy with it. Ended up switching to Resilio Sync (proprietary, one-time purchase) some years ago; the iOS client was better, it synced faster, and it was better at working around NAT/FW issues.

But I recently tried Syncthing again, and it seems to have mostly caught up except for the iOS client.

bravesoul2•6mo ago

Isn't that like a Dropbox approach? If you have 2 TB of photos this means you need 2 TB of storage on everything?

palata•6mo ago

I recently found that Nextcloud is good enough to "collect" the photos from my family onto my NAS. And my NAS makes encrypted backups to a cloud using restic.

rsolva•6mo ago

Check out ente.io - it is really good!

nor-and-or-not•6mo ago

I second that, and you can even self-host it.

bravesoul2•6mo ago

Struggling too.

For me: one Win/Mac machine with Backblaze. Dump everything to that machine. A second external drive backup, just in case.

haiku2077•6mo ago

A NAS running Immich, maybe?

senectus1•6mo ago

yup. works a treat for me.

still need to back it up though as a NAS/RAID isn't backup.

BirdieNZ•6mo ago

I'm trialing a NAS with Immich, and then backing up the media and Immich DB dump daily to AWS S3 Deep Archive. It has Android and iOS apps, and enough of the feature set of Google Photos to keep me happy.

You can also store photos/scans on desktops in the same NAS and make sure Immich is picking them up (and then the backup script will catch them if they get imported to Immich). For an HN user it's pretty straight-forward to set up.
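A minimal sketch of the daily job described above, assuming a Docker-based Immich install. The container name `immich_postgres`, the database user `postgres`, the three arguments, and the function name `immich_backup` are assumptions for illustration, not details from the comment.

```shell
#!/bin/sh
# Sketch only: container name, DB user and all paths are assumptions.

immich_backup() {
    media_dir=$1   # e.g. /srv/immich/library
    dump_dir=$2    # e.g. /srv/immich/db-dumps
    bucket=$3      # e.g. s3://example-photo-archive

    dump="$dump_dir/immich-$(date +%Y-%m-%d).sql"
    mkdir -p "$dump_dir" || return 1

    # Dump the database from the running Postgres container, then compress.
    docker exec immich_postgres pg_dumpall --clean --if-exists \
        --username=postgres > "$dump" || return 1
    gzip -f "$dump" || return 1

    # Upload media and dumps straight into the Deep Archive storage class.
    aws s3 sync "$media_dir" "$bucket/media" --storage-class DEEP_ARCHIVE || return 1
    aws s3 sync "$dump_dir" "$bucket/db" --storage-class DEEP_ARCHIVE
}
```

From cron this could be called as `immich_backup /srv/immich/library /srv/immich/db-dumps s3://example-photo-archive`. Deep Archive restores take hours and the class has a minimum storage duration, so it suits a last-resort copy rather than a working one.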

Jedd•6mo ago

Is '25 years of photos' a North American measure of data I was previously unfamiliar with?

As bambax noted, you do in fact need a backup system -- you just don't realise that yet.

And you want a way of sharing data between devices. Without knowing what you've explored, and constraints imposed by your vendors of choice, it's hard to be prescriptive.

FWIW I use syncthing on gnu/linux, microsoft windows, android, in a mesh arrangement, for several collections of stuff, anchored back to two dedicated archive targets (small memory / large storage debian VMs) running at two different sites, and then perform regular snapshots on those using borgbackup. This gives me backups and archives. My RPO is 24h but could easily be reduced to whatever figure I want.

I believe this method won't work if Apple phones / tablets are involved, as you are not allowed to run background tasks (for syncthing) on your devices.

(I have ~500GB of photos, and several 10-200GB collections of docs and miscellaneous files, as unique repositories - none of these experience massive changes, it's mostly incremental differences, so it is pretty frugal with diff-based backup systems.)
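The snapshot step on an archive target could look roughly like the sketch below. The repository path, the retention counts, and the function name `borg_snapshot` are assumptions, and `borg compact` assumes Borg 1.2 or later.

```shell
#!/bin/sh
# Sketch only: repository path, retention counts and function name are assumptions.

borg_snapshot() {
    repo=$1; shift   # remaining arguments are the directories to archive

    # One archive per run, named after the host and the current date.
    borg create --stats "$repo::{hostname}-{now:%Y-%m-%d}" "$@" || return 1

    # Thin out old archives, then reclaim the space (compact needs Borg 1.2+).
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 "$repo" || return 1
    borg compact "$repo"
}
```

Run once a day from cron, for example `borg_snapshot /srv/borg/photos /srv/sync/photos`, this matches a 24h RPO; running it more often shortens the RPO accordingly.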

ethan_smith•6mo ago

PhotoPrism or Immich are solid self-hosted options that handle deduplication and provide good search/tagging for family photos. For cloud, Backblaze B2 + Cryptomator can give you encrypted storage at ~$1/TB/month with DIY scripts for uploads.

hyperpl•6mo ago

Isn't B2 $6/TB/mo?

mhuffman•6mo ago

>I still haven't found anything good.

I have used pCloud for years with no issue.

Also external "slow" storage drives are fairly inexpensive now as a third backup if your whole life's images and important documents are at stake.

Always best to keep multiple copies of photos or documents that you care about in multiple places. Houses can flood or burn, computers and storage can fail. No need to be over-paranoid about it, but two copies of important things isn't asking too much of someone.

rmadriz•6mo ago

Others have already mentioned Syncthing[^1]. Here's what I'm doing on a budget since I don't have a homeserver/NAS or anything like that.

First you need to choose a central device where you're going to send all of the important stuff from other devices like smartphones, laptops, etc. Then you need to set up Syncthing, which works on Linux, macOS, Windows and others. For Android there's Syncthing-fork[^2] but for iOS idk.

Set up the folders you want to back up on each device. For Android, the folders I recommend backing up are DCIM, documents, downloads. For the most part, everything you care about will be there. But I set up a few others like Android/media/WhatsApp/Media to save all photos shared on chats.

Then on this central device that's receiving everything from others, that's where you do the "real" backups. In my case, I'm doing backups to an external HDD, and also to a cloud provider with restic[^3].

I highly recommend restic, genuinely great software for backups. It is incremental (like BTRFS snapshots), has backends for a bunch of providers, including any S3 compatible storage, and if combined with rclone, you have access to virtually any provider. It is encrypted, and because of how it was built, you can still search/navigate your remote snapshots without having to download the entire snapshot (borg[^4] also does this). The most important aspect of this is that you can restore individual folders/files. And this is crucial because most providers for cloud storage will charge you more depending on how much bandwidth you have used. I have already needed to restore files and folders from my remote backups on multiple occasions and it works beautifully.

[^1]: https://github.com/syncthing/syncthing

[^2]: https://github.com/Catfriend1/syncthing-android

[^3]: https://github.com/restic/restic

[^4]: https://github.com/borgbackup/borg
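As a rough illustration of the restic workflow described above (back up the central folder, later restore only one folder), here is a small sketch. The repository locations, paths, and function names are assumptions; restic picks up the repository password from `RESTIC_PASSWORD_FILE` if that variable is set.

```shell
#!/bin/sh
# Sketch only: repository locations and paths are assumptions.

# Create a new incremental snapshot of the given paths.
restic_backup() {
    repo=$1; shift
    restic -r "$repo" backup "$@"
}

# Restore a single file or folder from the latest snapshot into target_dir.
restic_restore_path() {
    repo=$1; path=$2; target_dir=$3
    restic -r "$repo" restore latest --target "$target_dir" --include "$path"
}
```

For example, `restic_backup /mnt/hdd/restic /home/me/sync` for the external HDD, and the same call with a repository such as `rclone:myremote:backups` for a cloud provider reached through rclone. `restic -r REPO ls latest` lists the contents of the latest snapshot without restoring it.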

__turbobrew__•6mo ago

Apple devices and a family iCloud storage plan.

bambax•6mo ago

It's endlessly surprising how people don't care / don't think about backups. And not just individuals! Large companies too.

I'm consulting for a company that makes around €1 billion annual turnover. They don't make their own backups. They rely on disk copies made by the datacenter operator, which happen randomly, and which they don't test themselves.

Recently a user error caused the production database to be destroyed. The most recent "backup" was four days old. Then we had to replay all transactions that happened during those four days. It's insane.

But the most insane part was, nobody was shocked or terrified about the incident. "Business as usual" it seems.

polishdude20•6mo ago

If it doesn't affect your bottom line enough to do it right, then I guess it's ok?

rapfaria•6mo ago

I'd go even a step further: For the big corp, having a point of failure that lives outside its structure can be a feature, and not a bug.

"Oh there goes Super Enterprise DB Partner again" turns into a product next fiscal year, that shuts down the following year because the scope was too big, but at least they tried to make things better.

justsomehnguy•6mo ago

RTO/RPO is a thing. Although many companies declare that they need an SLA of five nines and an RPO in minutes... these situations make it quite evident that many of them are fine with an SLA of 95% and an RTO of weeks.

treetalker•6mo ago

Possibly for legal purposes? Litigation holds are a PITA and generators of additional liability exposure, and backups can come back to bite you.

haiku2077•6mo ago

Companies that big have legal requirements to keep much of their data around for 5-7 years anyway.

daneel_w•6mo ago

It's also endlessly surprising how people over-think the process and requirements.

tguvot•6mo ago

this is side effect of soc2 auditor approved disaster recovery policies.

company where i worked had something similar. i spent a couple of months going through all teams, figuring out how disaster recovery policies are implemented (all of them were approved by soc auditors).

outcome of my analysis was that in case of major disasters it will be easier to shut down company and go home than trying to recover to working state within reasonable amount of time.

truetraveller•6mo ago

Wait, the prod db, like the whole thing? Losing 4 days of data? How does that work. Aren't customers upset? Not doubting your account, but maybe you missed something, because for a $1 billion company, that's likely going to have huge consequences.

bambax•6mo ago

Well it was "a" production database, the one that tracks supplier orders and invoices so that suppliers can eventually get paid. The database is populated by a data stream, so after restoration of the old version, they replayed the data stream (that is indeed stored somewhere, but in only one version (not a backup)).

And this was far from painless: the system was unavailable for a whole day, and all manual interventions on the system (like comments, corrections, etc.) that had been done between the restoration date and the incident, were irretrievably lost. -- There were not too many of those apparently, but still.

truetraveller•6mo ago

Okay, that makes sense. So they had a base backup (which was 4 days old), and 4 days worth of WAL log had to be replayed.

bobsmooth•6mo ago

I just pay $60 a year to backblaze.

somehnguy•6mo ago

Do you mean $99?

bobsmooth•6mo ago

I might be grandfathered on the old price, not sure.

somehnguy•6mo ago

I would be surprised. From the announcements I can find they don't mention any permanent grandfathering. When your plan renews the price increases - and their last increase was 2 years ago.

bobsmooth•6mo ago

I'm paying $100 now. Eh, it's worth the peace of mind. Plus they upgraded everyone to 1 year retention which is nice.

philjohn•6mo ago

Backblaze is great, but restores can be a bit time consuming, even on a fast FTTP connection.

I do have BackBlaze on my desktop, but I also have UrBackup running on all the computers in the house which backs up to a RaidZ2 array, and then a daily offsite backup of the "current" backup (which is just the files stored in a directory in UrBackup) via restic and rclone to JottaCloud.

VM's and containers backup to Proxmox Backup Server and the main datastore of that is also shipped offsite every day, as well as a second Proxmox Backup Server locally (but separate from the rack).

I test restores monthly and so far so good.

gmuslera•6mo ago

How data changes, and what changes it, matters when trying to optimize backups.

A full OS installation may not change a lot, or change with security updates that anyway are stored elsewhere.

Configurations have their own lifecycle, actors, and good practices on how to keep and backup them. Same with code.

Data is what matters once you have saved more or less everything else. And file tree backups could be treated differently from, e.g., database backups.

Logs are something that changes frequently, but you can have a proper log server for which logs are data.

Things can be this granular, or go for storage backup. But the granularity, while it may add complexity, may lower costs and increase how much of what matters you can store for longer periods of time.

o11c•6mo ago

Other things that matter (some overlap):

* Is the file userland-compressed, filesystem-or-device-compressed, or uncompressed?

* What are you going to do about secret keys?

* Is the file immutable, replace-only (most files), append-only (not limited to logs; beware the need to defrag these), or fully mutable (rare - mostly databases or dangerous archive software)?

* Can you rely on page size for (some) chunking, or do you need to rely entirely on content-based chunking?

* How exactly are you going to garbage-collect the data from no-longer-active backups?

* Does your filesystem expose an accurate "this file changed" signal, or better an actual hash? Does it support chunk sharing? Do you know how those APIs work?

* Are you crossing a kernel version that is one-way incompatible?

* Do you have control of the raw filesystem at the other side? (e.g. the most efficient backup for btrfs is only possible with this)

sandreas•6mo ago

Nice writeup... Although I'm missing a few points...

In my opinion a good backup (system) is only good if it has been tested to be restorable as fast as possible and the procedure is clear (as in, documented).

How often have I heard or seen backups that "work great" and "oh, no problem we have them" only to see them fail or take ages to restore when the disaster has happened (2 days can be an expensive amount of time in a production environment). All too often only parts could be restored.

Another missing aspect is within the snapshots section... I like restic, which provides repository based backup with deduplicated snapshots for FILES (not filesystems). It's pretty much what you want if you don't have ZFS (or other reliable snapshot based filesystems) to keep different versions of your files that have been deleted on the filesystem.

The last aspect is partly mentioned, the better PULL than PUSH part. Ransomware is really clever these days and if you PUSH your backups, it can also encrypt or delete all your backups, because it has access to everything... So you could either use readonly media (like Blurays) or PULL is mandatory. It is also helpful to have auto-snapshotting on ZFS via zfs-auto-snapshot, zrepl or sanoid to go back in time to where the ransomware has started its journey.

sgc•6mo ago

Since you mentioned restic, is there something wrong with using restic append-only with occasional on-server pruning instead of pulling? I thought this was the recommended way of avoiding ransomware problems using restic.

sandreas•6mo ago

There are several methods... there is also restic rest-server (https://github.com/restic/rest-server). I personally use ZFS with pull via ssh...
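For the rest-server route, an append-only setup with pruning done only on the server might be sketched like this. The host name, port, paths, and retention counts are assumptions; rest-server normally expects an `.htpasswd` file in its data directory, and TLS and credentials are left out here for brevity.

```shell
#!/bin/sh
# Sketch only: host name, port, paths and retention are assumptions.
# Authentication (.htpasswd) and TLS are omitted for brevity.

# On the backup server: serve repositories, refusing deletes and overwrites.
start_rest_server() {
    rest-server --path "$1" --listen :8000 --append-only
}

# On the client: push a new snapshot over the REST backend.
client_backup() {
    restic -r "rest:http://$1:8000/" backup "$2"
}

# On the backup server only: prune directly against the local repository.
server_prune() {
    restic -r "$1" forget --keep-daily 7 --keep-weekly 4 --prune
}
```

The point of the split is that the client's credentials can add snapshots but cannot remove them; `server_prune` runs locally on the server, outside the reach of a compromised client.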

TacticalCoder•6mo ago

> So you could either use readonly media (like Blurays) or PULL is mandatory.

Or like someone already commented you can use a server that allows push but doesn't allow to mess with older files. You can for example restrict ssh to only the scp command and the ssh server can moreover offer a chroot'ed environment to which scp shall copy the backups. And the server can for example daily rotate that chroot.

The push can then push one thing: daily backups. It cannot log in. It cannot overwrite older backups.

Short of a serious SSH exploit where the ransomware could both re-configure the server to accept all ssh (and not just scp) and escape the chroot box, the ransomware is simply not destroying data from before the ransomware found its way on the system.

My backup procedure does that for the one backup server that I have on a dedicated server: a chroot'ed ssh server that only accepts scp and nothing else. It's of course just one part of the backup procedure, not the only thing I rely on for backups.

P.S: it's not incompatible with also using read-only media

anonymars•6mo ago

I don't understand why this is dead..is it wrong advice? Is there some hidden flaw? Is it simply because the content is repeated elsewhere?

On the face of it "append-only access (no changes)" seems sound to me

quesera•6mo ago

TacticalCoder's comments appear to be auto-deaded for the last week or so.

I did not see a likely reason in a quick review of their comment history.

You can view a comment directly by following the "... ago" link, and from there you can use the "vouch" link to revive the comment. I vouched for a few of TacticalCoder's recent comments.

immibis•6mo ago

Pull-only mode is about reducing that chance of SSH exploits even further.

marcusb•6mo ago

> Ransomware is really clever these days and if you PUSH your backups, it can also encrypt or delete all your backups, because it has access to everything

That depends on how you have access to your backup servers configured. I'm comfortable with append-only backup enforcement for push backups[0] with Borg and Restic via SSH, although I do use offline backup drive rotation as a last line of defense for my local backup set. YMMV.

0 - https://marcusb.org/posts/2024/07/ransomware-resistant-backu...

guillem_lefait•6mo ago

Could you elaborate on your strategy to rotate your disks ?

marcusb•6mo ago

It's pretty simple: the backup host has the backup disk attached via a USB cradle. There's a file in the root directory of the backup disk file system that gets touched when the drive is rotated. A cron job emails me if this file is more than 3 months old. When I rotate the disk, I format the new disk and recreate the restic repos for the remote hosts. I then move the old disk into a fireproof safe. I keep four drives in rotation, so at any given point in time I have the online drive plus three with progressively older backup sets in the safe.
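The "touched file plus cron reminder" part can be sketched as below. The marker path, the 90-day threshold, and the function names are assumptions; the sketch relies on cron mailing any output of a job to the address in `MAILTO`.

```shell
#!/bin/sh
# Sketch only: marker path and threshold are assumptions.
# Example crontab entry (cron mails any output to MAILTO):
#   0 8 * * * /usr/local/bin/check-rotation.sh

# Succeeds if the marker is missing or was last touched more than max_days ago.
marker_is_stale() {
    marker=$1; max_days=$2
    [ -e "$marker" ] || return 0
    [ -n "$(find "$marker" -mtime +"$max_days" -print)" ]
}

# Prints a reminder and fails when the disk is overdue for rotation.
check_rotation() {
    if marker_is_stale "$1" "$2"; then
        echo "Backup disk marker $1 is older than $2 days: time to rotate"
        return 1
    fi
}
```

After swapping disks, `touch /mnt/backup/.rotated` resets the clock, and the cron script would end with `check_rotation /mnt/backup/.rotated 90`.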

guillem_lefait•6mo ago

And then, after a year, what do you do with the oldest hard drive? Does it enter the cycle again, do you destroy it, or do you use it in a failsafe environment? The procedure looks OK and I would like to make it more organised myself, just trying to find the right balance.

marcusb•6mo ago

The drive enters the cycle again. I use the drives until they show signs of failure (SMART monitoring/testing), or until I need to upgrade for capacity reasons.

I'm using "recertified" (really, used) drives that I've written about here: https://marcusb.org/posts/2024/03/used-hard-drives-from-tech.... They are inexpensive and, so far, have been very reliable. (And, yes, I've done restores from the backup sets.)

guillem_lefait•6mo ago

Thanks for the reference, it makes sense.

KPGv2•6mo ago

> tested to be restorable as fast as possible

That depends on your goal, right? If it took me six months to recover my family photo backups, that'd be fine by me.

daneel_w•6mo ago

My valuable data is less than 100 MiB. I just tar+compress+encrypt a few select directories/files twice a week and keep a couple of months of rotation. No incremental hassle necessary. I store copies at home and I store copies outside of home. It's a no-frills setup that costs nothing, is just a few lines of *sh script, takes care of itself, and never really needed any maintenance.
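A setup like this really does fit in a few lines. The sketch below is one possible shape, not the commenter's script: the archive naming, the use of symmetric `gpg` with a passphrase file, and the function names are all assumptions.

```shell
#!/bin/sh
# Sketch only: file naming, gpg options and retention are assumptions.

# tar + gzip the given paths and encrypt the stream with a passphrase file.
make_archive() {
    dest_dir=$1; passfile=$2; shift 2
    out="$dest_dir/backup-$(date +%Y%m%d).tar.gz.gpg"
    tar -czf - "$@" | gpg --batch --yes --pinentry-mode loopback \
        --passphrase-file "$passfile" --cipher-algo AES256 \
        --output "$out" --symmetric
}

# Rotation: delete archives last modified more than keep_days days ago.
prune_archives() {
    dest_dir=$1; keep_days=$2
    find "$dest_dir" -type f -name 'backup-*.tar.gz.gpg' \
        -mtime +"$keep_days" -exec rm -f {} +
}
```

A twice-weekly cron job would call `make_archive` followed by `prune_archives /srv/backups 60`, then copy the newest file off-site. Restoring is the reverse pipeline: `gpg --decrypt` piped into `tar -xzf -`.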

mavilia•6mo ago

This comment made me rethink what I have that is actually valuable data. My photos alone even if culled down to just my favorites would probably be at least a few gigs. Contacts from my phone would be small. Other than that I guess I wouldn't be devastated if I lost anything else. Probably should put my recovery keys somewhere safer but honestly the accounts most important to me don't have recovery keys.

Curious what you consider valuable data?

Edit: I should say for pictures I have around 2 TB right now (downside of being a hobby photographer)

daneel_w•6mo ago

With valuable I should've elaborated that it's my set of constantly changing daily-use data. Keychain, documents and notes, e-mail, bookmarks, active software projects, those kinds of things.

I have a large amount of memories and "mathom" as well, in double copies, but I connect and add to this data so rarely that it absolutely does not have to be part of any ongoing backup plan.

mystifyingpoi•6mo ago

With photos, it is kind of a different story. If I lost 50% of my last vacation photos, I would probably not even notice when scrolling through them. It makes me very nostalgic for analog cameras, where my parents would have to think strategically about how to use the 30 or so slots on the analog film for a 7 day trip.

rossant•6mo ago

If you die suddenly tomorrow, what would you want your family to recover? What would you want your grandchildren to have access to in a few decades? That's your valuable data. They may not need or want to inherit hundreds of thousands of files. Chances are that a few key photos, videos, and text would be enough.

daneel_w•6mo ago

I do have a contingency plan for all my digital memories and my online accounts etc.

progbits•6mo ago

> One way is to ensure that machines that must be backed up via "push" [..] can only access their own space. More importantly, the backup server, for security reasons, should maintain its own filesystem snapshots for a certain period. In this way, even in the worst-case scenario (workload compromised -> connection to backup server -> deletion of backups to demand a ransom), the backup server has its own snapshots

My preferred solution is to let client only write new backups, never delete. The deletion is handled separately (manually or cron on the target).

You can do this with rsync/ssh via the allowed command feature in .ssh/authorized_keys.
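One way to picture the allowed-command approach: the key is pinned to a wrapper script that inspects `SSH_ORIGINAL_COMMAND` and only lets rsync uploads through. This is a simplified sketch with a hypothetical install path and function names; rsync also ships a helper called `rrsync` that serves the same purpose and is likely the better choice in practice.

```shell
#!/bin/sh
# Sketch only: install path and key are placeholders. Matching authorized_keys entry:
#   command="/usr/local/bin/push-only.sh",restrict ssh-ed25519 AAAA... client@host

# Return 0 only for an rsync server invocation that receives files and
# carries no delete-style options or shell metacharacters.
is_allowed_push() {
    case $1 in
        *--sender*|*--delete*|*--remove-*) return 1 ;;
        *';'*|*'&'*|*'|'*|*'`'*|*'$('*) return 1 ;;
        'rsync --server '*) return 0 ;;
        *) return 1 ;;
    esac
}

push_only_main() {
    if is_allowed_push "${SSH_ORIGINAL_COMMAND:-}"; then
        set -f                        # no glob expansion of client input
        exec $SSH_ORIGINAL_COMMAND    # word-split on purpose, never eval'd
    fi
    echo "rejected: only rsync uploads are allowed" >&2
    exit 1
}

# When installed as the forced command, the script's last line would be:
# push_only_main
```

Note that a filter like this blocks reads and deletes but not overwrites of existing files, so it works best when each run writes to a new dated directory or when the server keeps its own snapshots.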

haiku2077•6mo ago

This is also why I use rclone copy instead of rclone sync for my backups, using API keys without permission to delete objects.
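The difference between the two subcommands, as a short sketch. The remote name `offsite`, the paths, and the function name are assumptions; `--immutable` is an optional extra that makes rclone refuse to modify files that already exist on the destination.

```shell
#!/bin/sh
# Sketch only: the remote name "offsite" and the paths are assumptions.

# copy never deletes on the destination; --immutable additionally refuses
# to overwrite files that already exist there with different content.
offsite_copy() {
    rclone copy --immutable "$1" "$2"
}

# For contrast: sync makes the destination match the source, deletions included.
#   rclone sync /srv/data offsite:backups/data
```

With `offsite_copy /srv/data offsite:backups/data`, a file deleted or encrypted locally stays untouched on the remote, and credentials without delete permission enforce the same thing on the provider's side.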

3eb7988a1663•6mo ago

I fall into the "pull" camp so this is less of a worry. The server to be backed-up should have no permissions to the backup server. If an attacker can root your live server (with more code/services to exploit), they do not automatically also gain access to the backup system.

amelius•6mo ago

I also implemented my backup scheme using "pull" as it is easier to do than an append-only system, and therefore probably more secure as there is less room for mistakes. The backup server can only be accessed through a console directly, which is a bit annoying sometimes, but at least it writes summaries back to the network.

bobek•6mo ago

It is not particularly hard either. Check out restic's rest-server.

https://github.com/restic/rest-server/

setopt•6mo ago

I used to do this too, using rsnapshot to backup ssh:// locations.

At some point I switched to instead using Syncthing to sync all my files to the backup server, and then the backup server did local backups from the Syncthing folder to the backup disk using Borgmatic. Works better for laptops.

Now I daily drive a Mac, and switched to Arq backup with a BackBlaze remote, instead of hosting my own backup server. More of a turnkey solution but works fine, especially given all the settings to suspend backups depending on e.g. battery status, WiFi connectivity, etc. when roaming around.

godelski•6mo ago

Another thing you can do is just run a container or a specific backup user. Something like systemd-nspawn can give you a pretty lightweight chroot "jail" and you can ensure that anyone inside that jail can't do any rm commands.

  pacman -S  arch-install-scripts            # Need this package (for debian you need debootstrap)
  pacstrap -c /mnt/backups/TestSpawn base    # Makes chroot
  systemd-nspawn -D /mnt/backups/TestSpawn   # Logs in 
  passwd                                     # Set the root password. Do whatever else you need then exit
  sudo ln -s /mnt/backups/TestSpawn /var/lib/machines/TestSpawn
  sudo machinectl start TestSpawn            # Congrats, you can now control with machinectl

Configs work like normal systemd stuff. So you can limit access controls, restrict file paths, make the service boot only at certain times or activate based on listening to a port, make only accessible via 192.168.1.0/24 (or 100.64.0.0/10), limit memory/CPU usage, or whatever you want. (I also like to use BTRFS subvolumes) You could also go systemd-vmspawn for a full VM if you really wanted to.

Extra nice, you can use importctl to then replicate.

zeec123•6mo ago

> My preferred solution is to let client only write new backups, never delete.

I wish for syncoid to add this feature. I want it to only copy snapshots to the backup server. The server then deletes old snapshots. At the moment it requires delete permissions.

KAMSPioneer•6mo ago

You can do this by using a dedicated syncoid user and ZFS delegated permissions: https://openzfs.github.io/openzfs-docs/man/master/8/zfs-allo...

You'll need to add the --no-privilege-elevation flag to your syncoid job.

dspillett•6mo ago

I do both. It requires two backup locations, but I want that anyway. My backup sources push to an intermediate location and the primary backups pull from there. The intermediate location is smaller so can hold less, but does still keep snapshots.

This means that neither my backup sources nor the main backup sinks need to authenticate with each other, in fact I make sure that they can't, they can only authenticate with the intermediate and it can't authenticate with them⁰. If any one or two of the three parts is compromised there is a chance that the third will be safe. Backing up the credentials for all this is handled separately to make sure I'm not storing the keys to the entire kingdom on any internet connectable hosts. The few bits of data that I have that are truly massively important are backed up with extra measures (including an actual offline backup) on top.

With this separation, verifying backups requires extra steps too. The main backups occasionally verify checksums of the data they hold, and send a copy of the hashes for the latest backup back to the intermediate host(s), where they can be read back and compared to hashes generated¹ at the sources² in order to detect certain families of corruption issues.

--------

[0] I think of the arrangement as a soft-offline backup, because like an offline backup nothing on the sources can (directly) corrupt the backup snapshots at the other end.

[1] These are generated at backup time, to reduce false alerts from files modified soon after they are read for sending to the backups.

[2] The hashes are sent to the intermediate, so the comparison could be done there and in fact I should probably do that as it'll make sending alerts when something seems wrong more reliable, but that isn't how I initially set things up and I've not done any major renovations in ages.
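The hash comparison described above can be approximated with plain checksum manifests. This is a sketch assuming GNU `sha256sum`; the function names and the manifest layout are hypothetical and not the commenter's actual tooling.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils sha256sum; names are hypothetical.

# Print a sorted "hash  ./relative/path" manifest for every file under a directory.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}

# Check a backup copy against a manifest generated at the source.
# manifest must be an absolute path because the check runs inside backup_dir.
verify_against() {
    backup_dir=$1; manifest=$2
    ( cd "$backup_dir" && sha256sum --quiet -c "$manifest" )
}
```

The source would run `make_manifest` at backup time and ship the result alongside the data. Either side can then compare two manifests with `cmp`, or run `verify_against` on the copy it holds; a non-zero exit status means at least one file differs or is missing.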

TMWNN•6mo ago

>My preferred solution is to let client only write new backups, never delete. The deletion is handled separately (manually or cron on the target).

I do this by making backups visible to users on a `/.snapshots` directory that is the same as the target of the backup script, but mounted NFS read-only.

bob1029•6mo ago

I think the cleanest, most compelling backup strategies are those employed by RDBMS products. [A]sync log replication is really powerful at taking any arbitrary domain and making sure it exists in the other sites exactly.

You might think this is unsuitable for your photo/music/etc. collection, but there's no technical reason you couldn't use the database as the primary storage mechanism. SQLite will take you to ~281 terabytes with a 64k page size. MSSQL supports something crazy like 500 petabytes. The blob data types will choke on your 8k avengers rip, but you could store it in 1 gig chunks - There are probably other benefits to this anyways.
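The ~281 TB figure and the chunking idea reduce to simple arithmetic, sketched below. The function names are hypothetical; the only inputs are SQLite's documented maximum page count (4294967294) and its largest page size (65536 bytes).

```shell
#!/bin/sh
# Sketch only: worked arithmetic, not a storage implementation.

# Largest SQLite database: maximum page count times the largest page size.
sqlite_max_bytes() {
    echo $((4294967294 * 65536))
}

# Number of fixed-size chunks needed for a file (ceiling division).
chunks_needed() {
    size=$1; chunk=$2
    echo $(( (size + chunk - 1) / chunk ))
}
```

`sqlite_max_bytes` prints 281474976579584, which is the ~281 terabytes quoted above, and `chunks_needed 80000000000 1000000000` gives 80 rows for an 80 GB file. SQLite's default limit for a single blob is 1,000,000,000 bytes, so "1 gig" chunks need to stay at or below that unless the library is built with a larger `SQLITE_MAX_LENGTH`.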

rs186•6mo ago

It works in theory, but usability is almost non-existent with this approach unless someone creates an app that interacts with this database and provides file system-like access to users. Any normal human would be better off with Dropbox or Google Drive.

Almost like GMail Drive back in the day but worse.

https://en.m.wikipedia.org/wiki/GMail_Drive

kernc•6mo ago

Make your own backup system—is exactly what I did. I felt git porcelain had a stable-enough API to accommodate this popular use case.

https://kernc.github.io/myba/

binwiederhier•6mo ago

Thank you for sharing. A curious read. I am looking forward to the next post.

I've been working on backup and disaster recovery software for 10 years. There's a common phrase in our realm that I feel obligated to share, given the nature of this article.

> "Friends don't let friends build their own Backup and Disaster Recovery (BCDR) solution"

Building BCDR is notoriously difficult and has many gotchas. The author hinted at some of them, but maybe let me try to drive some of them home.

- Backup is not disaster recovery: In case of a disaster, you want to be up and running near-instantly. If you cannot get back up and running in a few minutes/hours, your customers will lose your trust and your business will hurt. Being able to restore a system (file server, database, domain controller) with minimal data loss (<1 hr) is vital for the survival of many businesses. See Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

- Point-in-time backups (crash consistent vs application consistent): A proper backup system should support point-in-time backups. An "rsync copy" of a file system is not a point-in-time backup (unless the system is offline), because the system changes constantly. A point-in-time backup is a backup in which each block/file/.. maps to the same exact timestamp. We typically differentiate between "crash consistent backups" which are similar to pulling the plug on a running computer, and "application consistent backups", which involves asking all important applications to persist their state to disk and freeze operations while the backup is happening. Application consistent backups (which is provided by Microsoft's VSS, as mentioned by the author) significantly reduce the chances of corruption. You should never trust an "rsync copy" or even crash consistent backups.

- Murphy's law is really true for storage media: My parents put their backups on external hard drives, and all of r/DataHoarder seems to buy only 12T HDDs and put them in a RAID0. In my experience, hard drives of all kinds fail all the time (though NVMe SSD > other SSD > HDD), so having backups in multiple places (3-2-1 backup!) is important.

(I have more stuff I wanted to write down, but it's late and the kids will be up early.)

sebmellen•6mo ago

Also if you have a NAS, don’t use the same hard drive type for both.

poonenemity•6mo ago

Ha. That quote made me chuckle; it reminded me of a performance by the band Alice in Chains, where a similar quote appeared.

Re: BCDR solutions, they also sell trust among B2B companies. Collectively, these solutions protect billions, if not trillions of dollars worth of data, and no CTO in their right mind would ever allow an open-source approach to backup and recovery. This is primarily also due to the fact that backups need to be highly available. Scrolling through a snapshot list is one of the most tedious tasks I've had to do as a sysadmin. Although most of these solutions are bloated and violate userspace like nobody's business, it is ultimately the company's reputation that allows them to sell products. Although I respect Proxmox's attempt at cornering the Broadcom fallout, I could go at length about why it may not be able to permeate the B2B market, but it boils down to a simple formula (not educational, but rather from years of field experience):

> A company's IT spend grows linearly with valuation up to a threshold, then increases exponentially between a certain range, grows polynomially as the company invests in vendor-neutral and anti-lock-in strategies, though this growth may taper as thoughtful, cost-optimized spending measures are introduced.

- Ransomware Protection: Immutability and WORM (Write Once Read Many) backups are critical components of snapshot-based backup strategies. In my experience, legal issues have arisen from non-compliance in government IT systems. While "ransomware" is often used as a buzzword by BCDR vendors to drive sales, true immutability depends on the resiliency and availability of the data across multiple locations. This is where the 3-2-1 backup strategy truly proves its value.

Would like to hear your thoughts on more backup principles!

koolba•6mo ago

> An "rsync copy" of a file system is not a point-in-time backup (unless the system is offline), because the system changes constantly. A point-in-time backup is a backup in which each block/file/.. maps to the same exact timestamp.

You can do this with some extra steps in between. Specifically you need a snapshotting file system like zfs. You run the rsync on the snapshot to get an atomic view of the file system.

Of course if you’re using zfs, you might just want to export the actual snapshot at that point.
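A minimal sketch of the snapshot-then-rsync idea, assuming a dataset called `tank/data` mounted at `/tank/data` and a destination `/backup/data/` (all hypothetical names). It relies on ZFS exposing each snapshot read-only under `.zfs/snapshot/<name>` inside the dataset's mountpoint. By default it only prints the commands; the result is still a crash consistent copy, nothing more.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_rsync_commands(dataset, mountpoint, dest, snap_name):
    """Return the three commands: take a snapshot, copy from it, drop it."""
    snap = f"{dataset}@{snap_name}"
    # The frozen view of the dataset at snapshot time.
    frozen_view = f"{mountpoint}/.zfs/snapshot/{snap_name}/"
    return [
        ["zfs", "snapshot", snap],
        ["rsync", "-a", "--delete", frozen_view, dest],
        ["zfs", "destroy", snap],
    ]


def backup(dataset, mountpoint, dest, run=False):
    """Print the commands, or execute them when run=True."""
    snap_name = datetime.now(timezone.utc).strftime("rsync-%Y%m%dT%H%M%SZ")
    commands = snapshot_rsync_commands(dataset, mountpoint, dest, snap_name)
    for cmd in commands:
        if run:
            # check=True stops at the first failure; a failed rsync
            # therefore leaves the snapshot in place for inspection.
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(cmd))
    return commands
```

Calling `backup("tank/data", "/tank/data", "/backup/data/")` prints the three commands without touching anything, which is a reasonable way to review them first.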

sudobash1•6mo ago

Unless you are doing more steps, that is still just a crash consistent backup. Better than plain rsync, but still not ideal.

KPGv2•6mo ago

> having backups in multiple places (3-2-1 backup!) is important

Yeah and for the vast majority of individual cybernauts, that "1" is almost unachievable without paying for a backup service. And at that point, why are you doing any of it yourself instead of just running their rolling backup + snapshot app?

There isn't a person in the world who lives in a different city from me (that "1" isn't protection when there's a tornado or flood or wildfire) that I'd ask to run a computer 24/7 and do maintenance on it when it breaks down.

danlitt•6mo ago

My solution for this has been to leave a machine running in the office (in order to back up my home machine). It doesn't really need to be on 24/7, it's enough to turn it on every few days just to pull the last few backups.

justsomehnguy•6mo ago

If you aren't at CERN level of data - you can always rent a VPS/dedicated server for this.

It's a matter of the value of your data. Or how much it would cost you to lose it.

Spivak•6mo ago

> You should never trust an "rsync copy" or even crash consistent backups.

This leads you to the secret forbidden knowledge that you only need to back up your database(s) and file/object storage. Everything else can be, or has to be depending on how strong that 'never' is, recreated from your provisioning tools. All those Veeam VM backups some IT folks hoard like dragons are worthless.

kijin•6mo ago

Exactly. There is no longer any point in backing up an entire "server" or a "disk". Servers and disks are created and destroyed automatically these days. It's the database that matters, and each type of database has its own tooling for creating "application consistent backups".

mekster•6mo ago

For regular DB like MySQL/PostgreSQL, just snapshot on zfs without thinking.

binwiederhier•6mo ago

Databases these days are pretty resilient to restoring from crash consistent backups like that, so yes, you'll likely be fine. It's a good enough approach for many cases. But you can't be sure that it really recovers.

However, ZFS snapshots alone are not a good enough backup if you don't off-site them somewhere else. A server/backplane/storage controller could die or corrupt your entire zpool, or the place could burn down. Lots of ways to fail. You gotta at least zfs send the snapshots somewhere.

mekster•6mo ago

How do you mean can’t be sure if it recovers? It’s not hoping for inconsistent states to be recovered by the db but they’re supposed to be in good state with file system snapshotting.

https://serverfault.com/a/806305

https://zrepl.github.io/v0.2.1/configuration/snapshotting.ht...

binwiederhier•6mo ago

Ha! I did not expect a reference to `innodb_flush_log_at_trx_commit` here. I wrote a blog post a few years ago about MySQL lossless semi-sync replication [1] and I've had quite enough of innodb_flush_log_at_trx_commit for a lifetime :-)

Depending on the database you're using, and on your configuration, they may NOT recover, or require manual intervention to recover. There is a reason that MSSQL has a VSS writer in Windows, and that PostgreSQL and MySQL have their own "dump programs" that do clean backups. Pulling the plug (= file system snapshotting) without involving the database/app is risky business.

Databases these days are really resilient, so I'm not saying that $yourfavoriteapp will never recover. But unless you involve the application or a VSS writer (which does that for you), you cannot be sure that it'll come back up.

[1] https://blog.heckel.io/2021/10/19/lossless-mysql-semi-sync-r...

binwiederhier•6mo ago

This strongly depends on your environment and on your RTO/RPO.

Sure, there are environments that have automatically deployed, largely stateless servers. Why back them up if you can recreate them in an hour or two ;-)

Even then, though, if we're talking about important production systems with an RTO of only a few minutes, then having a BCDR solution with instant virtualization is worth its weight in gold. I may be biased though, given that I professionally write BCDR software, hehe.

However, many environments are not like that: There are lots of stateful servers out there with bespoke configurations, lots of "the customer needed this to be that way and it doesn't fit our automation". Having all servers backed up the same way gives you peace of mind if you manage servers for a living. Being able to just spin up a virtual machine of a server and run things from a backup while you restore or repair the original system is truly magical.

mekster•6mo ago

3-2-1 analogy is old. We have infinite flexibility on where we can put data unlike before cloud servers existed.

I'd at least have file system snapshots locally for easy recovery from manual mistakes, have the data copied to a remote location using implementation A and snapshotted there too, and copy the same data to another location using implementation B and snapshot it there too. That way you not only get durability, but implementation bugs in a backup process are also mitigated.

zfs is a godsend for this and I use Borg as the secondary implementation, which seems enough for almost any disaster.

immibis•6mo ago

My personal external backup is two external drives in RAID1 (RAID0 wtfff?). One already failed, of course the Seagate one. It failed silently, too - a few sectors just do not respond to read commands and this was discovered when in-place encrypting the array. (I normally would avoid Seagate consumer drives if it wasn't for brand diversity. Now I have two WD drives purchased years apart.)

It's a home backup so not exactly relevant to most of what you said - just wanted to underscore the point about storage media sucking. Ideally I'd periodically scrub each drive independently (can probably be done by forcing a degraded array mode, but careful not to mess up the metadata!) against checksums made by backup software. This particular failure mode could also be caught by dd'ing to /dev/null.
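A rough sketch of what "scrub against checksums" could look like at the file level, assuming the manifest is simply a dict of relative path to SHA-256 (a made-up format, not what any particular backup tool writes). Reading every byte also catches the silent unreadable-sector case, since the read fails with an `OSError`.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks; an unreadable sector surfaces as OSError."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root, manifest):
    """Re-read every file in the manifest; return paths that are
    missing, unreadable, or no longer match their recorded hash."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        try:
            if sha256_of(root / rel) != expected:
                bad.append(rel)
        except OSError:
            bad.append(rel)
    return bad
```

Running `scrub` separately against each drive's mount point would tell you which copy went bad, which is the part a mirrored array can hide.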

binwiederhier•6mo ago

ZFS really shines here with its built-in "zpool scrub" command and checksumming.

Even though I am preaching "application consistent backups" in my original comment (because that's what's important for businesses), my home backup setup is quite simple and isn't even crash consistent :-) I do: Pull via rsync to backup box & ZFS snapshot, then rsync to Hetzner storage box (ZFS snapshotted there, weekly)

My ZFS pool consists of multiple mirrored vdevs, and I scrub the entire pool once a month. I've uncovered drive failures, and storage controller failures this way. At work, we also use ZFS and we've uncovered even failures of entire product lines of hard drives.

Shank•6mo ago

> Security: I avoid using mainstream cloud storage services like Dropbox or Google Drive for primary backups. Own your data!

What does this have to do with security? You shouldn't be backing up data in a way that's visible to the server. Use something like restic. Do not rely on the provider having good security.

inopinatus•6mo ago

Perhaps Part 1 ought to be headlined, “design the restore system”, this being the part of backup that actually matters.

kayson•6mo ago

The thing that always gets me about backup consistency is that it's impossibly difficult to ensure that application data is in a consistent state without bringing everything down. You can create a disk snapshot, but there's no guarantee that some service isn't mid-write or mid-procedure at the point of the snapshot. So if you were to restore the backup from the snapshot you would encounter some kind of corruption.

Database dumps help with this, to a large extent, especially if the application itself is making the dumps at an appropriate time. But often you have to make the dump outside the application, meaning you could hit it in the middle of a sequence of queries.

Curious if anyone has useful tips for dealing with this.

booi•6mo ago

I think generally speaking, databases are resilient to this so taking a snapshot of the disk at any point is sufficient as a backup. The only danger is if you're using some sort of on-controller disk cache with no battery backup, then basically you're lying to the database about what has flushed and there can be inconsistencies on "power failure" (i.e. live snapshot).

But for the most part, especially in the cloud, this shouldn't be an issue.

immibis•6mo ago

Beware that although databases are resilient to snapshotting, they're not resilient to inconsistent snapshots. All files have to be snapshotted at the exact same moment, which means either a filesystem-level or disk-level snapshot, or SIGSTOP all database processes before doing your recursive copy or rsync.

Some databases have the ability to stop writing and hold all changes in memory (or only append to WAL, which is recursive-copy-safe) while you tell it you're doing a backup.
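A toy illustration of the difference, not a model of any real database: two "files" are assumed to always agree on the last transaction id, and the class and function names are invented for the example. Copying them one after another while writes continue produces a torn backup; capturing both at the same instant does not.

```python
import copy


class ToyDatabase:
    """Two 'files' that must always agree on the last transaction id."""

    def __init__(self):
        self.files = {"wal": 0, "data": 0}

    def commit(self):
        # Both files advance together from the database's point of view.
        self.files["wal"] += 1
        self.files["data"] += 1


def is_consistent(files):
    return files["wal"] == files["data"]


def file_by_file_copy(db, writes_between_files=1):
    """Copy one file, let the database keep working, then copy the other."""
    backup = {"wal": db.files["wal"]}
    for _ in range(writes_between_files):
        db.commit()
    backup["data"] = db.files["data"]
    return backup


def atomic_snapshot(db):
    """Capture every file at the same instant, as a filesystem snapshot does."""
    return copy.deepcopy(db.files)
```

In the torn copy the data file reflects a transaction the log has never heard of, which is the kind of state crash recovery is not designed for.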

Jedd•6mo ago

It's not clear if there are other places that application state is being stored, outside your database, that you need to capture. Do you mean things like caches? (I'd hope not.)

pg_dump / mysqldump both solve the problem of snapshotting your live database safely, but can introduce some bloat / overhead you may have to deal with somehow. All pretty well documented and understood though.

For larger postgresql databases I've sometimes adopted the other common pattern of a read-only replica dedicated for backups: you pause replication, run the dump against that backup instance (where you're less concerned about how long that takes, and what cruft it leaves behind that'll need subsequent vacuuming) and then bring replication back.
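Sketched in Python, that pattern might look roughly like this. The host, database name and output path are placeholders, and it assumes PostgreSQL 10 or newer (where the functions are named `pg_wal_replay_pause()` and `pg_wal_replay_resume()`) plus a role that is allowed to call them on the standby.

```python
import subprocess


def replica_dump_commands(replica_host, dbname, outfile):
    """Commands to pause WAL replay on a standby, dump it, and resume replay."""
    psql = ["psql", "--host", replica_host, "--dbname", dbname, "--command"]
    return [
        psql + ["SELECT pg_wal_replay_pause();"],
        ["pg_dump", "--host", replica_host, "--format=custom",
         "--file", outfile, dbname],
        psql + ["SELECT pg_wal_replay_resume();"],
    ]


def dump_from_replica(replica_host, dbname, outfile, runner=subprocess.run):
    """Run the three steps; `runner` is injectable so this can be dry-run."""
    pause, dump, resume = replica_dump_commands(replica_host, dbname, outfile)
    runner(pause, check=True)
    try:
        runner(dump, check=True)
    finally:
        # Always resume, otherwise the replica falls further and further behind.
        runner(resume, check=True)
```

The `try`/`finally` is the point of wrapping this in a script at all: a failed dump should never leave replication paused.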

Jedd•6mo ago

Feels weird to talk about strategy for your backups without mentioning RPO, RTO, or even RCO - even though some of those concepts are nudged up against in TFA.

Those terms are handy for anyone not familiar with the space to go do some further googling.

Also odd to not note the distinction between backups and archives - at least in terms of what users' expectations are around the two terms / features - or even mention archiving.

(How fast can I get back to the most recent fully-functional state, vs how can I recover a file I was working on last Tuesday but deleted last Wednesday.)

godelski•6mo ago

  > without mentioning RPO, RTO, or even RCO

  > Those terms are handy for anyone not familiar with the space to go do some further googling.

You should probably get people started

  RPO: Recovery Point Objective 
  RTO: Recovery Time Objective
  RCO: Recovery Consistency Objective

I'm pretty sure they aren't mentioned because these aren't really necessary for doing self-hosted backups. Do we really care much about how fast we recover files? Probably not. At least not more than that they exist and we can restore them. For a business, yeah, recovery time is critical as that's dollars lost.

FWIW, I didn't know these terms until you mentioned them, so I'm not an expert. Please correct me if I'm misunderstanding or being foolishly naive (very likely considering the previous statement). But as I'm only in charge of personal backups, should I really care about this stuff? My priorities are that I have backups and that I can restore. A long running rsync is really not a big issue. At least not for me.

https://francois-encrenaz.net/what-is-cloud-backup-rto-rpo-r...
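For a home setup the first two terms come down to simple arithmetic. A back-of-the-envelope sketch, where the field names and the example numbers are invented and the formulas are deliberately simplistic (worst-case data loss is one backup interval plus one backup run; restore time is setup plus data divided by throughput):

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    interval_hours: float         # time between backup runs
    backup_duration_hours: float  # how long one run takes
    data_gb: float                # amount of data to restore
    restore_gb_per_hour: float    # restore throughput you actually measured
    setup_hours: float = 0.0      # time to get replacement hardware ready

    def worst_case_rpo_hours(self):
        """Oldest the newest usable backup can be when disaster strikes."""
        return self.interval_hours + self.backup_duration_hours

    def estimated_rto_hours(self):
        """Rough time until the data is usable again."""
        return self.setup_hours + self.data_gb / self.restore_gb_per_hour


# Nightly backup taking 1 hour; 2000 GB restored at 100 GB/h after 2 h of setup.
plan = BackupPlan(24, 1, 2000, 100, setup_hours=2)
print(plan.worst_case_rpo_hours(), plan.estimated_rto_hours())
```

With those made-up numbers you could lose up to about a day of changes and would wait most of a day for a full restore, which may well be fine for personal files.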

Jedd•6mo ago

Fair that I should have spelled them out, though my point was that TFA touched on some of the considerations that are covered by those fundamental and well known concepts / terms.

Knowing the jargon for a space makes it easier to find more topical information. Searching on those abbreviations would be sufficient, anyway.

TFA talks about the right questions to consider when planning backups (but not archives) - eg 'What downtime can I tolerate in case of data loss?' (that's your RTO, effectively).

I'd argue the concepts encapsulated in those TLAs - even if they sound a bit enterprisey - are important for planning your backups, with 'self-hosted' not being an exception per se, just having different numbers.

Sure, as you say 'Do we really care about how fast we recover files?' - perhaps you don't need things back in an hour, but you do have an opinion about how long that should take, don't you?

You also ask 'should I really care about this stuff?'

I can't answer that for you, other than turn it back to 'What losses are you happy to tolerate, and what costs / effort are you willing to incur to mitigate?'. (That'll give you a rough intersection of two lines on your graph.)

This pithy aphorism exists for a good reason : )

  > There are two types of people: those who have lost data,
  > and those who do backups.

gchamonlive•6mo ago

A good time as ever for a shameless plug.

For my archlinux setup, configuration and backup strategy: https://github.com/gchamon/archlinux-system-config

For the backup system, I've cooked an automation layer on top of borg: https://github.com/gchamon/borg-automated-backups

topspin•6mo ago

I built a disaster recovery system using python and borg. It snapshots 51 block devices on a SAN and then uses borg to backup 71 file systems from these snapshots. The entire data set is then synced to S3. And yes, I've tested the result at an offsite location: recovering file systems to entirely different block storage and booting VMs, so I'm confident that it would work if necessary, although not terribly quickly, because the recovery automation is complex and incomplete.

I can't share it. But if you contemplate such a thing, it is possible, and the result is extremely low cost. Borg is pretty awesome.

firesteelrain•6mo ago

I run a system that has multi site replication to multiple Artifactory instances all replicating from one single Master to all Spokes. Each one can hold up to 2PB. While Artifactory supports writing to a backup location, given the size of our artifacts, we chose to not have an actual backup. Just live replication to five different sites. Never have tried to restore or replicate back to main. I am not even sure how that would work if the spokes are all “*-cache”.

Backend storage for each Artifactory instance is Dell Isilon.

tomheskinen•6mo ago

artifactory is great, and i do something very similar

KPGv2•6mo ago

It's a very interesting thought experiment. But, all of this and at the end of the day you still need to have a computer running in a different city 24/7 for a safe backup (floods and tornados will mess up your buddy's house five miles away, too). This is why, in the end, I settled for paying for a rolling backup service.

udev4096•6mo ago

PBS for proxmox and restic for anything outside is the best combo. Super easy to configure and manage

senectus1•6mo ago

What's the cheapest place to store offsite backups these days?

I intend to fully encrypt before sending so it should be safe from prying eyes from all but the most cashed up nation states :-P

vaylian•6mo ago

> "Schrödinger's backups" (i.e., never tested, thus both valid and invalid at the same time)

What are some good testing strategies?

k1t•6mo ago

Occasionally you need to try restoring from your backups.

Obviously a full restore gives you full confidence, and it goes down from there.

Ideally try to restore about 10% of your content every month, but really it depends on how high stakes your backups are.
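One way to make the "restore about 10%" check routine, as a sketch: restore into a scratch directory with whatever tool you use, then compare a random sample byte-for-byte against the source. The function name and its parameters are made up for the example, and it assumes the source has not changed since the backup was taken (otherwise legitimately modified files show up as failures).

```python
import filecmp
import random
from pathlib import Path


def sample_restore_check(source_root, restored_root, fraction=0.1, seed=None):
    """Compare a random sample of source files with their restored copies.

    Returns the relative paths that are missing or differ in the restore.
    """
    source_root, restored_root = Path(source_root), Path(restored_root)
    files = sorted(p.relative_to(source_root)
                   for p in source_root.rglob("*") if p.is_file())
    if not files:
        return []
    count = max(1, round(len(files) * fraction))
    sample = random.Random(seed).sample(files, count)
    failures = []
    for rel in sample:
        restored = restored_root / rel
        # shallow=False forces a content comparison, not just size and mtime.
        if not restored.is_file() or not filecmp.cmp(
                source_root / rel, restored, shallow=False):
            failures.append(str(rel))
    return sorted(failures)
```

An empty list means every sampled file came back intact; anything else is worth investigating before you need the backup for real.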

vaylian•6mo ago

Where do you restore to? Do you restore to a spare computer or do you restore into some isolated part (folder) of the production system from which the backup was originally taken?

And to which extent can this be automated, so that the backup gets automatic health checks?

orhmeh09•6mo ago

Just use restic. It handles these things.

kbr2000•6mo ago

Dirvish [0] is worth looking at, light-weight and providing a good set of functionality (rotation, incremental backups, retention, pre/post scripts). It is a scripted wrapper around rsync [1] so you profit from all that functionality too (remote backups, compression for limited links, metadata/xattr support, various sync criteria, etc.)

This has been a lifesaver for 20+ years, thanks to JW Schultz!

The questions/topics in the article go really well along with it.

[0] https://dirvish.org/ [1] https://rsync.samba.org/

zelphirkalt•6mo ago

What does dirvish do better or simpler than rsync?

kbr2000•6mo ago

It permits you to config more complicated backups more easily. You can inherit and override rules, which is handy if you need to do for example hundreds of similar style backups, with little exceptions. The same with include/exclude patterns, quickly gets complicated with just rsync.

It generates indices for its backups that allow you to search for files over all snapshots taken (which gives you an overview of which snapshots contain some file for you to retrieve/inspect). See dirvish-locate.

Does expiration of snapshots, given your retention strategy (encoded in rules, see dirvish.conf and dirvish-expire).

It consistently creates long rsync commandlines you would otherwise need to do by hand.

In the end you get one directory per snapshot, giving a complete view over what got backed up. Unchanged files are hard-linked thus limiting backup storage consumption. Changed files are stored. But each snapshot has the whole backed up structure in it so you could rsync it back at restore time (or pick selectively individual files if needed). Hence the "virtual".
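The hard-link trick itself is small enough to sketch. This is not Dirvish's code, just an illustration in the spirit of rsync's `--link-dest` option, with invented names; unlike rsync's default quick check it compares file contents, and it ignores symlinks, permissions and deletions-in-progress.

```python
import filecmp
import os
import shutil
from pathlib import Path


def hardlink_snapshot(source, previous, target):
    """Create `target` as a complete tree of `source`.

    Files identical to the copy in `previous` are hard-linked to it;
    new or changed files are copied. Pass previous=None for the first run.
    """
    source, target = Path(source), Path(target)
    previous = Path(previous) if previous else None
    target.mkdir(parents=True, exist_ok=True)
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous else None
        if old and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)        # unchanged: share the existing inode
        else:
            shutil.copy2(src, dst)   # new or changed: store a fresh copy
```

Every snapshot directory then looks like a full backup, while unchanged files cost only a directory entry.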

Furthermore: backup reporting (summary files) which could be piped into an E-mail or turned into a webpage, good and simple documentation, pre/post scripts (this turns out to be really useful to do DB dumps before taking a backup etc.)

You'll still need to take care of all other aspects of designing your backup storage (SAS controllers/backplanes/cabling, disks, RAID, LVM2, XFS, ...) and networking (10 GbE, switching, routing if needed, ...) if you need that (works too for only local though). Used this successfully in animation film development as an example, where it backed up hundreds of machines and centralized storage for a renderfarm, about 2 PBytes worth (with Coraid and SuperMicro hardware). Rsync traversing the filesystem to find out changes could be challenging at times with enormous FS (even based on only the metadata), but for that we created other backup jobs that were fed with specific file-lists generated by the renderfarm processes, thus skipping the search for changes...

crinkly•6mo ago

Lazy solution here that has worked fine forever through a complete hardware failure and burglary. Scratch disk inside desktop. External disk kept in house. External disk kept off site. All external disks are Samsung T7 Shield.

Robocopy /MIR daily to scratch or after I’ve done something significant. Weekly to external disk. Swap external disk offsite every 1 month.

chrisandchris•6mo ago

> All external disks are Samsung T7 Shield

And make sure to use at least a different batch, or better a different model. Same batch or same model tend to fail at the same time (usually if you need to restore data and the disk is under heavy load).

crinkly•6mo ago

Not a terrible idea that. Thank you. I will check dates, firmware versions and serial numbers to see.

HankB99•6mo ago

Coincidentally the 2.5 admins podcast just published an episode on ZFS basics: Why ZFS https://2.5admins.com/2-5-admins-256/

ZFS relates to backups. In my case (among the many things I like about ZFS) is that it preserves hard links which I used to reduce the space requirements for my primary `rsync` backup but which `rsync` blew up copying to my remote backup. (Yes, there's a switch to preserve hard links but it is not sufficiently performant for this application.)

(Episode #256 which is a number that resonates with many of us. ;) )
