We thought we had something backed up, but that turned out not to be the case.
We have multiple databases and apps, and each often has its own data store.
How do you usually deal with server backups? What has worked for you and what has not?
1. Backups must be stored offsite on a separate server (obvious, but surprisingly some people miss this).
2. Backups must be tested frequently. If you cannot test a backup, you don't have a backup.
3. Frequency depends on how critical your data is, your contract/SLA with your customer, etc. Ideally, you should have Point-in-Time Restore (PTR) going back a certain number of hours/days/weeks.
4. Make sure to have notifications for backup failures. If a backup fails, you must be notified so you can correct it manually.
5. Bonus: Have a backup reconciliation script that runs separately and reconciles all backups for a certain period.
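To make points 4 and 5 concrete, here is a minimal sketch of a reconciliation pass in Python. It assumes one archive per day named like `appdb-2024-01-03.tar.gz` in a single directory; that naming scheme, the `min_bytes` threshold, and the `notify` placeholder are all assumptions for illustration, not anything prescribed above. It only checks that archives exist and are not empty, so it complements restore testing rather than replacing it.

```python
from datetime import date, timedelta
from pathlib import Path


def expected_names(prefix: str, start: date, days: int) -> list[str]:
    """Archive names we expect for each day in the period (assumed naming scheme)."""
    return [
        f"{prefix}-{(start + timedelta(days=i)).isoformat()}.tar.gz"
        for i in range(days)
    ]


def reconcile(backup_dir: Path, prefix: str, start: date, days: int,
              min_bytes: int = 1) -> dict[str, str]:
    """Return {archive_name: problem} for every expected archive that looks wrong."""
    problems: dict[str, str] = {}
    for name in expected_names(prefix, start, days):
        path = backup_dir / name
        if not path.is_file():
            problems[name] = "missing"
        elif path.stat().st_size < min_bytes:
            problems[name] = "too small"
    return problems


def notify(problems: dict[str, str]) -> None:
    """Placeholder alert: prints to stdout. Swap in mail, chat, or a pager."""
    for name, problem in sorted(problems.items()):
        print(f"BACKUP PROBLEM: {name}: {problem}")
```

Run from cron on a machine other than the one taking the backups, something like `notify(reconcile(Path("/srv/backups"), "appdb", date.today() - timedelta(days=7), 7))` (the path and prefix are hypothetical) would flag any gap in the last week even if the backup job itself failed silently.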
Bender•4h ago
Personally, I also like to keep a local snapshot of live/ephemeral data using rsnapshot so that I can quickly get a node back in service, assuming the backup volume, accessible only by root, has not been tainted or tampered with. OSSEC is one of many tools that can checksum data and alert on tampering. AuditD with well-written rules is also useful for real-time monitoring. Anti-tampering is an entire topic by itself.
I like to keep these concepts outside of configuration management tools but design them so they can be easily pulled into said tools. This makes replacing a tool much easier. For example, if one's company wants to switch from Chef to Ansible for whatever reason, the process is already a known-known, allowing a quick, semi-automated migration.
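As a rough illustration of the checksum-and-alert idea, here is a small Python sketch that records SHA-256 digests for a snapshot directory and later reports anything missing, modified, or unexpected. This is not how OSSEC is implemented, just a stand-in to show the principle; the function names are made up for the example, and it assumes the manifest is stored somewhere an attacker on the node cannot rewrite it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare the tree against a stored manifest; return {path: finding}."""
    current = build_manifest(root)
    report: dict[str, str] = {}
    for name, digest in manifest.items():
        if name not in current:
            report[name] = "missing"
        elif current[name] != digest:
            report[name] = "modified"
    for name in current.keys() - manifest.keys():
        report[name] = "unexpected"
    return report
```

An empty result from `verify` means the snapshot still matches what was recorded. Because it is plain Python with no dependency on a particular tool, a check like this can be called from Chef, Ansible, or cron alike.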