What I learned, once upon a time, is that with a database, you shouldn't delete data you want to keep. If you want to keep something, you use SQL's fine UPDATE to change it rather than deleting it. Databases work best if you tell them to do what you want them to do, as a single transaction.
UPDATE users SET name='test'
is still effectively a delete: with no WHERE clause, it overwrites every user's name.
Of course, there are exceptions (GDPR deletion rules, etc.).
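A minimal sketch of the "update, don't delete" idea as a soft delete, using Python's built-in sqlite3 module. The users table, the deleted_at column and the helper names are hypothetical and do not come from any post here. Every UPDATE is scoped by a WHERE clause, which is exactly what the bare UPDATE in the reply above lacks. Each change also runs as a single transaction.

```python
import sqlite3


def make_db():
    """In-memory demo database; table, column and helper names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " deleted_at TEXT)"  # NULL means the row is live
    )
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.commit()
    return conn


def soft_delete(conn, user_id):
    # Mark the row instead of removing it; the WHERE clause limits the blast radius.
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE users SET deleted_at = CURRENT_TIMESTAMP"
            " WHERE id = ? AND deleted_at IS NULL",
            (user_id,),
        )


def undelete(conn, user_id):
    # Undoing a soft delete is just another scoped UPDATE.
    with conn:
        conn.execute("UPDATE users SET deleted_at = NULL WHERE id = ?", (user_id,))


def live_users(conn):
    # Readers must filter out soft-deleted rows themselves.
    rows = conn.execute("SELECT name FROM users WHERE deleted_at IS NULL ORDER BY id")
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = make_db()
    soft_delete(conn, 1)
    print(live_users(conn))
    undelete(conn, 1)
    print(live_users(conn))
```

The trade-off is that every query now has to remember the deleted_at filter. That is also why soft deletes don't help with the GDPR-style cases, where the data really has to go.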
There are novel lessons to be learned in tech all the time.
This is not one of them.
OPS: Huh, it appears we can't find your incremental.
ME: Well, just restore the weekly; it's only Tuesday.
Two days later.
OPS: About that backup. Turns out it's a backup of the servers, not the database. We'll have to restore to new VMs in order to get at the data.
ME: How did this happen?
OPS: Well, the backups work for MS SQL Server.
ME: This is PostgreSQL.
OPS: Yeah, apparently we started setting that up but never finished.
ME: You realize we have about 20 applications using that database?
OPS: Now we do.
Lesson: Until you personally have seen a successful restore from backup, you do not have backups. You have hopes and prayers that you have backups. I am forever in the Trust but Verify camp.
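To make "trust but verify" concrete, here is a minimal sketch of a restore test. It uses Python's sqlite3 module as a stand-in and is not what the ops team in the story ran; the function names are hypothetical. The backup is taken with sqlite3's Connection.backup, and the backup file is then treated as if it were being restored. If the file is missing, fails SQLite's integrity check, or has different row counts than the source, it does not count as a backup. For a PostgreSQL setup, the same idea would mean restoring a dump into a scratch server and running comparable checks there.

```python
import os
import sqlite3
import tempfile


def take_backup(src, path):
    # Connection.backup copies a live SQLite database into another connection.
    dest = sqlite3.connect(path)
    try:
        src.backup(dest)
    finally:
        dest.close()


def verify_restore(src, path, tables):
    """Treat the backup file as a restore and compare it with the source."""
    if not os.path.exists(path):
        return False  # nothing to restore from
    restored = sqlite3.connect(path)
    try:
        if restored.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        for table in tables:
            # Table names come from a trusted, hard-coded list, never user input.
            query = f"SELECT COUNT(*) FROM {table}"
            if restored.execute(query).fetchone()[0] != src.execute(query).fetchone()[0]:
                return False
        return True
    except sqlite3.DatabaseError:
        return False  # unreadable file, missing table, etc.
    finally:
        restored.close()


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    src.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,)])
    src.commit()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "nightly.db")
        take_backup(src, path)
        print(verify_restore(src, path, ["orders"]))
        print(verify_restore(src, os.path.join(d, "missing.db"), ["orders"]))
```

Row counts are a crude check. A real restore drill would also run the applications' own queries against the restored copy.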
At some point, though, when the company is big enough, it's not your problem. Are you gonna do everyone's job? You tell 'em what you need in writing, and if they drop the ball, it's their head.
They were extremely lucky. Imagine what the boss would have said if they hadn't managed to recover the data.
> I immediately messaged my co-founders.
Don’t fuck your database up, and do have point-in-time rollbacks. No excuses; it’s not hard. Not something to be proud of.
"Picture this: Panic mode activated. You heard that right. But here's what surprised me the most" and so on. Ugh.
We were devs with root access to production and no network segregation. He wanted to upgrade his dev environment, but chose the wrong resource file.
He was lucky it was a Friday, because it took us the whole weekend working round the clock to get the system and the data to a consistent state by start of trading.
We called him The Dark Destroyer thereafter.
So I would add network segregation to the mix of good ideas for production ops.
lol good luck op
Also, “on delete restrict” isn’t a bad policy either for some keys. Make deleting data difficult.
I actually agree 100% with this lesson, especially the last sentence. The younger me would write a long email pushing for ON DELETE CASCADE everywhere. The older me doesn't even want to touch Terraform, where an innocent-looking update can end up destroying everything. I would rather live with some orphaned records and some infra drift.
And still I got burnt a few months ago, when I inadvertently triggered some internal ON DELETE CASCADE logic in Consul's ACL system.
(I do agree with your other points)
Your Joe AI customers should be worried. Anyone actually using RankBid, which you posted as a Show HackerNews 8 months ago, should be worried (particularly by the "Secure by design: We partner with Stripe to ensure your data is secure." line).
If you don't want to get toasted by some future failure where you won't be accidentally saved by a vendor, then maybe start learning more on the technical side instead of researching and writing blogspam like "I Read 10 Business Books So You Don't Have To".
This might sound harsh, but it's intended as sound advice that clearly nobody else is giving you.
"I had just finished what I thought was a clean migration: moving our entire database from our old setup to PostgreSQL with Supabase" ... on a Friday.
Never do prod deploys on a Friday unless you have at least 2 people available through the weekend to resolve issues.
The rest of this post isn't much better.
And come on. Don't make major changes to a prod db when critical team members have signed off for a weekend or holiday.
I'm actually quite happy OP posted their experiences. But it really needs to be a learning experience. We've all done something like this and I bet a lot of us old timers have posted similar stories.
The technical takeaway, as others have said, is to do prod deployment during business hours when there are people around to monitor and to help recover if anything goes wrong, and where it will be working hours for quite a while in the future. Fridays are not that.
That's not very profitable.
> Here's the technical takeaway: Never use CASCADE deletes on critical foreign keys. Set them to NULL or use soft deletes instead. It's fine for UPDATE operations, but it's too dangerous for DELETE ones. The convenience of automatic cleanup isn't worth the existential risk of chain reactions.
What? The point of cascading foreign keys is referential integrity. If you just leave dangling references everywhere your data will either be horribly dirty or require inconsistent manual cleanup.
As I'm sure others have said: just use a test/staging environment. It isn't hard to set up even if you are in startup mode.
Not quite. Databases can enforce referential integrity through foreign keys, without cascading deletes being enabled.
“On delete restrict” vs “on delete cascade” still enforces referential integrity, and is typically a better way to avoid the OP’s issue.
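A small sketch of that difference, again with Python's sqlite3 module. The accounts/invoices schema and the helper names are hypothetical, and only the ON DELETE policy changes between runs. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, so the sketch turns that on first.

```python
import sqlite3


def make_db(on_delete):
    """Parent/child tables; names are hypothetical, only the FK policy varies."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
    # on_delete comes from the fixed list below, never from user input.
    conn.execute(
        "CREATE TABLE invoices ("
        " id INTEGER PRIMARY KEY,"
        f" account_id INTEGER REFERENCES accounts(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO accounts (id) VALUES (1)")
    conn.executemany("INSERT INTO invoices (id, account_id) VALUES (?, 1)",
                     [(10,), (11,)])
    conn.commit()
    return conn


def try_delete_account(conn):
    # Attempt to delete the parent row in its own transaction.
    try:
        with conn:
            conn.execute("DELETE FROM accounts WHERE id = 1")
    except sqlite3.IntegrityError as exc:
        return f"refused: {exc}"
    return "deleted"


def invoices(conn):
    return conn.execute("SELECT id, account_id FROM invoices ORDER BY id").fetchall()


if __name__ == "__main__":
    for policy in ("CASCADE", "RESTRICT", "SET NULL"):
        conn = make_db(policy)
        outcome = try_delete_account(conn)
        print(policy, outcome, invoices(conn))
```

CASCADE silently takes both invoices with the account. RESTRICT raises an IntegrityError and leaves everything in place. SET NULL keeps the invoices but leaves them with a NULL account_id, which are the orphaned records mentioned upthread.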
We’re just getting started and we’re even on Supabase’s paid plan.
That being said, I would love to see more resources about incident management for small teams and how to strike this balance. I'm the only developer working on a (small, but somehow super political/knives-out) company's big platform with large (F500) clients and a mandate-from-heaven to rapidly add features, and it's by far the most stressed out I've ever been in my career, if not my life. Every incident, whether it be the big GCP outage from last week or a database crash this week, leads to a huge mental burden that I have no idea how to relieve, and a huge passive-aggressive political shitstorm I have no idea how to navigate.
no backups? perfect. now you'll never forget to set one up again. friday night? even better. you got the full rite of passage.
people act like this is rare. it’s not. half of us have nuked prod; the other half are lying or haven't been given prod access yet.
you’re fine. just make the checklist longer next time. and maybe alias `drop` to `echo "no"` for a while
cranberryturkey•3d ago
grepfru_it•2h ago