I Dropped the Production Database on a Friday Night

https://vince.beehiiv.com/p/how-i-dropped-the-production-database-on-a-friday-night

34•vincejos•7mo ago

Comments

cranberryturkey•7mo ago

i dropped the dev database once at PayPal back in 2006

grepfru_it•7mo ago

I once remailed emails to IEEE and ACM. I was ready to quit and take the L for such a bad mistake. Not write a blog post for Friday evening consumption

cranberryturkey•7mo ago

Haha yeah I wasn’t gonna blog about my f up

Arnt•7mo ago

I hope the poster will learn about transactions at some point. Postgres even lets you alter the schema within a transaction.

What I learned, once upon a time, is that with a database, you shouldn't delete data you want to keep. If you want to keep something, you use SQL's fine UPDATE to update it, you don't delete it. Databases work best if you tell them to do what you want them to do, as a single transaction.

XorNot•7mo ago

I mean

UPDATE users SET name='test'

is still effectively a delete...

Arnt•7mo ago

Only as a matter of low level storage. It won't trigger ON DELETE CASCADE and that kind of thing.

This is a kind of misunderstanding I've heard from others who were first exposed to hacky things like early mysql. Databases are something else. A different kind of beast. If you use a database, and Postgres is the best of the DBMSes, then you can say things like "a lead shouldn't be deleted before three months have passed, no matter what" or "a lead can't be deleted until its state column says it's been handled" and the DBMS will make sure of it. If you have a bug that would involve leads being deleted prematurely, the DBMS will reject your change. Your change just won't break the database.

vincejos•7mo ago

I use transactions all the time for my other projects and I've read the great Designing Data Intensive Applications which cover the topic of linearization in depth.

Insanity•7mo ago

Assuming storage cost is not a huge concern, I’m a big fan of soft deletes everywhere. Also leaves an easy “audit trail” to see who tried to delete something.

Of course - there are exceptions (gdpr deletion rules etc)

bloudermilk•7mo ago

I dropped the production database at the first startup I worked at, three days after we went live. We were scrappy™ and didn’t have backups yet, so we lost all the data permanently. I learned that day that running automated tests on a production database isn’t a good idea!

noman-land•7mo ago

I got deep pangs of pain and anguish for you and everyone involved. These lessons hurt so much to learn the hard way.

ghushn3•7mo ago

> I learned that day that running automated tests on a production database isn’t a good idea!

There's novel lessons to be learned in tech all the time.

This is not one of them.

doubled112•7mo ago

Learn lessons from other people. You can't learn them all yourself.

ghushn3•7mo ago

Yes, this is what I'm saying.

wombatpm•7mo ago

Here is another one: Don't trust ops when they say they have backups. I asked and was told there are weekly full backups, with daily incrementals. The time came when I needed a production DB restored due to an upgrade bug in our application. That was bad - thank $DEIITY we have backups.

OPS: Huh, it appears we can't find your incremental.

ME: Well just restore the weekly, its only Tuesday.

Two Days later.

OPS:About that backup. Turns out it's a backup of the servers, not the database. We'll have to restore to new VM's in order to get at the data.

ME: How did this happen?

OPS: Well the backups work for MSSQL Server.

ME: This is PostgreSQl.

OPS: Yeah, apparently we started setting that up but never finished.

ME: You realize we have about 20 applications using that database?

OPS: Now we do.

Lesson: Until you personally have seen a successful restore from backup, you do not have backups. You have hopes and prayers that you have backups. I am forever in the Trust but Verify camp.

citizenpaul•7mo ago

If your company is big enough to have dedicated ops then it should be running regular tests on backups. A disaster recovery process if you will.

At some point though its not your problem when the company is big enough. Are you gonna do everyone's job? You tell em what you need in writing and if they drop the ball its their head.

scott_w•7mo ago

It’s relative. No, I’m not sitting on the shoulder of the team that manages that (nor should I, there’d be 40 EMs bothering them!) but I fully expect my CTO has done it. And if not? Well, one day it’ll blow up and I’m looking for another job but that’s no different to any other possible major issues.

wombatpm•7mo ago

The majority of our apps were Java, running Tomcat on windows server using MSSQL or oracle. That was tested as part of DR. Our Linux servers running Python and Postgres were not as high a priority apparently.

The lack of working backups made it a problem because if assurances and certifications we were required to maintain.

When starting a new project I now request a dev database with a dump from prod more than 30 days ago just to see the process work. Does it waste their time? Maybe. In which case it encourages more automation. Do I care no? But I am not getting burned again.

physix•7mo ago

I dunno. The effort needed to ensure you have backups is tiny compared to the work done to create the product. And to pull a backup before deleting stuff in production only needs a smidgen of experience.

They were extremely lucky. Imagine what the boss would have said if they hadn't managed to recover the data.

endorphine•7mo ago

This _was_ one of the bosses.

physix•7mo ago

Ah, yes.

> I immediately messaged my co-founders.

peterldowns•7mo ago

The “and honestly?” phrase smells like AI writing to the point I stopped there and closed the post.

Don’t fuck your database up and do have point-in-time rollbacks. No excuses it’s not hard. Not something to be proud of.

pton_xd•7mo ago

Yeah, the whole thing is full of AI-isms. Started skimming and every other sentence has one.

"Picture this: Panic mode activated. You heard that right. But here's what surprised me the most" and so on. Ugh.

vincejos•7mo ago

I tried to have a conversational, story-telling style, maybe that's why you think there are lots of "AI-isms". But I take this as a feedback for the next editions: less fluff, more straight-to-the point writing. Thanks!

mulmen•7mo ago

Let he who is without sin cast the first DELETE CASCADE.

munchler•7mo ago

Developing directly on the production database with no known backups. Saved from total disaster by pure luck. Then a bunch of happy talk about it being a "small price to pay for the lessons we gained" and how such failures "unleash true creativity". It's amazing what people will self-disclose on the internet.

jacobsenscott•7mo ago

Yeah. Imagine everything else that's completely wrong in that app.

000ooo000•7mo ago

I cut my dev teeth in a financial institution so I'll concede I'm biased away from risk, but devving directly on the prod DB, not having a local enviroment to test changes against, and worse: literally no backups.. it screams wreckless, stupid, cheap, arrogant, and immature (in the tech sense). Nothing I'd like my name against publicly.

physix•7mo ago

A colleague upgraded the production database for a securities financing settlement system on a Friday evening by accident 20 years ago.

We were devs with root access to production and no network segregation. He wanted to upgrade his dev environment, but chose the wrong resource file.

He was lucky it was a Friday, because it took us the whole weekend working round the clock to get the system and the data to a consistent state by start of trading.

We called him The Dark Destroyer thereafter.

So I would add network segregation to the mix of good ideas for production ops.

grepfru_it•7mo ago

I’m building my toy project and I have an LTO drive taking backups every night. Here I am complaining that having 2Tb of backups is too much.

lol good luck op

orochimaaru•7mo ago

That's the first thing I took away. The author ignores every sane software engineering practice, is saved by pure luck and then dives into what commands not to use in supabase. Why do this? Why not spend a week or two before you launch to setup a decent ci/cd pipeline? That's the real lesson here.

lovehashbrowns•7mo ago

Right?! This whole post is kinda absurd. It has the feel of a kid putting a fork into an outlet, getting the shock of a lifetime and then going “and thanks to this, everyone in my household now knows not to put a fork into an outlet.” You didn’t have to go through all this to figure out that you need backups. The fluff is the cherry on top

karmakurtisaani•7mo ago

Maybe the post is an attempt to save face in front of his colleagues. Owning up to the mistake and listing lessons learned.

vincejos•7mo ago

While I agree with everything said here about making backups etc. and which I have done in my career at later stage companies, when you are just starting out and building MVPs, I'd argue (as I do in the newsletter) that losing 2 weeks to setup CI/CDs pipelines and backups before you can pay the rent is a waste of time! I was a Supabase noob back then so I had not explored their features for local development, which is the learning I try to share in this post.

max0563•7mo ago

Uhh, no, the answer is not to avoid cascading deletes. The answer is to not develop directly on a production database and to have even the most basic of backup strategies in place. It is not hard.

Also, “on delete restrict” isn’t a bad policy either for some keys. Make deleting data difficult.

throwdbaaway•7mo ago

> Here's the technical takeaway: Never use CASCADE deletes on critical foreign keys. Set them to NULL or use soft deletes instead. It's fine for UPDATE operations, but it's too dangerous for DELETE ones. The convenience of automatic cleanup isn't worth the existential risk of chain reactions.

I actually agreed 100% with this learning, especially the last sentence. The younger me would write a long email to push for ON DELETE CASCADE everywhere. The older me doesn't even want to touch terraform, where an innocent looking update can end up destroying everything. I will rather live with some orphaned records and some infra drifts.

And still I got burnt few months ago, when I inadvertently triggered some internal ON DELETE CASCADE logic of Consul ACL.

(I do agree with your other points)

oulu2006•7mo ago

This is such a poorly written post, and im sure there are on-going disasters waiting to happen -- I've built 3 startups and sold 2 of them and never ever developed on production. ?? What level of crazy is this?

codesnik•7mo ago

supabase kiinda pushes you in that direction though.

vincejos•7mo ago

I agree. They also push you not to git migrations at first, which is definitely not the best practice.

kiwicopple•7mo ago

(Supabase ceo)

We do not push devs not to do migrations - we would strongly prefer if everyone used migrations and declarative schemas.

Especially at the scale that OP is at (see maturity model: https://supabase.com/docs/guides/deployment/maturity-model)

vincejos•7mo ago

While I don’t question the maturity model in itself (which I read after the incident and that’s why I started gitting migrations just after), I realized it was harder than other Supabase features for it to work well, especially when you start working with other features than just authentication and Postgres.

In particular, webhooks and triggers don’t work out of the box. So maybe it’s not pushing in a particular direction but at least I’d argue it’s not nudging you to do it because it entails some hours of custom setup and debugging before the CLI commands like supabase db diff actually work as intended in my experience. But I know the Supabase team is improving it every release so I’m thankful for this work!

heroprotagonist•7mo ago

I'm sorry, but there's "move fast and break things" and then there's a group of junior devs not even bothering to google a checklist of development or moving to production best practices.

Your Joe AI customers should be worried. Anyone actually using the RankBid you did a Show HackerNews on 8 months ago should be worried (particularly by the "Secure by design: We partner with Stripe to ensure your data is secure." line.

If you don't want to get toasted by some future failure where you won't be accidentally saved by a vendor, then maybe start learning more on the technical side instead of researching and writing blogspam like "I Read 10 Business Books So You Don't Have To".

This might sound harsh, but it's intended as sound advice that clearly nobody else is giving you.

vincejos•7mo ago

Thanks for the feedback, I really appreciate it. Rankbid and other projects I've made, I built from scratch myself. They have strong, solid, technical foundations. Try them for yourself, even try to hack them if you want if it proves my point.

This was not the case of Joe AI. I joined later in the project, and the foundations where even weaker than what is shown in this newsletter (no API endpoint authentication whatsoever, open bar, for example) and so I had to secure and migrate everything myself when I joined them. This was what the Supabase migration was trying to accomplish. Before I joined, they didn't even have a database but I won't get into the details here.

Before Rankbid, and the other products I've built, I've worked at a B2C startup with millions of users and never caused a big outage there, I've been programming for more than ten years, and I have a double degree in computer science, and while I agree with what "should be done" in theory for production level apps, sometimes, you need to move very fast to build great startups. I've read many technical books in my life such as Designing Data Intensive Applications, High Performance Browser Networking. I know the theory, but sometimes you just don't have the time to do everything perfectly. That's what I try to expose in this blog post. I also wanted to share a humbling experience. Everyone makes mistakes, and I'm not ashamed of making some, even after years of software engineering.

My newsletter is about the intersection of programming and business. You might not find the "business" part interesting which is fine, but I think what you call blogspam has real value for engineers who have never sold before in their life and want to learn the ropes. I spend a lot of time writing each edition, because I try to respect the time of my readers as much as possible to deliver some actual insights (even if there is a bit of fluff or story telling sometimes).

And for Joe AI: it has since become much more secure, and is progressively implementing engineering best practices, so customers don't have to worry.

geuis•7mo ago

Owww. The first or second paragraph of this made me cringe

"I had just finished what I thought was a clean migration: moving our entire database from our old setup to PostgreSQL with Supabase" ... on a Friday.

Never do prod deploys on a Friday unless you have at least 2 people available through the weekend to resolve issues.

The rest of this post isn't much better.

And come one. Don't do major changes to a prod db when critical team members have signed off for a weekend or holiday.

I'm actually quite happy OP posted their experiences. But it really needs to be a learning experience. We've all done something like this and I bet a lot of us old timers have posted similar stories.

grepfru_it•7mo ago

No. Never release/upgrade on a Friday. Had too many late night weekends when I should be happily drinking beer. Never release at eod Friday. Never.

vincejos•7mo ago

It's hard to have 2 people available when you have a 2 people tech team. We were very early back then, MVP stage.

Arainach•7mo ago

>Here's the technical takeaway: Never use CASCADE deletes on critical foreign keys.

The technical takeaway, as others have said, is to do prod deployment during business hours when there are people around to monitor and to help recover if anything goes wrong, and where it will be working hours for quite a while in the future. Fridays are not that.

danesparza•7mo ago

Also: don't brag about doing the opposite of what this guy says.

vincejos•7mo ago

When you are a 3 people startup, I'd argue there is no such thing as "business hours". I worked every day back then. I'll concede that the "Friday Night" part in the title might be a bit clickbait to that regard.

heyarviind2•7mo ago

Your website title is "Profitable Programming" with a blog post "How I Dropped the Production Database on a Friday Night"

Thats not very profitable

RVuRnvbM2e•7mo ago

This is a good story and something everyone should experience in their career even just for the lesson in humility. That said:

What? The point of cascading foreign keys is referential integrity. If you just leave dangling references everywhere your data will either be horribly dirty or require inconsistent manual cleanup.

As I'm sure others have said: just use a test/staging environment. It isn't hard to set up even if you are in startup mode.

JonoBB•7mo ago

> The point of cascading foreign keys is referential integrity.

Not quite. Databases can enforce referential integrity through foreign keys, without cascading deletes being enabled.

“On delete restrict” vs “on delete cascade” still enforces referential integrity, and is typically a better way to avoid the OP’s issue.

vincejos•7mo ago

Thanks for your takeaway. Yes the dev environment is definitely a must as soon as you start growing!

booleandilemma•7mo ago

Who is this guy? He seems like a poser. I wouldn't be surprised if these articles are AI-generated.

grepfru_it•7mo ago

To be fair this was the norm 10 years ago. Just seems like he is stuck in the past. Really no excuse to provision an ec2 volume and dump all backups there. I’m not even in prod yet and have full backups to LTO to be ready for launch next month

danesparza•7mo ago

This was never the norm for successful companies. This is only the norm for cowboys who have more pizza than good sense.

vincejos•7mo ago

Harsh but untrue (for the AI-generated part).

lawgimenez•7mo ago

Did I read that correctly? They’re on Supabase’ free plan in production?

We’re just getting started and we’re even in Supabase’ paid plan.

vincejos•7mo ago

Why do you take the paid plan when getting started?

scott_w•7mo ago

Once you’re at a point where some of your business depends on it, you probably want the things like backups they provide…

vincejos•7mo ago

Definitely! I had just finished the migration back then so that's why we were still on the free plan, but we had planned on enabling even PITR

catapps•7mo ago

Echoing the other comments about just how bad the setup here is. Setting up staging/dev environments does not take so much time as to put you behind your competition. There's a vast, VAST chasm between "We're testing on the prod DB with no backups" and the dreaded guardrails and checkboxes.

That being said, I would love to see more resources about incident management for small teams and how to strike this balance. I'm the only developer working on a (small, but somehow super political/knives-out) company's big platform with large (F500) clients and a mandate-from-heaven to rapidly add features -- and it's by far the most stressed out I've ever been in my career if not life. Every incident, whether it be the big GCP outage from last week or a database crash this week, leads to a huge mental burden that I have no idea how to relieve, and a huge passive-aggressive political shitstorm I have no idea how to navigate.

b0a04gl•7mo ago

this is exactly how you earn your prod stripes. dropped the db on day 3? good. now you’re officially a backend engineer.

no backups? perfect. now you'll never forget to set one up again. friday night? even better. you got the full rite of passage.

people act like this's rare. it’s not. half of us have nuked prod, the other half are lying or haven't been given prod access yet.

you’re fine. just make the checklist longer next time. and maybe alias `drop` to `echo "no"` for a while

scott_w•7mo ago

Dropping DB on day 3 of your business? Probably fine. Dropping it on your day 3 but on day 300 of your business when you have paying customers? Seriously?

RISC-V Vector Primer

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

A Tale of Two Standards, POSIX and Win32 (2005)

Ask HN: Is the Downfall of SaaS Started?

Flirt: The Native Backend

OpenAI's Latest Platform Targets Enterprise Customers

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

Big Tech's AI Push Is Costing More Than the Moon Landing

The AI boom is causing shortages everywhere else

Suno, AI Music, and the Bad Future [video]

Ask HN: How are researchers using AlphaFold in 2026?

Running the "Reflections on Trusting Trust" Compiler

Watermark API – $0.01/image, 10x cheaper than Cloudinary

Now send your marketing campaigns directly from ChatGPT

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

Show HN: Hibana – choreography-first protocol safety for Rust

Haniri: A live autonomous world where AI agents survive or collapse

GPT-5.3-Codex System Card [pdf]

Atlas: Manage your database schema as code

Geist Pixel

Show HN: MCP to get latest dependency package and tool versions

The better you get at something, the harder it becomes to do

Show HN: WP Float – Archive WordPress blogs to free static hosting

Show HN: I Hacked My Family's Meal Planning with an App

Sony BMG copy protection rootkit scandal

The Future of Systems

NASA now allowing astronauts to bring their smartphones on space missions

Claude Code Is the Inflection Point

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

RISC-V Vector Primer

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

A Tale of Two Standards, POSIX and Win32 (2005)

Ask HN: Is the Downfall of SaaS Started?

Flirt: The Native Backend

OpenAI's Latest Platform Targets Enterprise Customers

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

Big Tech's AI Push Is Costing More Than the Moon Landing

The AI boom is causing shortages everywhere else

Suno, AI Music, and the Bad Future [video]

Ask HN: How are researchers using AlphaFold in 2026?

Running the "Reflections on Trusting Trust" Compiler

Watermark API – $0.01/image, 10x cheaper than Cloudinary

Now send your marketing campaigns directly from ChatGPT

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

Show HN: Hibana – choreography-first protocol safety for Rust

Haniri: A live autonomous world where AI agents survive or collapse

GPT-5.3-Codex System Card [pdf]

Atlas: Manage your database schema as code

Geist Pixel

Show HN: MCP to get latest dependency package and tool versions

The better you get at something, the harder it becomes to do

Show HN: WP Float – Archive WordPress blogs to free static hosting

Show HN: I Hacked My Family's Meal Planning with an App

Sony BMG copy protection rootkit scandal

The Future of Systems

NASA now allowing astronauts to bring their smartphones on space missions

Claude Code Is the Inflection Point

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

I Dropped the Production Database on a Friday Night

Comments