The challenges of soft delete

https://atlas9.dev/blog/soft-delete.html
57•buchanae•2h ago

Comments

cj•1h ago
We deal with soft delete in a Mongo app with hundreds of millions of records by simply moving the objects to a collection (table) separate from the "not deleted" data.

This works especially well in cases where you don't want to waste CPU/memory scanning soft-deleted records every time you do a lookup.

It also avoids situations where app/backend logic forgets to apply the "deleted: false" filter.
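A minimal sketch of the relational version of this pattern, assuming a hypothetical `items` table with an identically shaped `items_deleted` archive (foreign keys referencing `items` would complicate this, as the reply below notes):

```sql
-- Move a soft-deleted row into the archive table in one statement.
-- Table names are hypothetical; assumes items_deleted has the same columns.
WITH moved AS (
    DELETE FROM items
    WHERE id = 42
    RETURNING *
)
INSERT INTO items_deleted
SELECT * FROM moved;
```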

vjvjvjvjghv•1h ago
I guess that works well with NoSQL. In a relational database it gets harder to move records out if they have relationships with other tables.
tempest_•1h ago
Eh, you could implement this pretty simply with Postgres table partitions.
buchanae•1h ago
Ah, that's an interesting idea! I had never considered using partitions. I might write a followup post with these new ideas.
tempest_•1h ago
There are a bunch of caveats around primary keys and uniqueness but I suspect it could be made to work depending on your data model.
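A rough sketch of what that could look like, with hypothetical names; the primary-key caveat shows up immediately, since the partition key has to be part of the PK:

```sql
-- Partition on the soft-delete flag so live lookups only touch
-- the live partition. Names are hypothetical.
CREATE TABLE items (
    id      bigint  NOT NULL,
    payload text,
    deleted boolean NOT NULL DEFAULT false,
    PRIMARY KEY (id, deleted)   -- the partition key must be in the PK
) PARTITION BY LIST (deleted);

CREATE TABLE items_live    PARTITION OF items FOR VALUES IN (false);
CREATE TABLE items_deleted PARTITION OF items FOR VALUES IN (true);

-- A soft delete is just an UPDATE; Postgres (11+) moves the row
-- from the live partition to the deleted one.
UPDATE items SET deleted = true WHERE id = 42;
```

Note that `id` alone is no longer guaranteed unique across partitions, which is the uniqueness caveat above.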
nemothekid•1h ago
The trigger architecture is actually quite interesting, especially because cleanup is relatively cheap. As far as compliance goes, it's also simple to declare that "after 45 days, deletions are permanent" as a catch-all, and then you get to keep restores. For example, I think (IANAL) the CCPA gives you a 45-day buffer for right-to-erasure requests.

Now instead of chasing down different systems and backups, you can simply ensure your archival process runs regularly and you should be good.

whalesalad•1h ago
A good solution here can be to utilize a view. The underlying table has a soft-delete field, and the view hides rows that have been soft deleted. Then the application doesn't need to worry about this concern all over the place.
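A minimal sketch, with hypothetical names:

```sql
-- Base table keeps everything; the view exposes only live rows.
CREATE TABLE items_all (
    id         bigserial PRIMARY KEY,
    payload    text,
    deleted_at timestamptz
);

CREATE VIEW items AS
    SELECT id, payload
    FROM items_all
    WHERE deleted_at IS NULL;
```

A simple single-table view like this is even auto-updatable in Postgres, and an INSTEAD OF DELETE trigger on the view could turn deletes through it into soft deletes.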
elyobo•1h ago
Postgres with RLS to hide soft-deleted records means most of the app code doesn't need to know or care about them; it still issues reads, writes, and deletes against the same source table, and as far as the app knows it's working.
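The gist of that approach, as a sketch with hypothetical names (note that by default RLS doesn't apply to the table owner or superusers, so the app should connect as a separate role):

```sql
ALTER TABLE items ENABLE ROW LEVEL SECURITY;

-- Every query from roles subject to RLS sees only live rows.
CREATE POLICY hide_soft_deleted ON items
    USING (deleted_at IS NULL);
```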
maxchehab•1h ago
How do you handle schema drift?

The data archive serializes the deleted object with whatever schema it had at that point in time.

But fast-forward through some schema changes, and now your system has to migrate the archived objects to the current schema?

buchanae•1h ago
In my experience, archived objects are almost never accessed, and if they are, it's within a few hours or days of deletion, which leaves a fairly small chance that schema changes will have a significant impact on restoring any archived object. If you pair that with "best-effort" tooling that restores objects by calling standard "create" APIs, perhaps it's fairly safe to _not_ deal with schema changes.

Of course, as always, it depends on the system and how the archive is used. That's just my experience. I can imagine that if there are more tools or features built around the archive, the situation might be different.

I think maintaining schema changes and migrations on archived objects can be tricky in its own ways, even when they're kept in the live tables with an 'archived_at' column, especially when objects span multiple tables with relationships. I've worked on migrations where really old archived objects just didn't make sense anymore in the new data model, and figuring out a safe migration became a difficult, error-prone project.

talesmm14•1h ago
I've worked at companies where soft delete was implemented everywhere, even in irrelevant internal systems... I think it's a cultural thing! I still remember a college professor scolding me on an extension project because I hadn't implemented soft delete... in his words, "In the business world, data is never deleted!!"
mrkeen•19m ago
No comment from the professor on modifications though?
MaxGabriel•1h ago
This might stem from the domain I work in (banking), but I have the opposite take. Soft delete pros to me:

* It's obvious from the schema: if there's a `deleted_at` column, I know how to query the table correctly (vs not realizing rows get DELETEd, or having to know where to look in another table)

* One way to do things: analytics queries, admin pages, they can all look at the same set of data, vs having separate handling for historical data.

* DELETEs are likely fairly rare by volume for many use cases

* I haven't found soft-deleted rows to be a big performance issue. Intuitively this makes sense, since indexed queries should be O(log N)

* Undoing is really easy, because all the relationships stay in place, vs data already being moved elsewhere (In practice, I haven't found much need for this kind of undo).

In most cases, I've really enjoyed going even further and making rows fully immutable, using a new row to handle updates. This makes it really easy to reference historical data.

If I was doing the logging approach described in the article, I'd use database triggers that keep a copy of every INSERT/UPDATE/DELETEd row in a duplicate table. This way it all stays in the same database—easy to query and replicate elsewhere.
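One way that trigger setup could look, as a sketch with hypothetical names (not necessarily what the parent has in mind):

```sql
-- History table holds one row per change, as jsonb.
CREATE TABLE items_history (
    op         text        NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now(),
    row_data   jsonb       NOT NULL
);

CREATE FUNCTION log_item_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO items_history (op, row_data) VALUES (TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
    INSERT INTO items_history (op, row_data) VALUES (TG_OP, to_jsonb(NEW));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_audit
    AFTER INSERT OR UPDATE OR DELETE ON items
    FOR EACH ROW EXECUTE FUNCTION log_item_change();
```

Storing the copy as `jsonb` also sidesteps some of the schema-drift problem discussed upthread, since history rows don't have to follow the live table's migrations.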

nine_k•51m ago
> DELETEs are likely fairly rare by volume for many use cases

All your other points make sense, given this assumption.

I've seen tables where 50-70% of the rows were soft-deleted, and it did affect performance noticeably.

> Undoing is really easy

Depends on whether undoing even happens, and whether the acts of deletion and undeletion require audit records anyway.

In short, there are cases when soft-deletion works well, and is a good approach. In other cases it does not, and is not. Analysis is needed before adopting it.

nerdponx•53m ago
One thing that often gets forgotten in the discussions about whether to soft delete and how to do it is: what about analysis of your data? Even if you don't have a data science team, or even a dedicated business analyst, there's a good chance that somebody at some point will want to analyze something in the data. And there's a good chance that the analysis will either be explicitly "intertemporal" in that it looks at and compares data from various points in time, or implicitly in that the data spans a long time range and you need to know the states of various entities "as of" a particular time in history. If you didn't keep snapshots and you don't have soft edits/deletes you're kinda SoL. Don't forget the data people down the line... which might include you, trying to make a product decision or diagnose a slippery production bug.
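For example, with the immutable-rows approach mentioned elsewhere in the thread, an "as of" question becomes a plain query; this sketch assumes hypothetical `valid_from`/`valid_to` version columns:

```sql
-- State of customer 42 as of a given moment in history.
SELECT *
FROM customer_versions
WHERE customer_id = 42
  AND valid_from <= timestamptz '2025-06-01'
  AND (valid_to IS NULL OR valid_to > timestamptz '2025-06-01');
```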
theLiminator•52m ago
Privacy regulations make soft delete unviable in many of the cases where it's useful.
sedatk•51m ago
The opposite is true in countries where there are data retention laws. Soft-delete is mandatory in those cases.
wavemode•24m ago
Soft deletion and privacy deletion serve different purposes.

If you leave a comment on a forum, and then delete it, it may be marked as soft-deleted so that it doesn't appear publicly in the thread anymore, but admins can still read what you wrote for moderation/auditing purposes.

On the other hand, if you send a privacy deletion request to the forum, they would be required to actually fully delete or anonymize your data, so even admins can no longer tie comments that you wrote back to you.

Most social media sites probably have to implement both of these processes/systems.
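A sketch of the difference, with a hypothetical forum schema:

```sql
-- User-facing delete: hide the comment, but keep it readable by moderators.
UPDATE comments
SET deleted_at = now()
WHERE id = 42;

-- Privacy-request erasure: actually sever the link between user and content.
UPDATE comments
SET body = '[removed]', author_id = NULL
WHERE author_id = 123;
```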

rorylaitila•51m ago
Databases store facts. Creating a record = new fact. "Deleting" a record = new fact. But destroying rows from tables = a disappeared fact. That is not great for most cases. In rare cases the volume of records may be a technical hurdle, in which case move the facts to another database. The number of times I've wanted to destroy a large volume of facts is approximately zero.
ntonozzi•49m ago
I've given up on soft delete -- the nail in the coffin for me was my customers' legal requirement that data be fully deleted, not archived. It never worked that well anyways. I never had a successful restore from a large set of soft-deleted rows.
zahlman•45m ago
> customers' legal requirements that data is fully deleted

Strange. I've only ever heard of legal requirements preventing deletion of things you'd expect could be fully deleted (in case they're needed as evidence at trial or something).

ntonozzi•39m ago
Many privacy regulations enforce full deletion of data, including GDPR: https://gdpr-info.eu/.
jandrewrogers•18m ago
While not common, regulations requiring a hard delete do exist in some fields even in the US. The ones I'm familiar with are effectively "anti-retention" laws that mandate data must be removed from the system after some specified period of time, e.g. all data in the system is deleted no more than 90 days after insertion. This allows compliance to be automated.

The data subject to the regulation has a high potential for abuse. Automated anti-retention limits the risk and potential damage.

jamilbk•34m ago
At Firezone we started with soft-deletes thinking they might be useful for an audit/compliance log, and quickly ran into each of the problems described in this article. The real issue for us was migrations: having to maintain the structure of deleted data alongside live data just didn't make sense, and it undermined the point of an immutable audit trail.

We've switched to CDC using Postgres, which emits into a separate (non-replicated) write-optimized table. The replication connection maintains a 'subject' variable to provide audit context for each INSERT/UPDATE/DELETE. So far, CDC has worked very well for us in this manner (Elixir/Postgrex).
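The Postgres side of a setup like that might look roughly like this (a sketch, not their actual configuration; names are hypothetical, and the consumer would be the app streaming from the slot):

```sql
-- Publish changes from the audited tables...
CREATE PUBLICATION audit_pub FOR TABLE items;

-- ...and create a logical replication slot a CDC consumer can stream from.
SELECT pg_create_logical_replication_slot('audit_slot', 'pgoutput');
```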

I do think soft-deletes have their place in this world, maybe for user-facing "restore deleted" features. I don't think compliance or audit trails are the right place for them, however.

pjs_•29m ago
Tried implementing this crap once. Never again
tracker1•29m ago
I like having archive/history tables. I often do something similar with job queues when persisting to a database; that way the pending table can stay small, avoiding full scans and the need to skip over deleted records...

As an aside, another idea I've kicked around for event-driven databases is to just use a database like SQLite and copy/wipe the whole thing as necessary once an event, and the work related to it, is done. For example, all the validation/chain-of-custody info for ballot signatures... there's not much point in keeping it all online or active, or even mixed in with other ballot initiatives, and the schema can change with the app as needed for new events. Just copy that file and you have your archive. Compress the file, even, and have it hard-archived and backed up if needed.

cyberax•17m ago
Soft deletes + GC for the win!

We have an offline-first infrastructure that replicates the state to possibly offline clients. Hard deletes were causing a lot of fun issues with conflicts, where a client could "resurrect" a deleted object. Or deletion might succeed locally but fail later because somebody added a dependent object. There are ways around that, of course, but why bother?

Soft deletes can be handled just like any regular update. Then we just periodically run a garbage collector to hard-delete objects after some time.
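The GC pass can be a single periodic statement; a sketch with a hypothetical table and retention window:

```sql
-- Turn sufficiently old soft deletes into hard deletes.
DELETE FROM items
WHERE deleted_at IS NOT NULL
  AND deleted_at < now() - interval '45 days';
```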

3rodents•12m ago
Soft deletes are an example of where engineers unintentionally lead product instead of product leading engineering. Soft delete isn't language used by users, so it should not be used by engineers when making product-facing decisions.

"Delete," "archive," and "hide" are the kinds of actions a user typically wants, each with its own semantics specific to the product. A flag on the row, a separate table, deleting the row: these are all implementation options that should be led by the product.

monkpit•4m ago
Why would implementation details be led by product? “Undo” is an action that the user may want, which would be led by product. Not the implementation in the db.
LorenPechtel•10m ago
The % of records that are deleted is a huge factor.

If you keep 99% and soft-delete 1%, use some sort of deleted flag; while I haven't tried it, whalesalad's suggestion of a view sounds excellent. If you delete 99% and keep 1%, move it!

clickety_clack•7m ago
We have soft delete, with hard delete running on deletions over 45 days old. Sometimes people delete things by accident, and this is the only practical way to recover from that.

A 26,000-year astronomical monument hidden in plain sight (2019)

https://longnow.org/ideas/the-26000-year-astronomical-monument-hidden-in-plain-sight/
307•mkmk•6h ago•62 comments

California is free of drought for the first time in 25 years

https://www.latimes.com/california/story/2026-01-09/california-has-no-areas-of-dryness-first-time...
172•thnaks•1h ago•70 comments

Instabridge has acquired Nova Launcher

https://novalauncher.com/nova-is-here-to-stay
116•KORraN•5h ago•85 comments

Inside the secret world of Japanese snack bars

https://www.bbc.com/travel/article/20260116-inside-the-secret-world-of-japanese-snack-bars
79•rmason•2h ago•50 comments

Cloudflare zero-day: Accessing any host globally

https://fearsoff.org/research/cloudflare-acme
34•2bluesc•7h ago•8 comments

Provably unmasking malicious behavior through execution traces

https://arxiv.org/abs/2512.13821
15•PaulHoule•2h ago•2 comments

The Unix Pipe Card Game

https://punkx.org/unix-pipe-game/
172•kykeonaut•7h ago•49 comments

Which AI Lies Best? A game theory classic designed by John Nash

https://so-long-sucker.vercel.app/
26•lout332•2h ago•19 comments

Ask HN: Is Linux Safe to Daily drive in 2026?

9•A_Random_Nerd•10m ago•3 comments

Electricity use of AI coding agents

https://www.simonpcouch.com/blog/2026-01-20-cc-impact/
30•linolevan•6h ago•21 comments

I'm addicted to being useful

https://www.seangoedecke.com/addicted-to-being-useful/
470•swah•13h ago•227 comments

Our approach to age prediction

https://openai.com/index/our-approach-to-age-prediction/
53•pretext•4h ago•106 comments

RCS for Business

https://developers.google.com/business-communications/rcs-business-messaging
22•sshh12•20h ago•22 comments

Running Claude Code dangerously (safely)

https://blog.emilburzo.com/2026/01/running-claude-code-dangerously-safely/
270•emilburzo•12h ago•221 comments

Show HN: Agent Skills Leaderboard

https://skills.sh
28•andrewqu•2h ago•14 comments

Show HN: Mastra 1.0, open-source JavaScript agent framework from the Gatsby devs

https://github.com/mastra-ai/mastra
72•calcsam•7h ago•28 comments

Maintenance: Of Everything, Part One

https://press.stripe.com/maintenance-part-one
56•mitchbob•5h ago•11 comments

Building Robust Helm Charts

https://www.willmunn.xyz/devops/helm/kubernetes/2026/01/17/building-robust-helm-charts.html
9•will_munn•1d ago•0 comments

Unconventional PostgreSQL Optimizations

https://hakibenita.com/postgresql-unconventional-optimizations
255•haki•9h ago•32 comments

Lunar Radio Telescope to Unlock Cosmic Mysteries

https://spectrum.ieee.org/lunar-radio-telescope
6•rbanffy•1h ago•0 comments

Are Arrays Functions?

https://futhark-lang.org/blog/2026-01-16-are-arrays-functions.html
6•todsacerdoti•1d ago•0 comments

TopicRadar – Track trending topics across Hacker News, GitHub, ArXiv, and more

https://apify.com/mick-johnson/topic-radar
13•MickolasJae•9h ago•3 comments

Dockerhub for Skill.md

https://skillregistry.io/
15•tomaspiaggio12•9h ago•10 comments

LG UltraFine Evo 6K 32-inch Monitor Review

https://www.wired.com/review/lg-ultrafine-evo-6k-32-inch-monitor/
50•tosh•3d ago•83 comments

IPv6 is not insecure because it lacks a NAT

https://www.johnmaguire.me/blog/ipv6-is-not-insecure-because-it-lacks-nat/
24•johnmaguire•5h ago•7 comments

Nvidia Stock Crash Prediction

https://entropicthoughts.com/nvidia-stock-crash-prediction
332•todsacerdoti•8h ago•280 comments

Fast Concordance: Instant concordance on a corpus of >1,200 books

https://iafisher.com/concordance/
28•evakhoury•4d ago•2 comments

Show HN: wxpath – Declarative web crawling in XPath

https://github.com/rodricios/wxpath
57•rodricios•6d ago•9 comments

Linux kernel framework for PCIe device emulation, in userspace

https://github.com/cakehonolulu/pciem
215•71bw•16h ago•76 comments