My first in-prod corrupted hard drive problem

https://blog.pavementlink.ch/2026/05/07/my-first-corrupted-hard-drive-problem/
15•r1chk1t•1h ago

Comments

Retr0id•49m ago
> So how were we able to recover the database and the data inside it? Most of the data was probably still intact, only a few sectors were unreadable. Once those were either restored (rewritten with a strong signal) or remapped by the drive’s firmware, the filesystem and the database engine could read the file end-to-end again. SQL Server pages also have checksums, so if any page came back wrong rather than unreadable, we’d have known. We got lucky: the corruption was at the magnetic-signal level, not at the “platter is scratched” level.

This doesn't quite seem to follow. As described, neither of the "recovery" methods actually restores lost data. So why weren't any of the SQL pages left in a bad state?
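
For readers unfamiliar with the "wrong rather than unreadable" distinction in the quoted passage, here is a minimal sketch of how a per-page checksum catches silently altered data. It uses CRC32 from Python's `zlib` purely as a stand-in; the 4-byte prefix layout and the names `seal_page` and `page_is_intact` are hypothetical and are not SQL Server's actual page format or algorithm.

```python
import zlib


def seal_page(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 of its contents (hypothetical layout)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def page_is_intact(page: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored prefix."""
    stored = int.from_bytes(page[:4], "big")
    return zlib.crc32(page[4:]) == stored
```

A page that reads back successfully but with a flipped bit fails this check, which is how an engine can tell "the drive returned garbage" apart from "the drive returned an I/O error". It does not explain how lost data would come back, which is the question raised above.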

benlivengood•43m ago
As best as I can tell it was intermittent read failures on some sectors, not permanent failures.

So if you keep rereading that section of the disk, you eventually get all the data. You save it somewhere, write a bunch of new patterns over the sectors, then write the original data back and verify that it reads back correctly many times.
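
A rough sketch of the "keep rereading and save it somewhere" half of that idea, assuming a Unix system and a source that is a regular file or disk image (`os.path.getsize` does not report the size of a raw block device). The function names, the block size, and the retry count are all illustrative assumptions; the pattern-overwrite and write-back steps are not shown, and this is not what the article's recovery tool actually does.

```python
import os


def read_with_retries(read_fn, offset, size, attempts=50):
    """Call read_fn(offset, size) until it returns a full block.

    Returns the bytes from the first successful full read, or None if
    every attempt raised OSError or came back short.
    """
    for _ in range(attempts):
        try:
            data = read_fn(offset, size)
        except OSError:
            continue  # intermittent read failure: try again
        if len(data) == size:
            return data
    return None


def rescue(src_path, dst_path, block_size=4096, attempts=50):
    """Copy src to dst block by block; return offsets that never read cleanly."""
    bad_offsets = []
    total = os.path.getsize(src_path)
    fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            for offset in range(0, total, block_size):
                size = min(block_size, total - offset)
                data = read_with_retries(
                    lambda off, n: os.pread(fd, n, off), offset, size, attempts
                )
                if data is None:
                    bad_offsets.append(offset)
                    data = b"\x00" * size  # zero-fill so later offsets stay aligned
                dst.write(data)
    finally:
        os.close(fd)
    return bad_offsets
```

An empty list from `rescue` means every block was eventually read; any offsets it returns mark zero-filled blocks in the copy that still need attention.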

I believe the article's analysis about RAID is wrong though; most controllers will start resilvering or just fail a drive once it experiences too many IO errors.

jtchang•45m ago
Confused as to the actual root cause. Don't all hard drives provide SMART diagnostics these days? Was it really bad sectors?
r1chk1t•8m ago
Yes, there were bad sectors in the SMART diagnostics.
pixel_popping•35m ago
I feel the pain OP.

Over the last decade, I've run hundreds of servers, if not thousands, and I've entirely stopped using hard drives. Now it's solely SSD/NVMe, where the failure rate in practice is far lower. I've had my fair share of middle-of-the-night runs because websites were offline or whatever, only to end up in a hard drive diagnosis circus.

IMO, the peace of mind you get is worth the cost. It also allows you to rethink development entirely: a typical example is that copying all node_modules or Rust deps suddenly becomes a great idea with 10 Gbit/s bandwidth and fast drives (yes, I expect people to shit on me for saying this, please give me the counterarguments if you downvote me). Many things change if you have a higher base performance assumption, and storage is relatively cheap as well. I would never advise anyone who wants to run continuously in prod with low friction to get servers with HDDs.

I get that for some use cases it's not possible, but for the large majority of use cases, it's clearly not HDD that is the cost burden. A $50 server gets you TBs of SSD. Of course, don't go with a VPS or "Cloud" if you intend to change your development based on new performance assumptions. It blows my mind how many people pay thousands of dollars just to handle what, 100K visitors a day? That fits on a $100 server and a bunch of Kimsufi servers hosted across the world as a CDN.

People are overcomplicating infrastructure, big time (which leads to more problems, higher maintenance, security issues and so on).

Retr0id•31m ago
It is quite remarkable how quickly a modern SSD can scan over TBs of data, I'm less afraid of O(n) queries than I used to be.
toast0•19m ago
> Over the last decade, I've run hundreds of servers, if not thousands, and I've entirely stopped using hard drives. Now it's solely SSD/NVMe, where the failure rate in practice is far lower. I've had my fair share of middle-of-the-night runs because websites were offline or whatever, only to end up in a hard drive diagnosis circus.

My experience is that (most) spinners give off reliable pre-failure indicators (if you take the time to look/script looking), but SSDs fail by disappearing from the bus. The SSDs do fail much less often, but they still fail from time to time and recovery is harder.

Either way, if your data is important to you/your customers, you really need a backup/recovery plan.

I dunno about recent pricing, but not so long ago, it felt like spinners had a pretty high price floor and SSDs didn't... If you don't need a lot of space, you could find a small SSD that was still around the same $/GB as a medium sized SSD, but for spinners, there's a floor in dollars and space. So if you don't need a lot of space, you save money with an SSD and get better perf for free... If you need a lot of space and not a lot of perf, big spinners are more attainable than big SSDs.

ryandrake•10m ago
> My experience is that (most) spinners give off reliable pre-failure indicators (if you take the time to look/script looking), but SSDs fail by disappearing from the bus. The SSDs do fail much less often, but they still fail from time to time and recovery is harder.

I'm not a pro, just a smalltime dork with a homelab. I use cheap WD HDDs on my NAS system connected to an LSI hardware RAID controller. I'll boast that I have a 100% record so far of preventing downtime and data loss by simply listening for the controller's audible alarm and swapping drives right away (I keep brand new spares). I also have offline backups, but have so far never needed them. Not sure how this would change if I moved to SSDs.

pixel_popping•3m ago
Agree with the diagnostic part.

> Either way, if your data is important to you/your customers, you really need a backup/recovery plan.

You'd be surprised at how many devs/companies walk on eggshells all the time (praying that the fatal moment never arrives) because they aren't "brave" enough to set up a proper backup system, which is often only a few minutes or hours of work.

pshirshov•11m ago
So, you were not using a striped mirror ZFS for a prod database? What could go wrong, yep.
r1chk1t•8m ago
learned the hard way
proactivesvcs•2m ago
I'm surprised to have read to the end and found that they're still not performing any hardware monitoring and alerting. SMART may not always show pre-failure warnings, but when it does, they can usually be trusted.
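
As a starting point for the kind of monitoring discussed in this thread, here is a small sketch that parses the attribute table printed by `smartctl -A` and flags non-zero raw values for three sector-related attributes. It assumes smartmontools is installed and the drive reports ATA-style attributes. Attribute names vary by vendor, NVMe drives use a different output format, and the choice of attributes and the "anything above zero" threshold are assumptions rather than a recommendation from the article. The helper names are hypothetical.

```python
import subprocess

# Attributes commonly associated with failing sectors (names vary by vendor).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                found[fields[1]] = int(fields[9])
            except ValueError:
                pass  # some drives print non-numeric raw values; skip those
    return found


def smart_warnings(attrs):
    """List a message for every watched attribute with a non-zero raw value."""
    return [f"{name} raw value is {raw}" for name, raw in sorted(attrs.items()) if raw > 0]


def check_device(device):
    """Run `smartctl -A` on a device path and return any warnings."""
    # smartctl's exit status is a bitmask, so a non-zero code is not treated as fatal.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return smart_warnings(parse_smart_attributes(result.stdout))
```

Running `check_device` from cron and mailing any non-empty result would be one low-effort way to get alerting; smartmontools also ships the `smartd` daemon for the same purpose.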
