
Are hard drives getting better?

https://www.backblaze.com/blog/are-hard-drives-getting-better-lets-revisit-the-bathtub-curve/
86•HieronymusBosch•6h ago

Comments

thisislife2•3h ago
Personal anecdote - I would say (a cautious) yes. I bought 3 WD hard drives (1 external, 2 internal, at different times over the last 10+ years) for personal use, and 2 failed almost exactly when the 5 year warranty period ended (within a month or so). One failed just a few weeks before the warranty period ended, so WD had to replace it (and I got a replacement HDD that I could use for another 5 years). That's good engineering! (I also have an old 500GB external Seagate drive that has now lasted 10+ years and still works perfectly - probably an outlier.)

That said, one thing I now find very attractive about Seagate HDDs is that, with some models, they also offer free data recovery within the warranty period. Anybody who has lost data (i.e. idiots like me who didn't care about backups) and had to use such services knows how expensive they can be.

kvemkon•1h ago
> replacement HDD that I could use for another 5 years

But the warranty lasts only 5 years since the purchase of the drive, doesn't it?

thisislife2•1h ago
Yes, but the warranty is "irrelevant" when the drive actually lasts the whole 5 years (in other words, I am hoping the replacement is as well-engineered as its predecessor and lasts the whole 5 years - and it has so far, 3+ years in).
gfody•2h ago
pleasant contradiction to Betteridge's law
eqvinox•2h ago
I'm curious what this data would look like collated by drive birth date rather than (or, in 3D, in addition to) age. I wouldn't use that as the "primary" way to look at things, but it could pop some interesting bits. Maybe one of the manufacturers had a shipload of subpar grease? Slightly shittier magnets? Poor quality silicon? There's all kinds of things that could cause a few months of hard drive manufacture to be slightly less reliable…

(Also: "Accumulated power on time, hours:minutes 37451*:12, Manufactured in week 27 of year 2014" — I might want to replace these :D — * pretty sure that overflowed at 16 bit, they were powered on almost continuously & adding 65536 makes it 11.7 years.)

sdenton4•1h ago
I think it's helpful to put on our statistics hats when looking at data like this... We have some observed values and a number of available covariates which, perhaps, help explain the observed variability. Some legitimate sources of variation (e.g., proximity to cooling in the NFS box, whether the hard drive was dropped as a child, stray cosmic rays) will remain obscured to us - we cannot fully explain all the variation. But when we average over more instances, those unexplainable sources of variation are captured as a residual to the explanations we can make, given the available covariates. The averaging acts as a kind of low-pass filter over the data, which helps reveal meaningful trends.

Meanwhile, if we slice the data up three ways to hell and back, /all/ we see is unexplainable variation - every point is unique.

This is where PCA is helpful - given our set of covariates, what combination of variables best explains the variation, and how much residual remains? If there's a lot of residual, we should look for other covariates. If it's a tiny residual, we don't care, and can work on optimizing the known major axes.
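A minimal sketch of that workflow with numpy, on synthetic stand-in data (real covariates would be drive age, temperature, model dummies, and so on):

  import numpy as np

  # Synthetic drives-by-covariates matrix; a stand-in for the real data.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(1000, 6))

  # PCA via SVD on the centered matrix.
  Xc = X - X.mean(axis=0)
  _, s, _ = np.linalg.svd(Xc, full_matrices=False)
  explained = s**2 / (s**2).sum()

  # Share of variation captured by the top k axes; the rest is residual.
  k = 2
  print(f"top {k} axes: {explained[:k].sum():.0%}, residual: {explained[k:].sum():.0%}")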

burnished•1h ago
Well said, and made me want to go review my stats text.
tanvach•1h ago
I find it more straightforward to just model the failure rate with the variables directly, and look at metrics like AUC on out-of-sample data.
mnw21cam•1h ago
I personally am looking forward to Backblaze inventing error bars and statistical tests.
IgorPartola•52m ago
Exactly. I used to pore over the Backblaze data but so much of it is in the form of “we got 1,200 drives four months ago and so far none have failed”. That is a relatively small number over a small amount of time.

On top of that it seems like by the time there is a clear winner for reliability, the manufacturer no longer makes that particular model and the newer models are just not a part of the dataset yet. Basically, you can’t just go “Hitachi good, Seagate bad”. You have to look at specific models and there are what? Hundreds? Thousands?
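To put a number on how little "1,200 drives, zero failures" tells you, the classic rule of three gives a 95% upper bound of about 3/n on the failure rate (a back-of-envelope bound, not anything Backblaze publishes):

  # n trials, zero failures: solve (1 - p)**n = 0.05 for p, which is roughly 3/n.
  n = 1200
  upper_95 = 3 / n
  print(f"{upper_95:.2%}")  # 0.25% over four months -- consistent with great and mediocre drives alike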

Animats•1h ago
Right. Does the trouble at year 8 reflect bad manufacturing 8 years ago?
dylan604•50m ago
Honestly, at 8 years, I'd be leaning towards dirty power on the user's end. For a company like Backblaze, I'd assume a data center would have conditioned power. For someone at home running a NAS with the same drive connected straight to mains, they may not see the same lifespan from a drive from the same batch. Undervolting when the power dips is gnarly on equipment. It's amazing to me that UPS use isn't ubiquitous at home.
tanvach•1h ago
Agreed, these types of analyses benefit from grouping by cohort year. Standard practice in analytics.
dylan604•56m ago
Over the past couple of years, I've been side-hustling a project that requires buying ingredients from multiple vendors. The quantities never work out 1:1, so some ingredients from the first order get used with some from a new order from a different vendor. Each item has its own batch number, and the ones used together in the final product yield a batch number on my end. I log my batch number alongside the batch number of each ingredient in my product. As a solo person, it is a mountain of work, but nerdy me goes to that effort.

I'd assume that a drive manufacturer does something similar, knowing which batch from which vendor the magnets, grease, or silicon all come from. You hope you never need these records to do any kind of forensic research, but the one time you do, they make a huge difference. So many people making products similar to mine look at me with a tilted head while their eyes go wide and glaze over, as if I'm speaking an alien language, when I discuss lineage tracking.

calvinmorrison•37m ago
Yes, this is fairly standard in manufacturing environments. Bills of materials are tracked at the lot level, or even down to the serial number, for production of complex goods.
realityfactchex•1h ago
Per charts in TFA, it looks like some disks are failing less overall, and failing after a longer period of time.

I'm still not sure how to confidently store decent amounts of (personal) data for over 5 years without

  1- giving it to the cloud,
  2- burning to M-disk, or
  3- replacing multiple HDDs every 5 years on average
All whilst regularly checking for bitrot and not overwriting good files with corrupted ones (a checksum sketch follows below).

Who has the easy, self-service, cost-effective solution for basic, durable file storage? Synology? TrueNAS? Debian? UGreen?

(1) and (2) both have their annoyances, so (3) seems "best" still, but seems "too complex" for most? I'd consider myself pretty technical, and I'd say (3) presents real challenges if I don't want it to become a somewhat significant hobby.
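For the bitrot-checking piece, a minimal self-service sketch (the paths are hypothetical; it flags files whose bytes changed since the last run, though it can't tell deliberate edits from corruption on its own):

  import hashlib, json, pathlib

  ROOT = pathlib.Path("/mnt/archive")       # hypothetical archive directory
  MANIFEST = pathlib.Path("manifest.json")  # digests from the previous run

  def sha256(path, chunk=1 << 20):
      h = hashlib.sha256()
      with open(path, "rb") as f:
          while block := f.read(chunk):
              h.update(block)
      return h.hexdigest()

  current = {str(p): sha256(p) for p in ROOT.rglob("*") if p.is_file()}

  if MANIFEST.exists():
      previous = json.loads(MANIFEST.read_text())
      for name, digest in previous.items():
          if name in current and current[name] != digest:
              print("changed since last run (bitrot?):", name)

  MANIFEST.write_text(json.dumps(current, indent=2))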

LunaSea•1h ago
Would tapes not be an option?

Not great for easy read access but other than that it might be decent storage.

dmoy•1h ago
I looked at tape a little while ago and decided it wasn't gonna work out for me reliability-wise at home without a more controlled environment (especially humidity).
gruez•1h ago
>Would tapes not be an option?

AFAIK someone on reddit did the math and the break-even for tapes is between 50TB and 100TB. Any less and it's cheaper to get a bunch of hard drives.
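The break-even is easy to rerun with whatever prices you're actually seeing; every number below is an illustrative assumption, not a quote:

  # Tape's fixed cost (drive + HBA) amortizes against cheaper media.
  tape_drive = 1000.0   # used LTO drive, assumed
  tape_per_tb = 6.0     # cartridge cost per TB, assumed
  hdd_per_tb = 18.0     # hard drive cost per TB, assumed

  # Break even where tape_drive + tape_per_tb * x == hdd_per_tb * x.
  print(tape_drive / (hdd_per_tb - tape_per_tb), "TB to break even")  # ~83 TB with these numbers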

ghaff•1h ago
Unless you're basically a serious data hoarder or otherwise have unusual storage requirements, an 18TB drive (or maybe 2) gets you a lot of the way to handling most normal home requirements.
thisislife2•1h ago
Tapes would be great for backups - but the tape drive market's all "enterprise-y", and the pricing reflects that. There really isn't any affordable retail consumer option (which is surprising as there definitely is a market for it).
palmotea•1h ago
> 2- burning to M-disk, or

You can't buy those anymore. I've tried.

IIRC, the things currently marketed as MDisc are just regular BD-R discs (perhaps made to a higher standard, and maybe with a slower write speed programmed into them, but still regular BD-Rs).

Gigachad•1h ago
Hard drive failure seems like more of a cost and annoyance problem than a data preservation issue. Even with incredible reliability you still need backups if your house burns down. And if you have a backup system then drive failure matters little.
ssl-3•1h ago
One method that seems appealing:

1. Use ZFS with raidz

2. Scrub regularly to catch the bitrot

3. Park a small reasonably low-power computer at a friend's house across town or somewhere a little further out -- it can be single-disk or raidz1. Send ZFS snapshots to it using Tailscale or whatever. (And scrub that regularly, too.)

4. Bring over pizza or something from time to time.

As to brands: This method is independent of brand or distro.
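A minimal sketch of the snapshot shipping in step 3, as a nightly Python job; the dataset names and remote hostname are hypothetical:

  import datetime, subprocess

  DATASET = "tank/data"   # local dataset (hypothetical)
  REMOTE = "backup-box"   # Tailscale hostname of the offsite machine (hypothetical)
  TARGET = "backup/data"  # dataset on the offsite pool (hypothetical)

  def out(cmd):
      return subprocess.run(cmd, check=True, text=True, capture_output=True).stdout

  snap = f"{DATASET}@{datetime.date.today():%Y-%m-%d}"
  out(["zfs", "snapshot", snap])

  # List snapshots oldest-first so the previous one can seed an incremental send.
  snaps = out(["zfs", "list", "-t", "snapshot", "-H", "-o", "name",
               "-s", "creation", DATASET]).split()
  prev = snaps[-2] if len(snaps) > 1 else None

  send = ["zfs", "send"] + (["-i", prev] if prev else []) + [snap]
  recv = ["ssh", REMOTE, "zfs", "recv", "-F", TARGET]
  sender = subprocess.Popen(send, stdout=subprocess.PIPE)
  subprocess.run(recv, stdin=sender.stdout, check=True)
  sender.stdout.close()
  if sender.wait() != 0:
      raise RuntimeError("zfs send failed")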

fun444555•1h ago
This works great although I should really do step 4 :)
gruez•1h ago
>3. Park a small reasonably low-power computer at a friend's house across town or somewhere a little further out -- it can be single-disk or raidz1. Send ZFS snapshots to it using Tailscale or whatever. (And scrub that regularly, too.)

Unless you're storing terabyte levels of data, surely it's more straightforward and more reliable to store on backblaze or aws glacier? The only advantage of the DIY solution is if you value your time at zero and/or want to "homelab".

ghaff•1h ago
I'd much rather just have a backblaze solution and maybe redundant local backups with Time Machine or your local backup of choice (which work fine for terabytes at this point). Maybe create a clone data drive and drop it off with a friend every now and then which should capture most important archive stuff.
ssl-3•1h ago
A chief advantage of storing backup data across town is that a person can just head over and get it (or ideally, a copy of it) in the unlikely event that it becomes necessary to recover from a local disaster that wasn't handled by raidz and local snapshots.

The time required to set this stuff up is...not very big.

Things like ZFS and Tailscale may sound daunting, but they're very light processes on even the most garbage-tier levels of vaguely-modern PC hardware and are simple to get working.

bigiain•45m ago
I have a simpler approach that I've used at home for about 2 decades now pretty much unchanged.

I have two raid1 pairs - "the old one" and "the new one" - plus a third drive the same size as the old pair. The new pair is always larger than the old pair; in the early days it was usually well over twice as big, but drive growth rates have slowed since then. About every three years I buy a new "new pair" plus third drive, and downgrade the current "new pair" to be the "old pair". The old pair is my primary storage, and gets rsynced to a partition of the same size on the new pair. The remainder of the new pair is used for data I'm OK with not being backed up (umm, all my BitTorrented Linux ISOs...). The third drive is on a switched powerpoint and spins up late Sunday night, rsyncs the data copy on the new pair, then powers back down for the week.
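The Sunday-night leg of that might look something like this from cron (paths are hypothetical):

  import subprocess

  SRC = "/mnt/new-pair/primary-copy/"  # the rsynced copy of the old pair (hypothetical)
  DST = "/mnt/third-drive/"            # the cold weekly drive (hypothetical)

  # Archive mode; prune files that vanished from the source.
  subprocess.run(["rsync", "-a", "--delete", SRC, DST], check=True)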

toast0•1h ago
If you don't have too much stuff, you could probably do ok with mirroring across N+1 (distributed) disks, where N is enough that you're comfortable. Monitor for failure/pre-failure indicators and replace promptly.

When building up initially, make a point of trying to stagger purchases and service entry dates. After that, chances are failures will be staggered as well, so you naturally get staggered service entry dates. You can likely hit better than 5 year time in service if you run until failure, and don't accumulate much additional storage.

But I just did a 5 year replacement, so I dunno. Not a whole lot of work to replace disks that work.

IgorPartola•47m ago
Get yourself a Xeon-powered workstation that supports at least 4 drives. One will be your boot/system drive and three or more will be a ZFS mirror. You will use ECC RAM (hence Xeon). I bought a Lenovo workstation like this for $35 on eBay.

ZFS with a three-way mirror will be incredibly unlikely to fail. Only one of the three drives needs to survive for your data to survive.

Then get a second setup exactly like this for your backup server. I use rsnapshot for that.

For your third copy you can use S3 like a block device, which means you can use an encrypted file system. Use FreeBSD for your base OS.
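Creating the three-way mirror itself is a single command; a sketch with hypothetical FreeBSD device names:

  import subprocess

  # One vdev, three disks, each carrying a full copy of the data.
  subprocess.run(
      ["zpool", "create", "tank",
       "mirror", "/dev/ada1", "/dev/ada2", "/dev/ada3"],
      check=True,
  )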

aftergibson•1h ago
Not from the prices I'm seeing.
bigwheels•18m ago
Recent and related:

Disk Prices https://news.ycombinator.com/item?id=45587280 - 1 day ago, 67 comments

hatmatrix•1h ago
Do we have enough rare earth metals to provide storage for the AI boom?
jpitz•1h ago
The question is, do we have enough capacity to mine and refine them at a reasonable price? They're there, in the dirt for the taking.
asdff•1h ago
Future generations will blame us for depriving them of rare earths to build yet another cellphone. It's like how we're left today with severely diminished whale populations just so Victorians could read the Bible for another 2 hours a night. Was it worth it? Most would say no, save for the people who made a fortune off of it, I'm sure.
yalogin•1h ago
Ah I haven’t seen the yearly backblaze post in some time now, glad it’s back.
fooker•1h ago
Hard drives are not getting better.

Hard drives you can conveniently buy as a consumer - yes. There's a difference.

billfor•30m ago
I have a 13-year-old NAS with 4x1TB consumer drives, with over 10 years of head flying hours and 600,000 head unloads. Only 1 drive failed, at around 7 years. The remaining 3 are still spinning and pass the long self-test. I do manually set hdparm -B and -S to balance head flying time against unloads. I'm kind of hoping the other drives will fail so I can get a new NAS, but no such luck yet :-(
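That tuning might look like the following (values are illustrative, not a recommendation; -B sets APM aggressiveness, and -S 242 means a one-hour standby timeout):

  import subprocess

  # Higher -B means fewer head unloads; -S trades spin time for unload cycles.
  for dev in ("/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"):
      subprocess.run(["hdparm", "-B", "192", "-S", "242", dev], check=True)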
kiddico•18m ago
I admire the "use it until it dies" lifestyle. My NAS is at 7 years and I have no plans to upgrade anytime soon!

I Hate Acrobat

https://www.vincentuden.xyz/blog/pdf-reader
72•vincent-uden•1h ago•49 comments

Apple M5 chip

https://www.apple.com/newsroom/2025/10/apple-unleashes-m5-the-next-big-leap-in-ai-performance-for...
885•mihau•10h ago•985 comments

Claude Haiku 4.5

https://www.anthropic.com/news/claude-haiku-4-5
379•adocomplete•6h ago•161 comments

How First Wap Tracks Phones Around the World

https://www.lighthousereports.com/methodology/surveillance-secrets-explainer/
14•mattboulos•50m ago•0 comments

Next Steps for the Caddy Project Maintainership

https://caddy.community/t/next-steps-for-the-caddy-project-maintainership/33076
44•francislavoie•1h ago•6 comments

I almost got hacked by a 'job interview'

https://blog.daviddodda.com/how-i-almost-got-hacked-by-a-job-interview
658•DavidDodda•10h ago•350 comments

Bringing NumPy's type-completeness score to nearly 90% – Pyrefly

https://pyrefly.org/blog/numpy-type-completeness/
21•todsacerdoti•1w ago•5 comments

Are hard drives getting better?

https://www.backblaze.com/blog/are-hard-drives-getting-better-lets-revisit-the-bathtub-curve/
86•HieronymusBosch•6h ago•40 comments

Pwning the Nix ecosystem

https://ptrpa.ws/nixpkgs-actions-abuse
223•SuperShibe•9h ago•39 comments

Show HN: Halloy – Modern IRC client

https://github.com/squidowl/halloy
262•culinary-robot•11h ago•73 comments

Monads are too powerful: The expressiveness spectrum

https://chrispenner.ca/posts/expressiveness-spectrum
41•hackandthink•3d ago•36 comments

Recursive Language Models (RLMs)

https://alexzhang13.github.io/blog/2025/rlm/
51•talhof8•5h ago•13 comments

Zed is now available on Windows

https://zed.dev/blog/zed-for-windows-is-here
104•meetpateltech•6h ago•29 comments

F5 says hackers stole undisclosed BIG-IP flaws, source code

https://www.bleepingcomputer.com/news/security/f5-says-hackers-stole-undisclosed-big-ip-flaws-sou...
115•WalterSobchak•9h ago•57 comments

Leaving serverless led to performance improvement and a simplified architecture

https://www.unkey.com/blog/serverless-exit
272•vednig•11h ago•173 comments

A kernel stack use-after-free: Exploiting Nvidia's GPU Linux drivers

https://blog.quarkslab.com/./nvidia_gpu_kernel_vmalloc_exploit.html
118•mustache_kimono•9h ago•10 comments

ImapGoose

https://whynothugo.nl/journal/2025/10/15/introducing-imapgoose/
4•xarvatium•49m ago•0 comments

Recreating the Canon Cat document interface

https://lab.alexanderobenauer.com/updates/the-jasper-report
78•tonyg•8h ago•6 comments

Princeton Engineering Anomalies Research

https://pearlab.icrl.org/
21•walterbell•1w ago•4 comments

Garbage collection for Rust: The finalizer frontier

https://soft-dev.org/pubs/html/hughes_tratt__garbage_collection_for_rust_the_finalizer_frontier/
102•ltratt•11h ago•105 comments

Reverse engineering a 27MHz RC toy communication using RTL SDR

https://nitrojacob.wordpress.com/2025/09/03/reverse-engineering-a-27mhz-rc-toy-communication-usin...
72•austinallegro•8h ago•17 comments

C++26: range support for std:optional

https://www.sandordargo.com/blog/2025/10/08/cpp26-range-support-for-std-optional
68•birdculture•5d ago•55 comments

Gerald Sussman - An Electrical Engineering View of a Mechanical Watch (2003)

https://techtv.mit.edu/videos/15895-an-electrical-engineering-view-of-a-mechanical-watch
8•o4c•1w ago•0 comments

Ask HN: Can't get hired – what's next?

12•silvercymbals•25m ago•7 comments

Things I've learned in my 7 years implementing AI

https://www.jampa.dev/p/llms-and-the-lessons-we-still-havent
115•jampa•4h ago•41 comments

Americans' love of billiards paved the way for synthetic plastics

https://invention.si.edu/invention-stories/imitation-ivory-and-power-play
52•geox•6d ago•27 comments

M5 MacBook Pro

https://www.apple.com/macbook-pro/
291•tambourine_man•10h ago•399 comments

Pixnapping Attack

https://www.pixnapping.com/
285•kevcampb•17h ago•67 comments

Helpcare AI (YC F24) Is Hiring

1•hsial•11h ago

The brain navigates new spaces by 'darting' between reality and mental maps

https://medicine.yale.edu/news-article/brain-navigates-new-spaces-by-flickering-between-reality-a...
123•XzetaU8•1w ago•44 comments