>Any guidelines for what's a good used enterprise SSD?
Look at the seller's other items. You want them to be selling data-center gear.
Look at how many they have to sell - someone clearing out a server won't have 1 drive, they'll have half a dozen plus.
Look for SMART data in the post / a guaranteed minimum health.
I mostly bought S3500/3600/3700 series Intel SSDs. The endurance numbers vary, so you'll need to look up whatever you find.
>The endurance numbers on the enterprise drives are just so much higher.
That plus I'm more confident they'll actually hit them
Micron, Samsung and Intel (enterprise, branded DCxxxx) / SK Hynix / Solidigm (Intel sold its SSD business to SK Hynix, which merged it into Solidigm) are the go-tos for brands. HGST can also be good.
The best guideline is buying from reputable sellers with a decent volume of business (eg, >1000 sales with high ratings on ebay) that focus on enterprise hardware and have a decent return/DOA policy.
You should expect these drives to be partially worn (regardless of the SMART data, which often gets wiped), if for no other reason than the secure-erasure process mandated by a lot of orgs' data security policies, which involves multiple intensive full-disk writes - but also because they've actually been used. Drives from recently released models (within 12 months, eg Micron 7600) are suspect, as that implies a bad batch or that they were misused - especially if they aren't write-focused drives. It's not uncommon for a penny-pinching mid-size or smaller business to buy the wrong drives, wreck them, and have their vendor/VAR reject the warranty claim. That said, it's not always the case; it's entirely possible to get perfectly good recently made drives from reputable second-hand sellers, just don't expect a massive discount in that case.
Otherwise, the best advice I can give you is that redundancy is your friend. If you can't afford to buy at least 2 drives for an array, you should probably stick to buying new. I've had a few lemons over the years, and since availability on the second-hand market for any given model can be variable and you tend to want to build arrays from like devices, you should purchase with the expectation that at least 1 per batch will be bad, just to be safe. Worst case scenario, you end up with an extra drive/hotspare.
I'd rather see 3PB used out of a rated 5PB of writes than an obviously faked 2 GiB written.
If you're US domestic market, then yeah, you can usually avoid Chinese vendors. If you're EU or elsewhere, China can often be the main/only source of affordable drives vs domestic market. Really depends (I don't shop for international buds/clients, but I constantly hear about how the homelabbers across the pond have significantly higher prices/lower availability for surplus enterprise gear in general)
Stick to the rules on reliable vendors with a return policy, buy in bulk with the expectation that some will be bad (not a given, but good to be prepared), and the only real downside of buying from China is longer shipping times.
My hunch is that they don't expose anything because that makes it harder to refund on warranty
- Consumer drives like the Samsung 980 Pro and WD SN850 Black use TLC as SLC when about 30+% of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s. After that, it slows remarkably. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.
- Write amplification factor (WAF) is not discussed. Random small writes and partial block deletions will trigger garbage collection, which ends up rewriting data to reclaim freed space in a NAND block.
- A drive with a lot of erased blocks can endure more TBW than one that has all user blocks with data. This is because garbage collection can be more efficient. Again, enable TRIM on your fs.
- Overprovisioning can be used to increase a drive’s TBW. If, before you write to your 0.3 DWPD 1024 GB drive, you partition it so you only use 960 GB, you now have (roughly) a 1 DWPD drive.
- Per the NVMe spec, there are indicators of drive health in the SMART log page.
- Almost all current datacenter or enterprise drives support an OCP SMART log page. This allows you to observe things like the write amplification factor (WAF), rereads due to ECC errors, etc.
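If you want to script that, here's a minimal sketch, assuming smartctl 7.x (for JSON output) and an example device path of /dev/nvme0; it pulls the standard NVMe health fields mentioned above. As far as I know the OCP extended log (WAF, ECC re-reads, etc.) generally needs nvme-cli's OCP plugin rather than plain smartctl, so it isn't shown here.

    # Minimal sketch: read the standard NVMe SMART health fields via smartctl's
    # JSON output (smartctl >= 7.0, usually needs root). /dev/nvme0 is an example.
    import json
    import subprocess

    def nvme_health(device="/dev/nvme0"):
        out = subprocess.run(
            ["smartctl", "-a", "-j", device],
            capture_output=True, text=True, check=False,  # smartctl uses nonzero exits for warnings
        )
        log = json.loads(out.stdout).get("nvme_smart_health_information_log", {})
        return {
            # percentage of the rated endurance consumed (can exceed 100)
            "percentage_used": log.get("percentage_used"),
            # remaining factory-overprovisioned spare, as a percentage
            "available_spare": log.get("available_spare"),
            "available_spare_threshold": log.get("available_spare_threshold"),
            # one data unit = 512,000 bytes per the NVMe spec
            "data_units_written": log.get("data_units_written"),
            "media_errors": log.get("media_errors"),
        }

    if __name__ == "__main__":
        print(nvme_health())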
> - Consumer drives like the Samsung 980 Pro and WD SN850 Black use TLC as SLC when about 30+% of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s. After that, it slows remarkably. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.
This is true, but despite all of the controversy about this feature it’s hard to encounter this in practical consumer use patterns.
With the 980 Pro 1TB you can write 113GB before it slows down. (Source https://www.techpowerup.com/review/samsung-980-pro-1-tb-ssd/... ) So you need to be able to source that much data from another high speed SSD and then fill nearly 1/8th of the drive to encounter the slowdown. Even when it slows down you’re still writing at 1.5GB/sec. Also remember that the drive is factory overprovisioned so there is always some amount of space left to handle some of this burst writing.
For as much as this fact gets brought up, I doubt most consumers ever encounter this condition. Someone who is copying very large video files from one drive to another might encounter it on certain operations, but even in slow mode you’re filling the entire drive capacity in about 10 minutes.
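As a back-of-envelope check, using the numbers from that review (113 GB of SLC-cache burst at ~5 GB/s, then ~1.5 GB/s):

    # Rough fill-time estimate for a 1 TB 980 Pro using the figures cited above:
    # ~113 GB at SLC-cache speed, the rest at the post-cache rate.
    burst_gb, burst_gbps = 113, 5.0        # GB written at ~5 GB/s
    capacity_gb, steady_gbps = 1000, 1.5   # total GB, ~1.5 GB/s once the cache is full

    seconds = burst_gb / burst_gbps + (capacity_gb - burst_gb) / steady_gbps
    print(f"~{seconds / 60:.0f} minutes to fill the whole drive")  # ~10 minutes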
This has always been the case, which is why even a decade ago the “pro” drives were odd sizes like 120GB vs 128GB.
Products like that still exist today and the problem tends to show up as drives age and that pool shrinks.
DWPD and TBW ratings, like modern consumer drives use, are just different ways of communicating that contract.
FWIW, if you do a drive-wide discard and then only partition 90% of the drive, you can dramatically improve the garbage-collection slowdown on consumer drives.
In the world of ML and containers you can hit that if you, say, have fstrim scheduled once a week to avoid the cost of online discards.
I would rather have visibility into the size of the reserve space through SMART, but I doubt that will happen.
I think it is safe to say that all drives have this. Refer to the available spare field in the SMART log page (likely via smartctl -a) to see the percentage of factory overprovisioned blocks that are still available.
I hypothesize that as this OP space dwindles writes get slower because they are more likely to get bogged down behind garbage collection.
> I doubt most consumers ever encounter this condition. Someone who is copying very large video files from one drive to another might encounter it on certain operations
I agree. I agree so much that I question the assertion that drive slowness is a major factor in machines feeling slow. My slow laptop is about 5 years old. Firefox spikes to 100+% CPU for several seconds on most page loads. The drive is idle during that time. I place the vast majority of the blame on software bloat.
That said, I am aware of credible assertions that drive wear has contributed to measurable regression in VM boot time for a certain class of servers I’ve worked on.
113GB is pretty easily reached with video files.
> you’re still writing at 1.5GB/sec.
Except for a few seconds at the start, the whole process runs as if you had PCIe 2.0 (15+ years ago). Even with SSDs this fast, there's no chance of making a quick backup/restore, and during the restore you're too slow for the second time in a row.
It's crazy that back in the era of slow PCIe 1.0, fast SLC was what got used, while now with PCIe 5.0, when you really need fast SLC, you get slow TLC, very slow QLC, or even worse, PLC on the way.
I'm also not a fan of the "buy bigger storage" concept, or the conspiracy theory about 480 vs 512.
It sure would be nice if, when considering a product, you could just look at some claimed stats from the vendor about time-related degradation, firmware sparing policy, etc. We shouldn't have to guess!
I don't understand why this is being called a "conspiracy theory"; but if you want some very concrete evidence that this is how they work, a paper was recently published that analyzed the behavior and endurance of various SSDs, and the data would be very difficult to explain with any other theory than this: comparing apples to apples, the drives with better write endurance are merely overprovisioned so the wear-leveling algorithm doesn't cause as much write amplification while reorganizing.
https://news.ycombinator.com/item?id=44985619
> OP on write-intensive SSD. SSD vendors often offer two versions of SSDs with similar hardware specifications, where the lower-capacity model is typically marketed as “write-optimized” or “mixed-use”. One might expect that such write-optimized SSDs would demonstrate improved WAF characteristics due to specialized internal designs. To investigate this, we compared two Micron SSD models: the Micron 7450 PRO, designed for “read-intensive” workloads with a capacity of 960 GB, and the Micron 7450 MAX, intended for “mixed-use” workloads with a capacity of 800 GB. Both SSDs were tested under identical workloads and dataset sizes, as shown in Figure 7b. The WAF results for both models were identical and closely matched the results from the simulator. This suggests that these Micron SSDs, despite being marketed for different workloads, are essentially identical in performance, with the only difference being a larger OP on the “mixed-use” model. For these SSD models, there appear to be no other hardware or algorithmic improvements. As a result, users can achieve similar performance by manually reserving free space on the “read-intensive” SSD, offering a practical alternative to purchasing the “mixed-use” model.
Yes.
You need years from that SSD? Buy a drive with DWPD > 3.
You are a cheap ass and have the money only for a DWPD 0.3 drive? Replace it every year.
You are not sure what your usage would be? Over-provision by buying a bigger drive than you need.
And while we are at it: no, leaving >= 25% of the drive empty for drives > 480GB is just idiotic. Either buy a bigger drive or use common sense - even 10% of a 480GB drive is already 48GB, and for a 2048GB drive it's ~205GB.
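On the "buy a bigger drive than you need" point, the arithmetic is straightforward. A rough sketch with made-up TBW figures (on top of this, the extra free space also lowers write amplification, which is harder to quantify):

    # Rough sketch of why "buy a bigger drive than you need" works as
    # over-provisioning: rated TBW scales with capacity, your workload doesn't.
    # All drive figures below are made up for illustration.
    def years_of_life(tbw_tb: float, workload_gb_per_day: float) -> float:
        return tbw_tb * 1000 / workload_gb_per_day / 365

    workload = 100  # GB written per day
    for capacity_tb, tbw in [(1, 600), (2, 1200), (4, 2400)]:
        print(f"{capacity_tb} TB drive ({tbw} TBW): "
              f"~{years_of_life(tbw, workload):.0f} years at {workload} GB/day")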
In consumer drives. Often it's not even a hardware failure but a firmware one, though to most consumers this is splitting hairs, as the drive is still "dead" - the common ingress points to fix this are not present/disabled on consumer-class drives (thus the blurb at the end of that section about physically swapping controllers). Also, cell failure is far more prevalent than controller failure in drives that lack a DRAM/SLC cache (aka transition flash) layer. Controllers still fail, even at the hardware level, for enterprise and consumers alike though; it's a prevalent issue (pro tip: monitor and rectify the thermals and the prevalence of this problem drops significantly).
> Failure to retain charge: typically, only seen in SSDs, thumb drives, and similar devices left unpowered for long periods of time.
Also happens to flash that sees lots of writes, power cycles, or frequent significant temperature fluctuations. This is more common on portable media (thumb drives) or mobile devices (phones, laptops, especially thin ones).
> Now, let’s take a look at the DC600M Series 2.5” SATA Enterprise SSD datasheet for one of my favorite enterprise-grade drives: Kingston’s DC600M.
Strange choice of drive, but okay - especially considering they don't talk about any of its features that actually make it an enterprise version as opposed to their consumer alternatives: power loss protection, transition flash/DRAM cache, controller and diagnostics options, etc.
> Although Kingston’s DC600M is 3D TLC like Samsung’s EVO (and newer “Pro”) models, it offers nearly double the endurance of Samsung’s older MLC drives, let alone the cheaper TLC! What gives?
For starters, the power regulation and delivery circuitry on enterprise-grade drives tends to be more robust (usually, even on a low-end drive like the DC600M), so the writes that wear the cells are much less likely to actually cause wear due to out-of-spec voltage/current. Their flash topology, channels, bit widths, redundancy (for wear levelling/error correction), etc. are also typically significantly improved. All of these things are FAR more important than the TLC/SLC/MLC discussion they dive into. None of them is a given just because someone brands something an "enterprise drive", but they are things enterprises care about, whereas consumers typically don't have workloads where such considerations make a meaningful difference - they can just use DWPD, or brute force it by vastly overbuying capacity, to evaluate what works for them.
> One might, for example, very confidently expect 20GB per day to be written to a LOG vdev in a pool with synchronous NFS exports, and therefore spec a tiny 128GB consumer SSD rated for 0.3 DWPD... On the surface, this seems more than fine:
Perhaps, but let me stop you right there, as the math that follows is irrelevant for the context presented. You should be asking what kind of DRAM/transition flash (typically SLC if not DRAM) is present in the drive and how the controller handles it (also whether it has PLP) before you ever consider DWPD. If your (S)LOG's payloads fit within the controller's cache size, and that's its only meaningful workload, then 0.3 DWPD is totally fine, as the actual NAND cells that make up the available capacity will experience much less wear than if there were no cache present on the drive.
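For reference, the surface-level math from the quoted example, and why it only "seems more than fine" until write amplification and cache behaviour enter the picture (rough sketch, WAF values purely illustrative):

    # Naive write budget from the quoted example: a 128 GB drive rated 0.3 DWPD
    # vs. 20 GB/day of synchronous log writes. The WAF values are illustrative -
    # small sync writes that miss a DRAM/SLC cache can amplify badly, which is
    # exactly what the raw DWPD math ignores.
    capacity_gb, dwpd_rating, host_writes = 128, 0.3, 20
    budget = capacity_gb * dwpd_rating                # GB of NAND writes/day the rating allows
    for waf in (1, 4):
        nand_writes = host_writes * waf
        verdict = "fits" if nand_writes <= budget else "exceeds the rating"
        print(f"WAF {waf}: {nand_writes} GB/day of NAND writes vs {budget:.1f} budget -> {verdict}")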
Furthermore, regardless of specific application, if your burstable payloads exceed whatever cache layer your drive can handle, you're going to see much more immediate performance degradation, entirely independent of wear on any of your components. This is one area that significantly separates consumer flash from enterprise flash - not QLC/TLC/MLC or how many 3D stacks of it there are. That stuff IS relevant, but it's equally relevant for enterprise and consumer, and it's first and foremost a function of cost and capacity rather than endurance, performance, or anything else.
This is an example of how DWPD is a generic metric that can be broadly used, but when you get into the specifics of use, it can kinda fall on its face.
Thermals are also very important to both endurance/wear and performance, and often go overlooked/misunderstood.
DWPD is not as important as it once was when flash was expensive, drive capacity limited, and there was significantly more overhead in scaling drives up (to vastly oversimplify, a lot fewer PCIe lanes available), but it's still a valuable metric. And like any individual metric, in isolation it can only tell you so much, and different folks/contexts will have different constraints and needs.
Note: kudos to them for bringing up that not all DWPD is equal. Some vendors report DWPD endurance over 3 years instead of 5 to artificially inflate the metric - something to be aware of.
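A quick way to put quoted DWPD figures on the same footing (numbers made up):

    # Two drives can quote identical TBW yet different DWPD simply because one
    # vendor assumes a 3-year warranty window and another 5 years. Normalizing
    # to a common window makes the numbers comparable.
    def normalize_dwpd(dwpd: float, quoted_years: float, to_years: float = 5.0) -> float:
        return dwpd * quoted_years / to_years

    # A "1 DWPD (3-year)" drive is only ~0.6 DWPD on the usual 5-year basis.
    print(f"{normalize_dwpd(1.0, quoted_years=3):.2f} DWPD over 5 years")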
TL;DR: DWPD, IOPS, capacity and price are all perfectly valid ways to evaluate flash drives, especially in the consumer space. As your concerns get more specific/demanding/"enterprise", they come with more and more caveats/nuance, but that's true of any metric for any device tbh.
igtztorrero•2mo ago
Happened to me last week.
I just put it in a plastic bag in the freezer for 15 minutes, and it worked.
I made a copy to my laptop and then installed a new server.
But it doesn't always work like a charm.
Please always have a backup for documents, and a recent snapshot for critical systems.
lvl155•2mo ago
zamadatix•2mo ago
And regularly test that restores actually work - nothing worse than thinking you had backups and then they don't restore right.
serf•2mo ago
drive controllers on HDDs just suddenly go to shit and drop off buses, too.
I guess the difference being that people expect the HDD to fail suddenly whereas with a solid state device most people seem to be convinced that the failure will be graceful.
PunchyHamster•2mo ago
Usually it either starts returning media errors, or slows down (and if it's not replaced in time, a slowing-down drive usually turns into a media-error one).
SSDs (at least the big fleet of Samsung ones we had) are much worse - they just go dead, not even turning read-only. Of course we have redundancy so it's not really a problem, but if the same happened on someone's desktop they'd be screwed if they don't have backups.
toast0•2mo ago
This is exactly the opposite of my lived experience. Spinners fail more often than SSDs, but I don't remember any sudden failures with spinners, as far as I can recall, they all have pre-failure indicators, like terrible noises (doesn't help for remote disks), SMART indicators, failed read/write on a couple sectors here and there, etc. If you don't have backups, but you notice in a reasonable amount of time, you can salvage most of your data. Certainly, sometimes the drives just won't spin up because of a bearing/motor issue; but sometimes you can rotate the drive manually to get it started and capture some data.
The vast majority of my SSD failures have been disappear from the bus; lots of people say they should fail read only, but I've not seen it. If you don't have backups, your data is all gone.
Perhaps I missed the pre-failure indicators from SMART, but it's easier when drives fail but remain available for inspection --- look at a healthy drive, look at a failed drive, see what's different, look at all your drives, predict which one fails next. For drives that disappear, you've got to read and collect the stats regularly and then go back and see if there was anything... I couldn't find anything particularly predictive. I feel disappear from the bus is more in the firmware error category vs physical storage problem, so there may not be real indications, unless it's a power on time based failure...
jandrese•2mo ago
toast0•2mo ago
The ones for relocated sectors, pending sectors, etc. When those add up to N, it's time to replace and you can calibrate that based on your monitoring cycle and backup needs. For a look every once in a while, single copy use case, I'd replace around 10 sectors; for daily monitoring, multiple copies, I'd replace towards 100 sectors. You probably won't get warranty coverage at those numbers though.
Mostly I've only seen the SMART status warning fire for too many power-on hours, which isn't very useful. Power-on hours isn't a good indicator of impending doom (unless there's a firmware error at specific values, which can happen for SSDs or spinners).
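If you want to automate that "replace at N sectors" rule of thumb, a rough sketch (assuming smartctl's JSON output on a SATA drive; the device path and threshold are just examples):

    # Sketch of the "replace once reallocated + pending sectors hit N" rule using
    # smartctl's JSON output (smartctl >= 7.0, usually needs root). Attribute IDs
    # 5 and 197 are the standard reallocated / current-pending sector counts.
    import json
    import subprocess

    WATCHED_IDS = {5, 197}  # Reallocated_Sector_Ct, Current_Pending_Sector

    def bad_sector_count(device="/dev/sda"):
        out = subprocess.run(["smartctl", "-A", "-j", device],
                             capture_output=True, text=True, check=False)
        table = json.loads(out.stdout).get("ata_smart_attributes", {}).get("table", [])
        return sum(a["raw"]["value"] for a in table if a["id"] in WATCHED_IDS)

    if __name__ == "__main__":
        threshold = 10  # the low end suggested above for infrequent monitoring
        count = bad_sector_count()
        print(f"{count} reallocated+pending sectors",
              "-> consider replacing" if count >= threshold else "-> OK for now")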
seanw444•2mo ago
> The vast majority of my SSD failures have been disappear from the bus; lots of people say they should fail read only, but I've not seen it. If you don't have backups, your data is all gone.
I just recovered data a couple weeks ago from my boss's SATA SSD that gave out and went read-only.
magicalhippo•2mo ago
I've had a fair number of HDDs throughout the years. My first one, well my dad's, was a massive 20 MB. I've had a 6+ disk ZFS pool going 24/7 since 2007. The oldest disks had over 7 years of on-time according to SMART data; I replaced them due to capacity.
Out of all that I've only had one HDD go poof gone. The infamous IBM Deathstar[1].
I've had some develop a few bad blocks and that's it, and one which just got worse and worse. But only one which died a sudden death.
Meanwhile I've had multiple SSDs which just stopped working suddenly. Articles write about them going into read-only mode but the ones I've had that went bad just stopped working.
[1]: https://en.wikipedia.org/wiki/Deskstar#IBM_Deskstar_75GXP_fa...
jandrese•2mo ago
I have had a few drives go completely read only on me, which is always a surprise to the underlying OS when it happens. What is interesting is you can't predict when a drive might go read-only on you. I've had a system drive that was only a couple of years old and running on a lightly loaded system claim to have exhausted the write endurance and go read only, although to be fair that drive was a throwaway Inland brand one I got almost for free at Microcenter.
If you really want to see this happen try setting up a Raspberry Pi or similar SBC off of a micro-SD card and leave it running for a couple of years. There is a reason people who are actually serious about those kinds of setups go to great lengths to put the logging on a ramdisk and shut off as much stuff as possible that might touch the disk.
fragmede•2mo ago
I think they’re complicated in different ways. A hard disk drive has to power up an electromagnet in a motor, move an arm, read the magnetic state of the part of the platter under the read head, and correlate that to something? Oh, and there are multiple read heads. Seems ridiculously complex!
jandrese•2mo ago
pkaye•2mo ago
But then as the years progressed, the transistors were made smaller and MLC and TLC were introduced, all to increase capacity, but it made the NAND worse in every other way: endurance, retention, write/erase performance, read disturb. It also makes the algorithms and error recovery process more complicated.
Another difficult thing is recovering the FTL mapping tables from a sudden power loss. Having those power loss protection capacitors makes it so much more robust in every way. I wish more consumer drives included them. It probably just adds $2-3 to the product cost.
namibj•2mo ago
dale_glass•2mo ago
What's that supposed to do for a SSD?
It was a trick for hard disks because on ancient drives the heads could get stuck to the platter, and that might help sometimes. But even for HDDs that's dubiously useful these days.
ahartmetz•2mo ago
butvacuum•2mo ago
Far more often it's the act of simply letting a device sit unpowered that 'fixes' the issue. Speculation on what changed invariably goes on indefinitely.
rcxdude•2mo ago
ssl-3•2mo ago
Stuck heads were/are part of the freezing trick.
Another part of that trick has to do with printed circuit boards and their myriad connections -- you know, the stuff that both HDDs and SSDs have in common.
Freezing them makes things on the PCB contract, sometimes at different rates, and sometimes that change makes things better-enough, long-enough to retrieve the data.
I've recovered data from a few (non-ancient) hard drives that weren't stuck at all by freezing them. Previous to being frozen, they'd spin up fine at room temperature and sometimes would even work well enough to get some data off of them (while logging a ton of errors). After being frozen, they became much more cooperative.
A couple of them would die again after warming back up, and only really behaved while they were continuously frozen. But that was easy enough, too: Just run the USB cable from the adapter through the door seal on the freezer and plug it into a laptop.
This would work about the same for an SSD, in that: If it helps, then it is helpful.