Manufacturers have been playing this game with DWPD/TBW numbers too: by reducing the retention spec, they can advertise a drive as having higher endurance with the exact same flash. But if you compare the numbers over the years, it's clear that NAND flash has gotten significantly worse; the only thing that has gone up, multiplicatively, is capacity, while endurance and retention have both gone down by a few orders of magnitude.
For a long time, 10 years after 100K cycles was the gold standard of SLC flash.
Now we are down to several months after less than 1K cycles for QLC.
It turns out that a few extra bytes of error correction can turn a 1-year endurance into a 100-year endurance.
There are potentially ways a filesystem could use hierarchical ECC and store only a small percentage extra, but it would be far from theoretically optimal and would rely on the facts that only a few logical blocks of the drive become unreadable and that those logical blocks aren't correlated in write time (which I imagine isn't true for most SSD firmware).
ZFS has "copies=2", but iirc there are no filesystems with support for single disk erasure codes, which is a huge shame because these can be several orders of magnitude more robust compared to a simple copy for the same space.
Those QLC NAND chips? Pretty much all of them have an "SLC mode", which treats each cell as 1 bit and increases both write speeds and reliability massively. But who wants a quarter of the capacity for the same price?
Plenty of people would be willing to pay for SLC mode. There is an unofficial firmware hack that enables it: https://news.ycombinator.com/item?id=40405578
1TB QLC SSDs are <$100 now. If the industry was sane, we would have 1TB SLC SSDs for less than $400, or 256GB ones for <$100, and in fact SLC requires less ECC and can function with simpler (cheaper, less buggy, faster) firmware and controllers.
But why won't the manufacturers let you choose? The real answer is clearly planned obsolescence.
I have an old SLC USB drive which is only 512MB, but it's nearly 20 years old and some of the very first files I wrote to it are still intact (I last checked several months ago, and don't expect it's changed since then.) It has probably had a few hundred full-drive-writes over the years --- well worn-out by modern QLC/TLC standards, but barely-broken-in for SLC.
Very few people have the technical understanding required to make such a choice. And of those, fewer people still would actually pick SLC over QLC.
At the same time: a lot of people would, if facing a choice between a $50 1TB SSD and a $40 1TB SSD, pick the latter. So there's a big incentive to optimize on cost, and not a lot of incentive to optimize on anything else.
This "SLC only" mode exists in the firmware for the sake of a few very specific customers with very specific needs - the few B2B customers that are actually willing to pay that fee. And they don't get the $50 1TB SSD with a settings bit flipped - they pay a lot more, and with that, they get better QC, a better grade of NAND flash chips, extended thermal envelopes, performance guarantees, etc.
Most drives out there just use this "SLC" mode for caches, "hot spot" data and internal needs.
Though what makes me wonder is that some reviews of modern SSDs mention that the pSLC cache is somewhat less than 25% of capacity, like a 400GB pSLC cache for a 2TB SSD:
https://www.tomshardware.com/pc-components/ssds/crucial-p310...
So you get more like 20% of SLC capacity, at least on some SSDs.
Re-freezing is also critical: the container should contain no humid air when it goes into the freezer, because the water will condense and freeze as the container cools. A tightly wrapped bag, desiccant, and/or purging the container with dry gas would prevent that.
I suspect that in 2035, hardware from 2010 will still work, while hardware from 2020 will be less reliable.
The only MLC I use today is Samsung's best industrial drives, and they sort of work... but no promises. And SanDisk SD cards: if you buy the cheapest ones, they last a surprising amount of time. A 32GB one lasted 11-12 years for me. Now I mostly install 500GB-1TB ones (recently, so they've only been running for 2-3 years) after installing some 200-400GB ones that still work after 7 years.
CRTs from 1994 and 2002 still going strong. LCD tvs from 2012 and 2022 just went kaput for no reason.
Old hardware rocks.
Most likely bad capacitors. The https://en.wikipedia.org/wiki/Capacitor_plague may have passed, but electrolytic capacitors are still the major life-limiting component in electronics.
They still degrade with time, but in a very predictable way.
That makes it possible to build a version of your design with all capacitors '50 year aged' and check it still works.
Sadly no engineering firm I know does this, despite it being very cheap and easy to do.
For logic and DRAM the biggest factors are how far they're being pushed with voltage and heat, which is a thing that trends back and forth over the years. So I could see that go either way.
I'd much rather have 64GB of SLC at 100K WpB than 4TB of MLC at less than 10K WpB.
The wear-leveling functions that move bits around to even out the writes, and the caches, will also fail.
The best compromise is of course to use both kinds for different purposes: SLC for small main OS (that will inevitably have logs and other writes) and MLC for slowly changing large data like a user database or files.
The problem is now you cannot choose because the factories/machines that make SLC are all gone.
You can still get pure SLC flash in smaller sizes, or use TLC/QLC in SLC mode.
> I'd much rather have 64GB of SLC at 100K WpB than 4TB of MLC at less than 10K WpB.
It's more like 1TB of SLC vs. 3TB of TLC or 4TB of QLC. All three take the same die area, but the SLC will last a few orders of magnitude longer.
So literally put your data in cold storage.
Like, does an SSD do some sort of refresh on power-on, or every N hours, or do you have to access the specific block, or...? What if you interrupt the process, e.g. an NVMe drive in an external case that you plug in once a month for a few minutes to use it as a huge flash drive: is that a problem?
What about the unused space, is a 4 TB drive used to transport 1 GB of stuff going to suffer anything from the unused space decaying?
It's all very unclear what all of this means in practice and how a user is supposed to manage it.
Generally, the data refresh will all happen in the background when the system is powered (depending on the power state). Performance is probably throttled during those operations, so you just see a slightly slower copy while this is happening behind the scenes.
The unused space decaying is probably not an issue, since the internal filesystem data is typically stored on a more robust area of media (an SLC location) which is less susceptible to data loss over time.
As far as how a user is supposed to manage it, maybe do an fsck every month or something? Using an SSD like that is probably ok most of the time, but might not be super great as a cold storage backup.
(As a note: I do have a 4TB USB SSD which sat in a drawer without being touched for a couple of years. The data was all fine when I plugged it back in. Of course, this was a new drive with very low write cycles, stored in a climate-controlled space. An older, worn-out drive would probably have been an issue.) Just wondering how long I should keep it plugged in if I ever have a situation like that, so I can "reset the fade clock", so to speak.
sudo pv -X /dev/sda
or even just: sudo cat /dev/sda >/dev/null
and it's pretty inefficient if the device doesn't actually have much data, because it also reads (and discards) empty space.

For copy-on-write filesystems that store checksums along with the data, you can request proper integrity checks and also get a nicely formatted report about how well that went.
for btrfs:
sudo btrfs scrub start -B /
or zfs: sudo zpool scrub -a -w
for classic (non-copy-on-write) filesystems that mostly consist of empty space I sometimes do this: sudo tar -cf - / | cat >/dev/null
the `cat` and redirection to /dev/null are necessary because GNU tar contains an optimization that doesn't actually read anything when it detects /dev/null as the target.

https://www.man7.org/linux/man-pages/man1/dd.1.html
I have no idea if forcing a read is good / the right way. I'm just answering how to do it.
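To run those scrubs on a schedule rather than by hand, a minimal cron sketch (pool name and schedule are placeholders; some distros also ship equivalent systemd timers):

    # /etc/cron.d/scrub - 03:00 on the first day of each month
    0 3 1 * * root btrfs scrub start -B /
    0 3 1 * * root zpool scrub tank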
How does the SSD know when to run the refresh job? AFAIK SSDs don't have an internal clock, so they can't tell how long they've been powered off. Moreover, does doing a read generate some sort of telemetry to the controller indicating how strong/weak the signal is, thereby informing whether it should refresh? Or does it blindly refresh on some sort of timer?
There are several layers of data integrity that are increasingly expensive to run. Once the drive tries to read something that requires recovery, it marks that block as requiring a refresh and rewrites it in the background.
Samsung's fix was aggressive scanning and rewriting in the background.
The case an average user is worried about is where they have an external SSD that they back stuff up to on a relatively infrequent schedule. In that situation, the question is whether just plugging it and copying some stuff to it is enough to ensure that all the data on the drive is refreshed, or if there's some explicit kind of "maintenance" that needs to be done.
Isn't that what periodic "scrub" operations are on modern fs like ZFS/BTRFS/BCacheFS?
> the data refresh will all happen in the background when the system is powered
This confused me. If it happens in the background, what's the manual fsck supposed to be for?
Edit: found this below: "Powering the SSD on isn't enough. You need to read every bit occasionally in order to recharge the cell."
Hm, so does the firmware have a "read bits to refresh them" logic?
NAND flash is freakishly unreliable, and it's up to the controller to keep this fact concealed from the rest of the system.
Does running

dd if=/dev/sdX of=/dev/null bs=1M status=progress

work to refresh any bad blocks internally?

Modern controllers have a good idea how healthy the flash is. They will move data around to compensate for weakness. They're doing far more to detect and correct errors than a file system ever will, at least at the single-device level.
It's hard to get away from the basic question, though -- when is the data going to go "poof!" and disappear?
That is when your restore system will be tested.
Maybe as a debug feature some registers can be set up to adjust the threshold up and down and the same data reread many times to get an idea of how close certain bits are to flipping, but it certainly isn't normal practice for every read.
That depends on the SSD controller implementation, specifically whether it proactively moves stuff from the SLC cache to the TLC/QLC area. I expect most controllers to do this, given that if they don't, the drive will quickly lose performance as it fills up. There's basically no reason not to proactively move stuff over.
The more interesting thing to note from those standards is that the required retention period differs between "Client" and "Enterprise" category.
Enterprise category only has power-off retention requirement of 3 months.
Client category has power-off retention requirement of 1 year.
Of course there are two sides to every story...
The Enterprise category standard assumes power-on active use of 24 hours/day, but the Client category is only intended for 8 hours/day.
As with many things in tech... it's up to the user to pick which side they compromise on.
[1] https://files.futurememorystorage.com/proceedings/2011/20110...
Meanwhile, if the endurance testing would exceed 1000 hours, an extrapolated approach can be used: stress to below the TBW but use accelerated techniques (including capping the max writable blocks to increase wear on the same areas).
This is less dramatic than the retention values seem at first, and less dramatic than what gets communicated in the articles I've seen. Even in the OP's linked article it takes a comment to highlight this, while the article itself only cites its own articles, which contain no outside links or citations.
[1] https://www.jedec.org/sites/default/files/Alvin_Cox%20%5BCom...
Specifically in JEDEC JESD218. (Write endurance in JESD219.)
The theory is that operating system files, which rarely change, are written and almost never re-written. So the charges begin to decay over time and while they might not be unreadable, reads for these blocks require additional error correction, which reduces performance.
There have been a significant number of (anecdotal) reports that a full rewrite of the drive, which does put wear on the cells, greatly increases the overall performance. I haven't personally experienced this yet, but I do think a "every other year" refresh of data on SSDs makes sense.
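If you want to attempt such a refresh without reinstalling, one possible approach (a sketch, not a recommendation: back up first and only run it against an unmounted device) is badblocks' non-destructive read-write mode, which reads each block, writes test patterns, and restores the original contents:

    sudo umount /dev/sdX1          # the filesystem must not be mounted
    sudo badblocks -nsv /dev/sdX   # -n non-destructive read-write, -s show progress, -v verbose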
E.g. data structures used for page mapping get fragmented, so accessing a single page written a long time ago requires checking hundreds of versions of mapping tables.
This article just seems to link to a series of other XDA articles with no primary source. I wouldn't ever trust any single piece of hardware to store my data forever, but this feels like clickbait. At one point they even state "...but you shouldn't really worry about it..."
Should I pop them in an old server? Is there an appliance that just supplies power? Is there a self-hosted thing I can use to monitor disks that I have zero access/usage for and don't want connected to anything, but want to keep "live"?
I have an HDD that hadn't been powered on for 17+ years. I dug it out recently to re-establish some memories, and discovered that it still reads. But of course you need to take care of them well: put them in an anti-static bag or something similar, and make sure the storage environment is dry.
It's not perfect, but at least you don't have to struggle that much maintaining SSDs.
I use them for working with old unmounted hard drives or for cloning drives for family members before swapping them. But they would probably work for just supplying power too?
The one I use the most is an 18 year old Rosewill RCW-608.
I don't know if the firmware/controller would do what it needs to do with only power connected. I wonder if there's some way to use SMART value tracking to tell? Like, if power-on hours increments, surely the controller was doing the things it needs to do?
My desktop computer is generally powered except when there is a power failure, but among the million+ files on its SSD there are certainly some that I do not read or write for years.
Does the SSD controller automatically look for used blocks that need to have their charge refreshed and do so, or do I need to periodically do something like "find / -type f -print0 | xargs -0 cat > /dev/null" to make sure every file gets read occasionally?
I wonder if there's some easy way to measure power consumed by a device - to detect whether it's doing housekeeping.
Do I just plug it in and leave the computer on for a few minutes? Does it need to stay on for hours?
Do I need to run a special command or TRIM it?
The problem is the test will take years, be out of date by the time it's released, and new controllers will be out with potentially different needs/algorithms.
https://www.tomshardware.com/pc-components/storage/unpowered...
The data on this SSD, which hadn't been used or powered up for two years, was 100% good on initial inspection. All the data hashes verified, but it was noted that the verification time took a smidgen longer than two years previously. HD Sentinel tests also showed good, consistent performance for a SATA SSD.
Digging deeper, all isn't well, though. Firing up Crystal Disk Info, HTWingNut noted that this SSD had a Hardware ECC Recovered value of over 400. In other words, the disk's error correction had to step in to fix hundreds of data-based parity bits.
...
As the worn SSD's data was being verified, there were already signs of performance degradation. The hashing audit eventually revealed that four files were corrupt (hash not matching). Looking at the elapsed time, it was observed that this operation astonishingly took over 4x longer, up from 10 minutes and 3 seconds to 42 minutes and 43 seconds.
Further investigations in HD Sentinel showed that three out of 10,000 sectors were bad and performance was 'spiky.' Returning to Crystal Disk Info, things look even worse. HTWingNut notes that the uncorrectable sectors count went from 0 to 12 on this drive, and the hardware ECC recovered value went from 11,745 before to 201,273 after tests on the day.

You just can't trust the hardware to know how to do this; you need backup software with multiple backup locations, which will know how to recheck integrity.
No idea if that's enough, but it seems like a reasonable place to start.
If not, that feels like a substantial hole in the market. Non-flash durable storage tends to be annoying or impractical for day-to-day use. I want to be able to find a 25-year-old SD card hiding in some crevice and unearth an unintentional time capsule, much like how one can pick up 20+ year old MiniDiscs and play the last thing their former owners recorded to them perfectly.
How often does it need to run? If it could be solar powered you could probably avoid a whole bunch of complexity per unit longevity.
I might not have noticed had fsck not alerted me that something was wrong.
The difference between SLC and MLC is just that MLC has four different program voltages instead of two, so reading back the data you have to distinguish between charge levels that are closer together. Same basic cell design. Honestly I can’t quite believe MLC works at all, let alone QLC. I do wonder why there’s no way to operate QLC as if it were MLC, other than the manufacturer not wanting to allow it.
You can run an error-correcting code on top of the regular blocks of memory, storing, for example (really an example; I don’t know how large the ‘blocks’ that you can erase are in flash memory), 4096 bits in every 8192 bits of memory, and recovering those 4096 bits from each block of 8192 bits that you read in the disk driver. I think that would be better than a simple “map low levels to 0, high levels to 1” scheme.
Loads of drives do this (or SLC) internally. Though it would be handy if a physical format could change the provisioning at the kernel-accessible layer.
There is a way to turn QLC into SLC: https://news.ycombinator.com/item?id=40405578
Manufacturers often do sell such pMLC or pSLC (p = pseudo) cells as "high endurance" flash.
TLC/QLC works just fine; it's really difficult to consume the erase cycles unless you really are writing to the disk 24/7 at hundreds of megabytes a second.
Um. Backups seem like exactly why I might have data on an unpowered SSD.
I use HDDs right now because they're cheaper, but that might not be true some day. Also, I would expect someone less technically inclined than I am to just use whatever they have lying around, which may well be an SSD.
ZFS, in these filesystem-specific parity-RAID implementations, also auto-repairs corrupted data whenever it is read, and the scrub utility provides an additional tool for recognizing and correcting such issues proactively.
This applies to both HDDs and SSDs. So, a good option for just about any archival use case.
In a raidz1, you give up one of the n drives' worth of space to store parity data. As long as you don't lose the same piece of data on more than one drive, you can reconstruct it when the pool is brought back online.
And, since the odds of losing the same piece of data on more than one drive are much lower than the odds of losing any piece of data at all, it's safer. Up it to two drives' worth of parity, and you can even suffer a complete drive failure in addition to sporadic data loss.
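A minimal sketch of setting that up (pool and device names are placeholders):

    # one drive's worth of parity: survives sporadic bad blocks on any single drive
    sudo zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc
    # or two drives' worth: survives a whole-drive failure plus sporadic data loss
    sudo zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    # periodically verify and self-heal anything that has rotted
    sudo zpool scrub tank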
The odds of losing the same piece of data on multiple drives are much lower than the odds of losing any piece of data at all.
How many people have a device that they may only power up every few years, like on vacation? In fact, I have a device that I've only used on rare occasions these days (an arcade machine) that I now suspect I'll have to reinstall, since it's been 2 or 3 years since I last used it.
This is a pretty big deal that they don't put on the box.
OptiNAND is an "SSHD" and thus has the same concerns with retention as an SSD. https://en.wikipedia.org/wiki/Hybrid_drive
the real issue here is QLC in which the flash cell's margins are being squeezed enthusiastically...
so it's as if the data... rusts, a little bit at a time
I'm unsure if dd if=/the/disk of=/dev/null does the read function.
dd if=$1 of=/dev/null iflag=direct bs=16M status="progress"
smartctl -a $1
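One way to check whether that read pass actually exercised the drive's error correction is to diff the SMART attributes before and after; a sketch (attribute names vary by vendor and protocol):

    # snapshot error-related SMART counters, force a full read, compare
    sudo smartctl -A "$1" | grep -Ei 'ecc|uncorrect|pending' > before.txt
    sudo dd if="$1" of=/dev/null iflag=direct bs=16M status=progress
    sudo smartctl -A "$1" | grep -Ei 'ecc|uncorrect|pending' > after.txt
    diff before.txt after.txt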
If someone wants to properly study SSD data retention, they could encrypt the drive using plain dm-crypt, fill the encrypted volume with zeroes, and check at some time point afterwards to see if there are any non-zero blocks. This is an accessible way (no programming involved) to write effectively random data to the SSD and "save" it without actually saving the entire thing - just the key. It will also ensure maximum variance in the charge levels of all the cells, and prevent the SSD from potentially playing tricks such as compression.

Furthermore, replication isn't a backup.
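A sketch of that dm-crypt zero-fill procedure (device name is a placeholder, the same passphrase and cryptsetup defaults must be used both times, and this destroys whatever is on the drive):

    # map the raw SSD through a keyed cipher; only the passphrase needs to be kept
    sudo cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 256 /dev/sdX retest
    # zeroes written through the mapping reach the NAND as effectively random data
    sudo dd if=/dev/zero of=/dev/mapper/retest bs=1M status=progress
    sudo cryptsetup close retest
    # months later: re-open with the same passphrase, then count bytes that no longer read back as zero
    sudo cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 256 /dev/sdX retest
    sudo cmp -n "$(sudo blockdev --getsize64 /dev/mapper/retest)" /dev/mapper/retest /dev/zero \
      && echo "no rot detected"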
Even if you are willing to spend that small fortune, good luck actually getting all the parts together without enterprise contracts.
[Edit: LOL, I see someone else posted literally the same example within the same minute. Funny coincidences.]
That said, they could also be storing relatively small amounts. For example, I back up to Backblaze B2, advertised at $6/TB/month, so ~300 GB at rest will be a "couple" bucks a month.
If I have enough data to need multiple SSDs (more than 8TB) then the cloud cost is not going to be substantially less. B2 is going to be above $500 a year.
I can manage to plug a backup SSD into a phone charger a couple times a year, or leave it plugged into one when it's not in my computer being updated. Even if I rate that handful of minutes of labor per year at a gratuitous $100, I'm still saving money well before the 18 month mark.
One concern I have is that B2's download costs mean verifying remote snapshots could get expensive. I suppose I could use `restic check --read-data-subset X` to do a random spot check of smaller portions of the data, but I'm not sure how valuable that would be.
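For what it's worth, a sketch of that spot check (repository name is a placeholder; I believe newer restic versions accept a percentage, older ones the n/t form):

    # read back and verify a random 10% of the pack files stored in B2
    restic -r b2:my-bucket:backups check --read-data-subset=10%
    # or rotate through everything across 10 runs
    restic -r b2:my-bucket:backups check --read-data-subset=1/10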
I like how it resembles LUKS encryption, where I can have one key for the automated backup process, and a separate memorize-only passphrase for if things go Very Very Wrong.
[0] https://restic.readthedocs.io/en/latest/080_examples.html#ba...
Long enough to experience data rot to a small degree, but realistically, what proportion of users have archived things away for 10+ years and then audited the fidelity of their data on retrieval after fetching it from Glacier?
I religiously rotate my offline SSDs and HDDs (I store backups on both): something like four at home (offline onsite) and two (one SSD, one HDD) in a safe at the bank (offline offsite).
Every week or so I rsync (a bit more advanced than rsync in that I wrap rsync in a script that detects potential bitrot using a combination of an rsync "dry-run" and known good cryptographic checksums before doing the actual rsync [1]) to the offline disks at home and then every month or so I rotate by swapping the SSD and HDD at the bank with those at home.
Maybe I should add to the process, for SSDs, once every six months:
... $ dd if=/dev/sda | xxhsum
I could easily automate that in my backup script by adding a file, lastknowddtoxxhash.txt, containing the date of the last full dd to xxhsum, verifying that, and then asking, if an SSD is detected (I take it that on an HDD it doesn't matter), whether a full read-to-hash should be done.

Note that I'm already using random sampling on files containing checksums in their name, so I'm already verifying x% of the files anyway. So I'd probably detect a fading SSD quite easily.
Additionally I've also got a server with ZFS in mirroring so this, too, helps keep a good copy of the data.
FWIW I still have most of the personal files from my MS-DOS days so I must be doing something correctly when it comes to backing up data.
But yeah: adding a "dd to xxhsum" of the entire disks once every six months in my backup'ing script seems like a nice little addition. Heck, I may go hack that feature now.
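A minimal sketch of what that addition might look like (the six-month threshold and the stamp-file handling are just my choices):

    #!/bin/sh
    # full-device read-to-hash at most once every ~6 months, tracked via a stamp file
    DEV=/dev/sda
    STAMP=lastknowddtoxxhash.txt
    if [ ! -e "$STAMP" ] || [ -n "$(find "$STAMP" -mtime +180)" ]; then
        sudo dd if="$DEV" bs=16M status=progress | xxhsum
        date > "$STAMP"
    fi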
[1] otherwise rsync shall happily trash good files with bitrotten ones
This is somewhat confused writing. Consumer SSDs usually do not have a data retention spec; even in this very detailed Micron datasheet you won't find one: https://advdownload.advantech.com/productfile/PIS/96FD25-S2T... Meanwhile, the data retention spec for enterprise SSDs applies at the end of their rated life, which is usually a DWPD/TBW intensity you won't reach in actual use anyway - that's where numbers like "3 months @ 50 °C" or whatever come from.
In practice, SSDs don't tend to lose data over realistic time frames. Don't hope for a "guaranteed by design" spec on that, though; some pieces of silicon are more equal than others.
> Component Design Life 5 years
> TBW 14 PB for 7.68 TB drives
> Data Retention 3 months
And then 2.7.3 explains that this number applies for 40 °C ambient, not the operating/non-operating range (up to 85 °C).