One constant of the storage business is that capacity per drive keeps increasing. Spinning hard-disk drives will be approaching 20 TB soon, while solid-state drives range from 4 TB to 16 TB, or even more if you’re willing to entertain an exotic implementation. Today at the Data Centre World conference in London, I was quite surprised to hear that, due to managed risk, we’re unlikely to see much demand for drives over 16 TB.

Speaking with a few individuals at the show about expanding capacities, I heard that storage customers who need high density are starting to discuss maximum drive size requirements based on their implementation needs. One message coming through is that storage deployments are managing risk through drive size: sure, a large-capacity drive allows for high density, but the failure of a large drive means a lot of data is lost in one go.

If we consider how data is used in the datacentre, there are several tiers based on how often the data is accessed. Long-term storage, known as cold storage, is accessed very infrequently and is populated with mechanical hard drives built for long-term data retention. A large drive failure at this level might lose substantial archival data, or require long rebuild times. More regularly accessed storage, called nearline or warm storage, is accessed frequently but often acts as a localised cache in front of the long-term storage. For this case, imagine Netflix holding a good amount of its back-catalogue for users to access: the loss of a drive here means reaching back to colder storage, and rebuild times come into play. Hot storage, the storage under constant read/write access, is where we’re often dealing with DRAM or large database workloads with many operations per second. This is where a drive failure and rebuild can cause critical issues with server uptime and availability.

Ultimately, drive size and failure rate combine into an element of risk and downtime, and aside from engineering more reliable drives, the other lever for managing that risk is capping drive size. Based on the conversations I’ve had today, 16 TB seems to be the inflection point: no-one wants to lose 16 TB of data in one go, regardless of how often it is accessed or how much failover redundancy the storage array has.

I was told that, sure, drives above 16 TB do exist in the market, but aside from niche applications (where the risk is an acceptable trade-off for higher density), volumes are low. This inflection point, one would imagine, is subject to change as the nature of data and data analytics changes over time. Samsung’s PM983 NF1 drive tops out at 16 TB, and while Intel has shown samples of 8 TB units of its long-ruler E1.L form factor, it has listed future drives using QLC at up to 32 TB. Of course, 16 TB per drive puts no limit on the number of drives per system: we have seen 1U units with 36 of these drives in the past, and Intel has been promoting up to 1 PB in a 1U form factor. It is worth noting that the market for 8 TB SATA SSDs is relatively small; no-one wants to rebuild that large a drive at 500 MB/s, which would take a minimum of 4.44 hours, bringing server uptime down to 99.95% rather than the 99.999% metric (about five and a quarter minutes of downtime per year).
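For those who want to check the back-of-the-envelope numbers, the arithmetic is simple enough to script. This is a minimal sketch of the calculation above; treating the rebuild as a single sequential pass at a fixed 500 MB/s, and counting the rebuild window as the year's only downtime, are my simplifying assumptions:

```python
# Back-of-the-envelope rebuild time and uptime impact for a single drive.
# Assumes the rebuild is one sequential pass at a fixed transfer rate and
# that the rebuild window is the only downtime in the year.

def rebuild_hours(capacity_tb: float, speed_mb_s: float) -> float:
    """Hours needed to read/write the whole drive once at the given speed."""
    total_bytes = capacity_tb * 1e12            # decimal TB, as drive vendors quote it
    return total_bytes / (speed_mb_s * 1e6) / 3600

def yearly_uptime_percent(downtime_hours: float) -> float:
    """Uptime over a 365-day year if the rebuild counts as downtime."""
    return 100.0 * (1.0 - downtime_hours / (365 * 24))

hours = rebuild_hours(8, 500)                   # 8 TB SATA SSD at ~500 MB/s
print(f"Rebuild time: {hours:.2f} hours")       # ~4.44 hours
print(f"Uptime: {yearly_uptime_percent(hours):.3f} %")  # ~99.95 %
```

Flip the numbers around for five nines: at 99.999% you only get about five and a quarter minutes of downtime across the entire year, which is less time than this SATA rebuild needs per terabyte.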

Comments

  • Null666666 - Thursday, March 14, 2019 - link

    First was 20 MB, but they made 40 MB at the time.
  • piroroadkill - Thursday, March 14, 2019 - link

    The problem is rebuild times, plain and simple. If capacity rises but transfer speeds don't increase accordingly, your rebuild times grow and grow. Longer rebuild times equals unacceptable risk.
  • Null666666 - Thursday, March 14, 2019 - link

    M.2, baby!

    PCIe 4.0 is due out soon.
  • lmcd - Friday, March 15, 2019 - link

    Controllers also need more channels + more internal bandwidth.
  • goatfajitas - Wednesday, March 13, 2019 - link

    Data centers, and even the smallest companies' servers, use RAID5 or higher on any volume that is storing data. This means the data set survives even if any single drive fails. As long as larger drives are not more prone to failure than any other older drive, there is no reason not to want larger drives.
  • Death666Angel - Wednesday, March 13, 2019 - link

    I don't remember who posted the article (it wasn't on a site I usually frequent), but the argument was that RAID5 is basically dead in any larger-scale deployment, and even in small server applications. This is because HDD capacities are now so large, and recalculating the data from the parity bits is so slow, that the chance of hitting a bit error (on a 10^12-bit scale or somesuch, I don't remember the exact math) while restoring the array from one failed drive becomes unacceptably high past a certain capacity. Beyond that point a RAID5 array is simply not a good idea, and we are either already there or approaching it very soon (the article is a few years old I think, and it spoke of it happening in the near future). A rough version of that math is sketched after the comments below.
  • blakeatwork - Wednesday, March 13, 2019 - link

    There was an old article from ZDNet in 2007 that proclaimed the death of RAID-5 on drives greater than 1TB (https://www.zdnet.com/article/why-raid-5-stops-wor...). They had a follow-up in 2016 that said they weren't wrong, but that manufacturers upped their URE specs on *some* drives (https://www.zdnet.com/article/why-raid-5-still-wor...).
  • lightningz71 - Wednesday, March 13, 2019 - link

    Which is why RAID 6 exists. Two parity drives reduce the chance of data loss to below RAID-5 levels for all but the most extreme array sizes. It is, of course, less power-, cost-, and space-efficient than RAID-5, but you have to pay for redundancy somewhere.
  • ken.c - Wednesday, March 13, 2019 - link

    That's why really big storage uses some other kind of algorithm for data protection, be that 3x mirroring (see HDFS and other "cloud scale" storage), Reed-Solomon forward error correction (Isilon or Qumulo), or some variant thereof. No one uses RAID5 or 6 alone at any sort of performance scale. Even when they are used, they're striped across many groups, for example in a ZFS pool or a Lustre or GPFS setup.

    I see very little risk in using larger SSDs in that sort of configuration.

    The author's contention that Netflix would have to go back to cold storage if they lose a single SSD is ludicrous.
  • erple2 - Wednesday, March 20, 2019 - link

    First of all Netflix can't lose any single SSD - they don't own any of their storage. They buy all of it from AWS. Netflix is completely run in AWS land these days. As to how AWS manages that, I suspect "poop-tons of redundancy" is the way they do it. Well, except for those S3 buckets in us-east, that is...
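As a quick illustration of the unrecoverable-read-error argument raised in the comments above, here is a minimal sketch of the usual back-of-the-envelope calculation. The 1-in-10^14 URE rate and the 12 TB drive size are assumed example figures (typical consumer-class HDD specs), not values taken from the comments, and the model treats bit errors as independent:

```python
# Chance of hitting at least one unrecoverable read error (URE) while
# reading every surviving drive during a RAID-5 rebuild.
# Model assumption: independent bit errors at the quoted URE rate.

def rebuild_ure_probability(drive_tb: float, surviving_drives: int,
                            ure_per_bit: float = 1e-14) -> float:
    bits_read = surviving_drives * drive_tb * 1e12 * 8   # every surviving bit is read once
    return 1.0 - (1.0 - ure_per_bit) ** bits_read

# Example: 4-drive RAID-5 (3 survivors to read back) built from 12 TB drives
print(f"{rebuild_ure_probability(12, 3):.0%}")           # roughly 94%
print(f"{rebuild_ure_probability(12, 3, 1e-15):.0%}")    # roughly 25% with 10^-15-class drives
```

The gap between those two results is essentially the point of the 2016 ZDNet follow-up linked above: drives rated at one error in 10^15 bits make a single-parity rebuild survivable far more often, and RAID-6's second parity stripe covers much of what remains.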
