One constant in the storage business is that capacity per drive keeps increasing. Spinning hard-disk drives will soon approach 20 TB, while solid-state drives range from 4 TB to 16 TB, or even more if you're willing to entertain an exotic implementation. Today at the Data Centre World conference in London, I was quite surprised to hear that, because of how customers manage risk, we're unlikely to see much demand for drives over 16 TB.

Speaking with a few individuals at the show about expanding capacities, I learned that storage customers who need high density are starting to discuss maximum drive size requirements based on their implementation needs. One message coming through is that storage deployments are managing risk through drive size: a large-capacity drive allows for high density, but when a large drive fails, a lot of data is lost in one go.

If we consider how data is used in the datacentre, there are several tiers based on how often the data is accessed. Long-term storage, known as cold storage, is accessed very infrequently and is populated with mechanical hard drives chosen for long-term data retention. A drive failure at this level might lose substantial archival data, or require long rebuild times. More regularly accessed storage, known as nearline or warm storage, is read frequently but often serves as a localised cache in front of the long-term tier. For this case, imagine Netflix keeping a good portion of its back catalogue close to its users: losing a drive here means reaching back to colder storage, and rebuild times come into play. Hot storage, which sees constant read/write access, is where we are often dealing with DRAM or large database workloads with many operations per second. This is where a drive failure and rebuild can cause critical issues with server uptime and availability.

Ultimately, drive size and failure rate together determine the risk of data loss and downtime, and aside from engineering more reliable drives, the main remaining lever for managing that risk is capping drive size. Based on the conversations I have had today, 16 TB seems to be the inflection point: no-one wants to lose 16 TB of data in one go, regardless of how often it is accessed or how much failover redundancy the storage array provides.
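To make that argument concrete, here is a rough back-of-the-envelope sketch (mine, not the article's) of why a bigger drive is a bigger liability: the longer the rebuild window, the greater the chance that a second drive in the same group fails before redundancy is restored. The annualized failure rate, rebuild rate, and group size below are assumed, illustrative values.

```python
# Rough sketch: larger drive -> longer rebuild window -> more exposure to a
# second failure before redundancy is restored. AFR, rebuild rate, and group
# size are assumptions for illustration, not figures from the article.

def rebuild_hours(capacity_tb: float, rebuild_mb_s: float) -> float:
    """Hours needed to re-fill one drive at a sustained rebuild rate."""
    return capacity_tb * 1e6 / rebuild_mb_s / 3600.0

def p_second_failure(capacity_tb: float, rebuild_mb_s: float,
                     afr: float, other_drives: int) -> float:
    """Chance that at least one surviving drive fails during the rebuild
    window, assuming independent failures at a constant annual rate."""
    window_h = rebuild_hours(capacity_tb, rebuild_mb_s)
    p_one = afr * window_h / (365.0 * 24.0)   # per-drive chance within the window
    return 1.0 - (1.0 - p_one) ** other_drives

if __name__ == "__main__":
    for cap_tb in (4, 8, 16, 32):
        window = rebuild_hours(cap_tb, 500)
        p = p_second_failure(cap_tb, 500, afr=0.015, other_drives=23)
        print(f"{cap_tb:2d} TB: rebuild window {window:5.1f} h, "
              f"P(second failure during rebuild) ~ {p:.4%}")
```

The absolute probabilities are not the point; the scaling is. Doubling the drive size doubles the rebuild window and, to a first approximation, doubles the exposure to a second failure.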

I was told that drives above 16 TB certainly exist in the market, but aside from niche applications (where risk is an acceptable trade-off for higher density), volumes are low. One would imagine this inflection point is subject to change as the nature of data and data analytics changes over time. Samsung's PM983 NF1 drive tops out at 16 TB, and while Intel has shown 8 TB samples of its long-ruler E1.L form factor, it has listed future QLC drives up to 32 TB. Of course, 16 TB per drive puts no limit on the number of drives per system: we have seen 1U systems with 36 of these drives in the past, and Intel has been promoting up to 1 PB in a 1U form factor. It is worth noting that the market for 8 TB SATA SSDs is relatively small; no-one wants to rebuild that large a drive at 500 MB/s, which would take a minimum of 4.44 hours, dragging server uptime down to 99.95% rather than the 99.999% target (roughly five minutes of downtime per year).
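For reference, the arithmetic behind those last figures can be sketched in a few lines; this simply reproduces the article's own numbers (8 TB, ~500 MB/s, a five-nines budget) rather than anything measured.

```python
# Reproducing the rebuild arithmetic above: an 8 TB drive streamed at a
# sustained ~500 MB/s, compared against common uptime targets.

HOURS_PER_YEAR = 365.25 * 24  # ~8766 hours

def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Time to stream one full drive's worth of data at a sustained rate."""
    return capacity_tb * 1e6 / rate_mb_s / 3600.0

def uptime_if_down_for(hours: float) -> float:
    """Uptime percentage if the rebuild window counted as downtime."""
    return 100.0 * (1.0 - hours / HOURS_PER_YEAR)

if __name__ == "__main__":
    h = rebuild_hours(8, 500)                                   # ~4.44 hours
    print(f"8 TB @ 500 MB/s rebuild: {h:.2f} h")
    print(f"Uptime if that were downtime: {uptime_if_down_for(h):.3f}%")  # ~99.95%
    five_nines_minutes = HOURS_PER_YEAR * (1 - 0.99999) * 60
    print(f"Five-nines budget: {five_nines_minutes:.1f} minutes per year")  # ~5.3 min
```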

Comments

  • Hgp123 - Thursday, March 14, 2019 - link

    Can someone explain why a failed drive would cause downtime? I understand that a failed drive needs to be rebuilt but doesn't a hotswap system prevent downtime? I don't understand why a system would ever need to go down to replace a drive when I've got a dinky HP server that allows me to swap out drives and rebuild while the OS is running.
  • PeachNCream - Thursday, March 14, 2019 - link

    It doesn't cause downtime. An array can be rebuilt while remaining in production. Of course, there will be a performance impact as the rebuild is happening. Part of the point of using a fault-tolerant storage array is to (buckle in for this because it's going to be absolutely shocking) continue operations in spite of a fault.
  • SzymonM - Thursday, March 14, 2019 - link

    Downtime for rebuild? C'mon you have RAID controllers with customizable rebuild priority. I'd love 16TB or 32TB SSD drives for my Gluster nodes, because larger drives == less nodes == lower cost of DC presence (rack space, cooling, power, cost of the rest of the server). BTW Gluster also has customizable resilvering policy for replicated volumes. The only problem is 15TB Samsung drives are pricey as hell.
  • abufrejoval - Thursday, March 14, 2019 - link

    Actually I don't mind more Gluster nodes as long as the fabric can manage the additional bandwidth.

    And with redundancy managed via Gluster, I am considering lowering the redundancy within the boxes, at least for SSD: never liked the write amplification of hardware RAID controllers with their small buffers and HDD-legacy brains.

    Still ZFS below Gluster for the "new tape" on HDDs.
  • kawmic - Thursday, March 14, 2019 - link

    4tb @ $100 sounds reasonable. I would pay that.
  • plsbugmenot - Thursday, March 14, 2019 - link

    I want one. Apparently that makes me a "no one". Thanks.
  • urbanman2004 - Thursday, March 14, 2019 - link

    I'm pretty satisfied w/ both my 8TB and 10TB high capacity drives, but does it scare anyone that helium is being used to fill these higher capacity drives?
  • abufrejoval - Thursday, March 14, 2019 - link

    Yes. As I understand it you cannot stop helium from leaking, only design for an expected lifetime. I tend to have higher expectations...
  • Null666666 - Thursday, March 14, 2019 - link

    Ah...

    I would love >128g... I work in rather large data, have for years.

    Manage risk by replication.

    Never had much trust in "...never need more than".
  • peevee - Thursday, March 14, 2019 - link

    PCIe5 will allow for 4x speed. Controllers will probably catch up eventually. Hence 4x capacity at the same rebuild time.

    Now... who the hell needs PBs of capacity in 1U but Google and Facebook? Lots and lots of unused data nobody is going to access - what kind of connection to that rack are you going to have?
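
Following on from peevee's point about interface speed, here is a minimal sketch of how rebuild time scales with capacity and sustained throughput. The per-interface throughput figures and capacities below are assumptions for illustration only; the takeaway is simply that roughly 4x the sustained rate keeps the rebuild window constant for a roughly 4x larger drive.

```python
# Rebuild time vs. drive capacity and sustained interface throughput.
# The GB/s figures per interface are rough assumptions, not measurements.

def rebuild_hours(capacity_tb: float, rate_gb_s: float) -> float:
    """Hours to rewrite a full drive at a sustained sequential rate."""
    return capacity_tb * 1000.0 / rate_gb_s / 3600.0

if __name__ == "__main__":
    # (interface label, assumed sustained GB/s, drive capacity in TB)
    scenarios = [
        ("SATA (~0.5 GB/s)",        0.5,  8),
        ("PCIe 3.0 x4 (~3 GB/s)",   3.0, 16),
        ("PCIe 5.0 x4 (~12 GB/s)", 12.0, 64),
    ]
    for label, rate, cap in scenarios:
        print(f"{cap:2d} TB over {label}: rebuild ~ {rebuild_hours(cap, rate):.1f} h")
```

In practice, controller limits, degraded-mode workload, and rebuild priority settings (as SzymonM notes above) matter at least as much as the raw link rate.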
