How SSD drives larger than 10 TB will change enterprise-class data storage

Hard disks of 8 and 10 TB are already widely used in enterprise systems where storage density matters. So far, manufacturing technology limits such disks to a spindle speed of 7,200 rpm, so a performance boost from faster platter rotation is clearly not on the horizon for these drives. Nor is there any reason to believe that drives of such high density will be more reliable than their 2 TB counterparts. It would seem that in storage, more is always better, but enterprise hard drives appear to have reached the point where sheer capacity is no longer an advantage.

In early February, Intel promised to present an SSD of more than 10 TB within two years. And in early March, Samsung announced the start of shipments of its first 15.36 TB SSD, the PM1633a. The last advantage of hard drives, their large capacity, is melting before our eyes like snow under the bright spring sun.

While there are no real-world deployments of large SSD arrays yet, it is worth considering why we should not choose 8-10 TB hard drives today, but instead target SSDs of the same size right away.

The main question - the cost of 1 GB

The only major advantage of large hard drives in business is the cost of storing 1 GB of data. But even this advantage comes to naught with the arrival of large SSDs. How is that possible, you may ask, when an 8 TB hard drive costs about $500 while a 16 TB SSD is expected to cost about $8,000? How can they possibly compete on price per gigabyte? But, as they say, the devil is in the details.

If you compare SSD and HDD head-to-head on price per gigabyte, the SSD loses. But if you look at a complete project built on SSDs or HDDs, the SSD may well come out ahead thanks to savings on caching devices. Let me explain.
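The arithmetic can be sketched out with the article's own figures ($500 for an 8 TB HDD, $8,000 for a 15.36 TB SSD). The caching-tier size and its price per gigabyte below are purely hypothetical assumptions, chosen only to illustrate how a mandatory flash cache shifts the total cost of an HDD-based build; substitute your own quotes.

```python
# Price-per-gigabyte comparison using the article's figures:
# an 8 TB HDD at ~$500 and a 15.36 TB SSD at ~$8,000.

hdd_price, hdd_tb = 500, 8
ssd_price, ssd_tb = 8000, 15.36

hdd_per_gb = hdd_price / (hdd_tb * 1000)
ssd_per_gb = ssd_price / (ssd_tb * 1000)

print(f"Raw HDD: ${hdd_per_gb:.4f}/GB, raw SSD: ${ssd_per_gb:.4f}/GB")

# Hypothetical 1 PB project.  The HDD build also needs a fast flash
# caching tier (ASSUMED here: 25% of capacity at $3.00/GB, typical of
# enterprise flash arrays of the period); the SSD-only build does not.
data_gb = 1_000_000
cache_fraction = 0.25         # assumption
cache_per_gb = 3.0            # assumption, $/GB for the caching tier

hdd_build = data_gb * hdd_per_gb + data_gb * cache_fraction * cache_per_gb
ssd_build = data_gb * ssd_per_gb

print(f"1 PB HDD + cache build: ${hdd_build:,.0f}")
print(f"1 PB SSD-only build:    ${ssd_build:,.0f}")
```

Per raw gigabyte the HDD wins by almost an order of magnitude, but once the cost of the caching tier is folded in, the SSD-only build can come out cheaper.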

Suppose we are dealing with a fashionable Big Data project that collects and processes petabytes of information. What does the hardware look like? Roughly this: two cabinets of disk shelves packed with 8 TB hard drives, where the data is stored, plus 3-4 shelves of fast flash drives, to which data is moved for processing, with the compute nodes attached to those flash shelves. This is not quite caching in the usual sense: the application itself, or the operating system, decides what to store on the HDDs and what on the SSDs. It is somewhat similar to the SSD cache now found even in entry-level NAS devices, only a little more complicated.

So it turns out you cannot do without SSDs even if you store your data on large hard drives, unless, of course, all you keep is an archive of online backups. The data still has to be divided into “hot” and “cold”.

Cold Data

By choosing large SSD drives, you simplify the infrastructure: you no longer need caching shelves, because SSDs of this size, in terms of performance expressed in IOPS (I/O operations per second), are roughly 1,000 times faster than server HDDs spinning at 15,000 rpm. For comparison: an HDD with a 15,000 rpm spindle can deliver around 200-300 IOPS, depending on the load, while the Samsung PM1633a delivers 200,000 IOPS for reads and 32,000 IOPS for writes. Its sequential read and write speed is about 1,100 MB/s, five times that of a 15K HDD. So you no longer need to move data from one medium to another; you can connect compute nodes directly to the SSD shelf. There are already 2U servers that accept 48 2.5-inch drives (Supermicro 2028R-E1CR48L). Filled with SSDs such as the Samsung PM1633a, one such server offers 737.28 TB of disk space, and a 42U cabinet loaded with these servers gives you about 15.4 PB.
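The capacity and speedup figures above are easy to verify with back-of-the-envelope arithmetic; the drive and server parameters are the ones quoted in the text.

```python
# Back-of-the-envelope figures for an SSD-only rack.
# Drive: Samsung PM1633a (15.36 TB, 200k read IOPS, per the article).
# Server: a 2U chassis with 48 2.5-inch bays (Supermicro 2028R-E1CR48L).

SSD_CAPACITY_TB = 15.36
BAYS_PER_2U_SERVER = 48
RACK_UNITS = 42

server_capacity_tb = SSD_CAPACITY_TB * BAYS_PER_2U_SERVER
servers_per_rack = RACK_UNITS // 2          # 2U each
rack_capacity_pb = server_capacity_tb * servers_per_rack / 1000

hdd_iops = 250                              # midpoint of 200-300 IOPS (15K HDD)
ssd_read_iops = 200_000                     # PM1633a reads
speedup = ssd_read_iops / hdd_iops

print(f"One 2U server: {server_capacity_tb:.2f} TB")
print(f"42U rack:      {rack_capacity_pb:.2f} PB")
print(f"Read IOPS advantage over a 15K HDD: ~{speedup:.0f}x")
```

With every rack unit filled this comes to about 15.5 PB, matching the article's ~15.4 PB figure once a little space is left for networking.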

As a result, instead of two cabinets of hard drives plus 3-4 caching shelves, you get one cabinet of disk shelves or servers with SSDs. In that case, even for the same amount of stored data, SSDs win on price per gigabyte. And that is before counting the cost of rewriting the application to juggle “hot” data (on the caching devices) and “cold” data (on the slow HDDs).

RAID arrays can be used again

With large HDD drives, RAID arrays are contraindicated, because, as the saying goes, trouble never comes alone: if one hard drive in a RAID array has failed, the second is surely on its way. Rebuilding RAID 5 on five 1 TB disks takes roughly 6-7 hours, and the bigger the disks, the longer the rebuild. With today's capacities the operation can take days, and needless to say, a second drive failing during the rebuild takes the RAID 5 array, along with all its data, into the abyss. That is why, when it comes to big data on large disks, designers prefer simple replication or distribution of data across nodes instead of RAID, but again at the application level. These data-distribution technologies for physical nodes and disks work on the same principles as RAID, resembling something between RAID 1 and RAID 5, but as a rule their space efficiency is lower.
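The rebuild-time scaling can be sketched from the article's own data point. A rebuild has to rewrite the entire replacement disk at a throttled rate, so the time grows roughly linearly with capacity; this is a rough model, not a vendor figure.

```python
# Rough rebuild-time scaling, anchored on the article's data point:
# RAID 5 over five 1 TB disks rebuilds in ~6-7 hours.  The whole
# replacement disk must be rewritten, so time scales ~linearly
# with its capacity (real rebuilds under load can be even slower).

BASELINE_TB = 1.0
BASELINE_HOURS = 6.5          # midpoint of the article's 6-7 hours

def rebuild_hours(disk_tb: float) -> float:
    """Estimated RAID 5 rebuild time, linear in disk capacity."""
    return BASELINE_HOURS * disk_tb / BASELINE_TB

for tb in (1, 4, 8, 10):
    print(f"{tb:>2} TB disk: ~{rebuild_hours(tb):.0f} h "
          f"(~{rebuild_hours(tb) / 24:.1f} days)")
```

An 8 TB disk already implies a rebuild window of more than two days, during which the degraded array has no redundancy left.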

Large SSD drives will not have this array-recovery problem. Their read and write speeds are still limited by the comparatively “slow” SAS 12 Gb/s interface, which yields roughly 1.2 gigabytes per second of usable bandwidth. How modern RAID controllers will behave at such speeds remains to be seen in practice: will their built-in processors be powerful enough to exploit SSDs in RAID 5 arrays? But it is already clear that there will be no HDD-style slowdown, so there is no need to cobble together software data distribution; you can use the time-tested, reliable RAID 5, or RAID 6 for extra confidence. In both cases, space efficiency will be higher than when trying to programmatically spread data across different nodes.
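The space-efficiency claim is simple to quantify: parity RAID sacrifices one or two disks' worth of capacity per group, while node-level replication stores every block two or three times.

```python
# Usable fraction of raw capacity: parity RAID vs simple replication.

def raid5_efficiency(n: int) -> float:
    """RAID 5: one disk's worth of capacity goes to parity."""
    return (n - 1) / n

def raid6_efficiency(n: int) -> float:
    """RAID 6: two disks' worth of capacity goes to parity."""
    return (n - 2) / n

def replication_efficiency(copies: int) -> float:
    """Node-level replication: every block is stored `copies` times."""
    return 1 / copies

n = 10
print(f"RAID 5, {n} disks:   {raid5_efficiency(n):.0%} usable")
print(f"RAID 6, {n} disks:   {raid6_efficiency(n):.0%} usable")
print(f"2-way replication: {replication_efficiency(2):.0%} usable")
print(f"3-way replication: {replication_efficiency(3):.0%} usable")
```

On a ten-disk group, RAID 5 leaves 90% and RAID 6 leaves 80% of raw capacity usable, against 50% or 33% for the replication schemes that large-HDD deployments are pushed toward.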

A new era? It sure is!

I would compare the appearance of SSD drives of more than 10 TB to the release of the iPhone or iPad. Just as those gadgets changed our view of mobility, large SSDs will change our view of data storage and processing. First of all, all data becomes “hot”. You can store fantastic volumes in a single fast device connected directly to the server host. Even inside the server itself! That means a terabyte-scale database where full-text search simply flies; a facial-recognition system searching for a person across months of footage from hundreds of surveillance cameras at once; plus the opportunity to teach Big Data in schools and experiment, as they say, on live hardware.

Do 15K RPM HDDs stand a chance?

Definitely not. The era of high-speed hard drives is coming to an end. For some time, manufacturers will keep shipping them as spare parts for installed storage systems and servers, while reacting with horror to every drop in SSD prices. 3D NAND technology, which made such huge SSDs possible, will keep developing and getting cheaper. The need for 10K and 15K rpm drives will shrink by the day, so there is no point waiting for them to progress.

The only niches left for hard drives are entry-level video surveillance systems and NAS boxes for small businesses. These do not require high speeds, being limited by the Ethernet interface anyway; they need lots of capacity for little money. So 7,200 rpm SATA drives will not disappear in the foreseeable future.