Secrets of the professionals: how cloud providers scale their data centers

You have probably seen the epic photos of data centers many times: rows of racks packed with servers stretching to the horizon, powerful generators, industrial air conditioning. Many people like to show off such shots, but I have always wondered how data center operators choose and configure their equipment. Why, for example, do they install server "A" instead of "B"? What do they optimize for - the speed of an individual VM or the potential number of VMs per rack? And how do they plan computing capacity, especially in today's world of hybrid clouds and artificial intelligence?

What comes first: the speed of a single VM or the number of VMs per rack?

The speed of a VM, without any specifics, is an abstract concept that cannot be evaluated until you define the criteria for evaluating it. Typical criteria are the response time of an application running on the VM, or the response time of the disk subsystem that hosts the virtual machine's data. If a customer wants to deploy everyone's favorite SAP on a virtual machine, they need gigahertz on the principle of "the more, the better." For such workloads, clusters with high-frequency, low-core-count processors are allocated, since multithreading and the performance of the memory and disk subsystems matter less for SAP than for a heavily loaded DBMS.

The “right” service provider allocates different hardware platforms and disk resources into separate clusters and pools and places “typical” customer virtual machines in the right pools.

It is impossible to predict in advance how many virtual machines SAP will run on, or where disk resources will be used so intensively that Flash storage must be planned ahead of time. This understanding comes with time, as operational statistics accumulate, which is why providers that have been on the market for a long time are valued more highly than startups. Each provider has its own platform for VM deployment and its own approach to expansion planning and procurement.

I consider a competent approach to sizing one that starts by defining a "cube" for the compute node and for the storage node. A hardware platform with a certain number of CPU cores and a specified amount of memory is selected as the compute cube. Based on the features of the virtualization platform, an estimate is made of how many "standard" VMs can run on one cube. From these cubes, clusters and pools are assembled in which VMs with particular performance requirements can be placed.

A "standard" virtual machine is a VM of a fixed configuration, for example 2 vCPU + 4 GB RAM or 4 vCPU + 32 GB RAM. The calculation also takes into account the reserve of RAM on the hypervisor (for example, 25%) and the ratio of the number of vCPUs to the total number of physical CPU cores in the cube (CPU over-provisioning). Once the reserve boundary is reached, equipment purchase planning begins within the pool.
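The capacity estimate above can be sketched in a few lines. This is a minimal illustration of the idea, not any provider's actual tooling; the cube size, reserve, and over-provisioning ratio are assumed numbers.

```python
# Hypothetical sizing sketch: how many "standard" VMs fit in one compute cube.
# All figures (cube size, RAM reserve, over-provisioning ratio) are
# illustrative assumptions, not data from a real provider.

def vms_per_cube(cube_cores, cube_ram_gb, vm_vcpu, vm_ram_gb,
                 ram_reserve=0.25, cpu_overprovision=4.0):
    """Return how many VMs of one 'standard' flavor fit on a compute node."""
    usable_ram = cube_ram_gb * (1 - ram_reserve)             # hypervisor RAM reserve
    by_ram = int(usable_ram // vm_ram_gb)
    by_cpu = int(cube_cores * cpu_overprovision // vm_vcpu)  # vCPU : pCPU ratio
    return min(by_ram, by_cpu)                               # the tighter limit wins

# Example: a 48-core node with 768 GB RAM, 25% RAM reserve, 4:1
# over-provisioning, hosting 4 vCPU + 32 GB "standard" VMs:
print(vms_per_cube(48, 768, 4, 32))   # 18 (RAM-bound: 576 GB usable / 32 GB)
```

In practice RAM is usually the binding constraint, as here: the CPU limit would allow 48 such VMs, but the usable memory caps the cube at 18.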

The question of storage

As for data storage, the "storage node" cube is selected based on cost, capacity, and IOPS per gigabyte - all standard metrics. The optimal storage node configuration is one in which using the maximum possible capacity does not push processor utilization above 80-85%. In other words, even with the node's disk volumes fully utilized, we can still get the design number of IOPS from it within the response time (latency) prescribed in the SLA.

A number of other parameters are also considered when selecting storage (the access protocol and data transmission network, the interconnect needed to link compute and storage nodes, energy efficiency, and so on), but I single out resource utilization efficiency as the main one.

For the storage node, we use a load generator of some kind; a benchmark from the virtualization platform vendor can also be used. We calculate the storage utilization rate and the IOPS-to-volume ratio, conclude that the configuration suits us, and begin to replicate it. For compute nodes, we use empirical methods, sizing and testing techniques, and utilities from the virtualization platform vendors, plus our own experience.

Network infrastructure

It all depends on the virtualization platform. With VMware, just two or three hardware servers are enough to deploy a full SDN; with OpenStack, it may be necessary to move the SDN functionality onto dedicated hardware servers to achieve the required performance. Hardware switches are chosen based on price per port. There are a number of good solutions that, besides the ports themselves, provide SDN functionality for virtualization platforms (Cisco ACI, Arista, Mellanox).

The number of switches (and thus ports) is determined by the choice of hardware platform. With rack servers, it is optimal to place ToR switches (switches installed at the top of the server rack - ed.) in each rack and uplink them to the network core; if you choose a high-density converged platform (for example, Cisco UCS), there is no need for ToR switches.
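A back-of-the-envelope port budget for the rack-server case might look like this. The counts are illustrative assumptions only; a real design also weighs oversubscription policy, MLAG peer links, and management ports.

```python
# Rough ToR port budget for one rack (illustrative numbers, not a real design).

def tor_ports_needed(servers, nics_per_server, uplinks_per_switch, switches=2):
    """Downlink ports per switch plus its uplinks to the network core,
    assuming server NICs are split evenly across a redundant ToR pair."""
    downlinks = servers * nics_per_server // switches
    return downlinks + uplinks_per_switch

# 20 rack servers with 2 x 25 GbE NICs each, split across two ToR switches,
# plus 4 x 100 GbE uplinks per switch to the core:
print(tor_ports_needed(20, 2, 4))   # 24 ports per switch
```

This is also why price per port matters: the ToR pair is repeated in every rack.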

From all of the above, the answer to the question "What comes first: the speed of a single VM or the number of virtual machines per rack?" is obvious: the hardware part of the cloud should be designed so that one "cube" fits the maximum possible number of virtual machines that can be placed without compromising their performance.

Saving on licenses

Any large service provider that uses proprietary software applies a licensing scheme based on monetary contributions (points) for certain resources, and savings are achieved at volume: the more of these resources are consumed, the better the terms.

The problem of energy consumption

As for energy consumption, in my opinion the problem here lies in a slightly different plane. All commercial data centers rent out a "standard" rack with 5-6 kW of electrical power supplied to it. In some data centers this is not a strict restriction and the customer can consume more, although there are still limits on the excess, for example no more than 3 kW over. In other data centers it is a strict limit, so achieving 100% rack occupancy is unlikely to work. It would seem possible to choose energy-efficient equipment, but high-frequency processors and multi-socket hardware servers consume a lot of power, so you have to compromise between the power draw of a compute node and the number of such "cubes" that can be placed in one rack. In practice, unfortunately, 100% equipment density in a rack at 75-80% of its power capacity is not feasible.
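The compromise above is easy to quantify. A minimal sketch, assuming an illustrative 5 kW feed and a 0.6 kW per-node draw (both made-up numbers):

```python
# Rough rack power budget: how many compute "cubes" fit under the feed limit.
# The rack feed, headroom, and per-node draw are assumed for illustration.

def nodes_per_rack(rack_power_kw, node_power_kw, headroom=0.1):
    """Nodes that fit, keeping some headroom for switches, PDUs, and fans."""
    usable_kw = rack_power_kw * (1 - headroom)
    return int(usable_kw // node_power_kw)

# A 5 kW rack with 10% headroom and 0.6 kW dual-socket nodes:
print(nodes_per_rack(5.0, 0.6))   # 7 nodes
```

Seven 1U-2U nodes leave most of a 42U rack empty, which is exactly the point: power, not physical space, is usually the limiting factor.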

SLA (responsibility to the customer)

The SLA should specify at least the availability of the service (management interface, network/Internet access, availability of disk resources). For the basic components (storage nodes, local network, Internet access channel), these values on average do not exceed 99.9%, which allows no more than about 43 minutes of downtime per month.
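The 43-minute figure follows directly from the availability percentage. A quick sketch of the conversion (a 30-day month is assumed):

```python
# Converting an SLA availability percentage into allowed downtime per month,
# as in the 99.9% -> ~43 minutes figure above. A 30-day month is assumed.

def downtime_minutes_per_month(availability_pct, days=30):
    minutes_in_month = days * 24 * 60                     # 43,200 minutes
    return minutes_in_month * (1 - availability_pct / 100)

print(round(downtime_minutes_per_month(99.9), 1))    # 43.2 minutes
print(round(downtime_minutes_per_month(99.99), 2))   # 4.32 minutes
```

Each extra "nine" cuts the allowance tenfold, which is why stricter SLAs quickly push toward dedicated infrastructure.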

Of course, stricter SLA agreements can be concluded: the detailed characteristics of an individual service component can be specified separately, for example disk subsystem performance and response time, Internet bandwidth, channel redundancy, and so on. Such extended conditions usually require dedicated infrastructure, but these are special cases.

When changing the platform (for example, when choosing new processors), the necessary performance measurements are made for the new CPU frequency and core count, after which a specification for the new compute node is drawn up, and new pools are assembled from the new "cubes".

The basic principle of sizing is that all servers within a resource pool have the same capacity and, accordingly, the same cost. Virtual machines do not migrate outside their cluster, at least not in normal automatic operation.

Data center prospects in the next 3-5 years?

Will AMD and ARM push Intel from its positions?

The x86_64 processor architecture will remain unchanged over the next 3-5 years and will dominate the cloud computing market. ARM in its modern form is definitely not about clouds; rather, its niche is IoT, embedded solutions, and the like. That said, VMware, for example, supports the ARM architecture for its hypervisor - a niche solution that can be used to deploy containers on VMware products. In my opinion, Intel will continue to hold the lion's share of the processor market. From my point of view, AMD is unlikely to be able to displace Intel, and it certainly will not happen on the strength of the security features the EPYC platform provides. Personally, I have always liked AMD (my home machine still runs on a 4-core Phenom :)), but most likely, as in the mid-2000s, the "reds" simply do not have enough resources to take on Intel.

Will there be an increase in network infrastructure speeds?

Speeds of 10/40 Gbps are the current reality; within 3-5 years internal cloud networks will move to 25/100 Gbps, and it is difficult to imagine a need for higher speeds there. The exception is the core of a large network, and HCI platforms already recommend 100 Gbps interconnects between nodes to achieve maximum I/O performance. Cloud network services - microsegmentation, CDN, and so on - will continue to develop actively. In my opinion, the amount of proprietary network hardware and proprietary network operating systems should decrease. For greater flexibility, open platforms based on Open Ethernet standards will be used, although here again the question of network security arises…

How will storage develop?

In the field of storage systems, everything is moving toward a final transition to Flash. Within 5-10 years the basic protocol for presenting block devices to the operating system will be NVMe, and accordingly a transition to NVMe-oF transports will begin - NVMe over FC and, more promising in my opinion, NVMe over Ethernet. From the point of view of moving away from expensive "traditional" vendor storage inside clouds, I see a promising transition to distributed file systems (for example, analogs of OneFS). As for the architecture and sales model of "traditional" storage, I like the solutions from Infinidat and NetApp, but that is a matter of taste.

Trends in Cloud

If we take the general trend, the future belongs to those providers who control not only the hardware component of the cloud but also its software platform (hypervisors, SDN, data storage). This is the model of the hyperscalers (Amazon, Google, Alibaba). The general cloud trend is "VM as code", "network as code", "storage as code". Public clouds built on proprietary solutions are quite expensive, given the royalties owed to software vendors, but practice shows they are in demand (VMware vSphere as a service is successfully sold by Amazon). Almost all Russian service providers use VMware products as their virtualization platform and for their self-service offerings.

Security trends

It seems to me that GDPR requirements will be implemented within dedicated cloud segments. Within these segments, there is a high probability that, for example, hardware encryption of memory and processor caches will be implemented. An attempt to extend such technologies to the entire public cloud could lead to unjustified complication of its operation, loss of flexibility in self-services, and the introduction of a number of restrictions, which in turn would increase the cost of the provider's services. If hardware memory-encryption technologies quickly reach maturity (with support from both hardware manufacturers and the major operating systems), I see no problem integrating them into the public cloud, but again, everything will come down to the cost of such integration. At the moment, 90% of cloud users simply do not need such technologies.