A new look at server immersion cooling: news from the OCP consortium

It is still hard to get used to the spectacle: we take a powerful server, immerse it in a tank of liquid, apply voltage, and hope that no sparks fly. But perhaps we should be afraid of something else: continuing to build data centers with service lives of more than 10 years whose cooling depends entirely on airflow. How sustainable is an industry that spends more on electricity for spinning fans than it does on powering processors?

“Immersion cooling is when the server is completely immersed in a dielectric liquid,” explained 3M engineer Jimil Shah at the Facebook Open Compute Project Virtual Summit 2020 in May. “With immersion cooling, we can significantly increase computational density compared to air cooling systems.”

Over the past few years, vendors have repeatedly pointed out that immersion cooling can increase computing density, improve energy efficiency, and extend the life of existing hardware (your investment could keep working several years longer if you put servers into a kind of stable stasis). But the service-life argument is beside the point: we do not retrofit existing servers with immersion cooling. What we do know is that once heat output goes beyond acceptable limits, processors can no longer be designed the way they are today. In other words, when we need more computing power, we have to think about the ultimate computational density.

“We have come to the conclusion that servers need to be reinvented,” said Mark Shaw, General Manager of advanced hardware at Microsoft. “Racks, power, servers, internal components — everything needs to be changed.”

Cold bath

Liquid cooling ran like a common thread through the OCP Summit, which lasted a day and a half. Three years after we sketched the frightening picture of a server being submerged in a tank of mineral oil, immersion cooling has turned from a carnival sideshow into real science.

At the conference we heard several pieces of good news that give us a more confident view of the future of immersion cooling. Here are the three news items:

• an immersion chassis has been developed. Until now, existing servers designed for air cooling were simply loaded into tanks. But air and liquid are different environments. The special standardized chassis for immersion servers being developed within OCP will be about as long as a refrigerator door and about as thick as a chocolate bar, and will weigh no more than 34 kg when fully equipped;

• new dielectric fluids have been created and are ready for production. These are not mineral oils but engineered hydrocarbon and fluorochemical compounds with a lower boiling point than water. The gas bubbles they give off are essentially tiny vessels that carry heat away. Such liquids make it possible to build tanks whose contents boil continuously, effectively cooling the servers immersed in them. At the same time, these new liquids evaporate and condense so quickly that a server pulled from the tank is not only cool but also dry;

• maintenance methods have been developed, and a stable environment is easier to maintain. Airflow is an inconstant medium. A liquid tank, by contrast, provides a stable environment for servers, in which sensors measure physical characteristics more accurately and those characteristics are in turn much easier to stabilize. A robotic arm created by Asperitas, mounted on a service cart, can lift a specific chassis out of the tank and move it to a kind of workbench for maintenance.

New racks

Today, designers of immersion cooling systems are developing an industry standard based on Open Rack v3 for long, wide server chassis that will be 22 mm thick. To power servers in such chassis, they are considering 54 V power buses (like the one shown in the photo below) instead of the power shelves that OCP has advocated so far. To date, however, these designs exist only on paper.

“Given the volume of the Open Rack v3, which is approximately 1.2 cubic meters, we can easily find enough liquid to cool equipment with a capacity of hundreds of kilowatts,” said Shaw. “The question is whether we can put as many servers in the rack as such a liquid system can cool.”

Amsterdam-based Asperitas, for its part, has proposed immersion cooling chassis that are 15, 19, or 21 inches wide and come in 1 or 2 OU (“open unit”) sizes. The chassis slides vertically and is held in place in the enclosure. Some engineers use the word “cassette” for such a chassis.

“The special feature of this chassis is that it is optimized for liquid flow,” explained Rolf Brink, founder of Asperitas. “Most server chassis are designed for airflow, and their holes and gaps do not let the necessary amount of liquid pass through. In addition, the chassis is designed to be easy to maintain. Finally, it supports a wide variety of hardware and allows almost any configuration, powered through a C13 connector or a power bus.”

“The placement of components in an immersion-cooled server depends directly on the heat flow,” said Michael Helezen, who works in research and engineering at the Strasbourg-based server manufacturer 2CRSI. Helezen and Brink both proposed dividing the long, flat Open Rack chassis into three temperature zones.

One design for a single-phase immersion cooling system was presented to OCP members by engineers from Asperitas, Intel, 3M, 2CRSI, and Flextronics. It defines three temperature zones: T0, T1, and T2. The hottest components (graphics accelerators and power supplies) go in zone T0. CPUs, which run at slightly lower temperatures than graphics accelerators, can sit in the middle zone, T1. Finally, components that are comfortable at ambient temperatures of up to 18 °C can go in zone T2.
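The zone scheme can be sketched as a simple lookup from component type to zone. This is only an illustration of the T0/T1/T2 rule described above; the `plan_zones` function, the component-type labels, and the choice to send any unlisted type to T2 are assumptions made for the example, not part of the OCP proposal.

```python
from collections import defaultdict

# Zone rule as described in the article: GPUs and power supplies in T0,
# CPUs in the middle zone T1. Everything else defaulting to T2 is an
# assumption made for this sketch.
ZONE_BY_KIND = {"gpu": "T0", "psu": "T0", "cpu": "T1"}
DEFAULT_ZONE = "T2"


def plan_zones(components):
    """Group (name, kind) pairs by temperature zone.

    `components` is an iterable of (name, kind) tuples, where `kind` is a
    hypothetical label such as "gpu", "psu", "cpu", or "storage".
    """
    layout = defaultdict(list)
    for name, kind in components:
        layout[ZONE_BY_KIND.get(kind, DEFAULT_ZONE)].append(name)
    return dict(layout)


if __name__ == "__main__":
    inventory = [
        ("gpu0", "gpu"),
        ("cpu0", "cpu"),
        ("ssd0", "storage"),
        ("psu0", "psu"),
    ]
    for zone, names in sorted(plan_zones(inventory).items()):
        print(zone, names)
```

A real layout tool would presumably work from measured heat output and temperature limits rather than from component type alone.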

According to Brink, a small part of the chassis (or cassette) that does not contain computing components may sit above the coolant. “Components with minimal requirements for ambient temperature, that is, with very high temperature tolerance, can be placed here,” he said.

New liquids

Of course, the most exciting part of the OCP conference was not the talks at all but watching liquid boil as it cooled a computer system. Unfortunately, since this was a virtual conference, we could not see the gas bubbles clearly.

Now, forget for a second everything you know about cooling, and consider that the amount of heat to be removed keeps growing. Since heat degrades server operation, the whole system must be designed so that excess heat can leave it easily. Sometimes this happens naturally, as in a single-phase cooling system, where cooler, denser layers of liquid take the place of heated layers that rise, creating natural circulation. In other cases, as in two-phase systems, the liquid boils. The new coolants developed by engineers have a lower boiling point than water. As the liquid boils, it turns into gas bubbles that carry heat away from the server. The gas is routed through a tube to a cool surface, where it condenses back into a liquid, which then returns to the tank. Engineers call this process “re-wetting.” “Re-wetting ensures that liquid is in place to replace the liquid that has just changed state and turned into gas bubbles,” Brink explained.
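A rough way to compare the two approaches is through the standard heat-balance relations: a single-phase liquid absorbs heat by warming up (heat load = mass flow × specific heat × temperature rise), while a two-phase liquid absorbs it by boiling (heat load = mass flow × latent heat of vaporization). The sketch below applies both. The function names are hypothetical, and the fluid properties in the example are round illustrative numbers, not figures from the conference or from any datasheet.

```python
def single_phase_mass_flow(heat_load_w, specific_heat_j_per_kg_k, delta_t_k):
    """Liquid mass flow (kg/s) needed to absorb heat_load_w as sensible heat.

    Uses Q = m_dot * c_p * dT, solved for m_dot.
    """
    if specific_heat_j_per_kg_k <= 0 or delta_t_k <= 0:
        raise ValueError("specific heat and temperature rise must be positive")
    return heat_load_w / (specific_heat_j_per_kg_k * delta_t_k)


def two_phase_vapor_rate(heat_load_w, latent_heat_j_per_kg):
    """Mass of liquid (kg/s) that must boil off to absorb heat_load_w.

    Uses Q = m_dot * h_fg, solved for m_dot. In steady state the condenser
    has to return the same mass flow to the tank.
    """
    if latent_heat_j_per_kg <= 0:
        raise ValueError("latent heat must be positive")
    return heat_load_w / latent_heat_j_per_kg


if __name__ == "__main__":
    heat_load_w = 1000.0      # one 1 kW server (illustrative)
    specific_heat = 2000.0    # J/(kg*K), assumed round value
    delta_t = 10.0            # K of allowed liquid temperature rise, assumed
    latent_heat = 100_000.0   # J/kg, assumed round value

    print("single-phase flow, kg/s:",
          single_phase_mass_flow(heat_load_w, specific_heat, delta_t))
    print("two-phase boil-off, kg/s:",
          two_phase_vapor_rate(heat_load_w, latent_heat))
```

With these assumed numbers, the single-phase tank has to move several times more liquid past the server than the two-phase tank has to boil off. Real ratios depend entirely on the actual fluid and the allowed temperature rise.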

Currently, there are two types of immersion coolants on the market:

• hydrocarbon: usually petroleum products that change their aggregate state when heated;

• fluorochemical: synthetic fluorinated liquids that belong to the same class of compounds used to make oil- and moisture-repellent coatings, such as Scotchgard, and fire-fighting agents, such as Novec (both examples from the 3M product line).

During the virtual conference, Asperitas presented a synthetic hydrocarbon liquid to be produced under an agreement with Shell (yes, you read that correctly: the oil and gas company Shell was featured at an IT event). The liquid, called S5X, will be produced exclusively for immersion cooling systems.

“The liquid is absolutely safe to work with and has no impurities,” Brink said. “Its properties meet all the product-purity requirements of pharmaceutical organizations in the European Union and the United States. In addition, the liquid has very low volatility, so it is extremely stable. It contains no halogens or allergens and is suitable for use in the food industry. Similar liquids are used as base oil in medicinal products.”

At the same time, Shah of 3M warned that hydrocarbon liquids have a side effect: they wet the components immersed in them, which can change those components' physical properties. As many engineers have pointed out, the common mineral oils used in early studies of immersion cooling destroyed connectors and board mounts and broke down the plasticizers and wire coatings that give cables their flexibility.

“Electronic parts are made from a wide variety of hydrocarbon polymer materials,” Shah explained. “Electronics and auxiliary equipment can contaminate the oil with various particles, which will lead to changes in its properties and efficiency.” Particles in the liquid eventually clog filters and impede circulation, he added. This does not mean that manufactured oils are unsuitable or unsafe for immersion cooling systems; they simply require regular maintenance and filtration.

Mineral oils, Shah said, should be avoided in favor of synthetic fluids, which can minimize most of the common faults associated with immersion cooling.

Perhaps the most surprising thing is that these new fluids do not behave like ordinary oils and lubricants. Within a few seconds of a cassette being lifted from the tank, the tiny drops you would expect to see on the components (and to end up on your hands and clothes) evaporate and condense back into the tank, leaving the chassis clean and almost dry.

New component placement

Removing the fan from the processor and dipping the bare chip into the coolant turned out to be a bad idea. For an immersion cooling system to work effectively, you must first increase the surface area of the processor in contact with the liquid.

In a single-phase system, this can be done by attaching a copper heat sink to the chip; its fins direct the flow of liquid in much the same way as the rudder on a ship. Ideally, the fins should be aligned with the flow. Helezen of 2CRSI suggested that, for the most efficient heat dissipation, the heat sink fins should be sized according to the viscosity and thermal properties of the coolant.
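Why the bare chip falls short can be illustrated with Newton's law of cooling, Q = h × A × ΔT, which ties the heat removed to the wetted surface area. The sketch below solves it for the area a heat sink would need to expose. It is a deliberate simplification that ignores fin efficiency and flow effects, the function names are hypothetical, and every number in the example (heat load, heat transfer coefficient, temperature difference, die size) is an assumption chosen for round arithmetic rather than a measured value.

```python
def required_wetted_area_m2(heat_w, h_w_per_m2_k, delta_t_k):
    """Wetted area (m^2) needed to shed heat_w, from Q = h * A * dT."""
    if h_w_per_m2_k <= 0 or delta_t_k <= 0:
        raise ValueError("h and temperature difference must be positive")
    return heat_w / (h_w_per_m2_k * delta_t_k)


def area_multiplier(required_area_m2, bare_area_m2):
    """How many times larger the wetted area must be than the bare chip."""
    if bare_area_m2 <= 0:
        raise ValueError("bare area must be positive")
    return required_area_m2 / bare_area_m2


if __name__ == "__main__":
    heat_w = 200.0        # processor heat output, assumed
    h = 500.0             # W/(m^2*K), assumed convection coefficient
    delta_t = 25.0        # K between surface and liquid, assumed
    bare_area = 0.0016    # m^2, a 4 cm x 4 cm package lid, assumed

    needed = required_wetted_area_m2(heat_w, h, delta_t)
    print("required area, m^2:", needed)
    print("multiple of bare chip area:", area_multiplier(needed, bare_area))
```

Under these assumptions the chip would need roughly ten times its own lid area in contact with the liquid, which is the gap the fins are meant to close. A coolant with different viscosity and thermal properties would change the heat transfer coefficient and therefore the fin sizing, which is the point Helezen makes.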

Power supplies and other components that are brick-shaped rather than plate-shaped should be placed in the lower zone of the chassis, T0. To improve cooling efficiency, these components can be fitted with open heat sinks. Chassis that are thick enough should have connectors that mount PCI Express cards vertically, in the same orientation as the chassis itself. One problem the engineers say they are still working on is finding the right slope for DIMM connectors, which are usually positioned at an angle. Ideally, one of the engineers said, the connector should let the module go in almost parallel to the motherboard, as in a laptop. But that still needs more thought.

Packing the chassis with components as tightly as possible, provided they do not obstruct the circulation of the liquid, may be a good idea. But according to the engineers, it is always necessary to keep sufficient distance between the topmost components in zone T2 and the surface of the liquid, and between the bottommost components in zone T0 and the bottom of the tank.

Forced pause

Going forward, the success of immersion cooling systems will depend on demand. In the current environment, with data centers running on minimal staff and many research centers closed, demand will not materialize instantly. We are at a fork in the road created by circumstances beyond our control. So today we can only wonder whether the big players will take Yogi Berra's advice.

If the economy returns to its previous growth rate in the near future, the era of immersion cooling is likely to begin: a technology that can cool growing capacity and do so more cost-effectively. Of course, coolant tanks may not make it into every data center in our lifetime. But for high-performance computing (HPC) systems, at least, they are about to become indispensable. Major players may well be among the first to adopt the technology.