Huawei Ascend: a review of a unified platform for AI and ML

The field of artificial intelligence looks so promising today that every major chip manufacturer has either already released its own solutions or will do so in the near future. Yet the real players in the AI market, on whose solutions you can actually build infrastructure, can be counted on the fingers of one hand: Nvidia with its CUDA platform, its eternal competitor AMD, and... actually, Huawei. Of course, you could also mention the Google TPU, but it is not available for general sale.

As a rule, the introduction of artificial intelligence in an enterprise goes through two stages. The first is machine learning (training), in which processors (GPU, TPU) churn through huge amounts of data to build a model. Developers usually use cloud services or equipment installed in the company's data center for training; if you associate artificial intelligence with servers stuffed with video cards, it is this training stage you are picturing.

The second stage is applying the trained model on a specific device, such as a surveillance camera or a server that processes data from a farm of sensors. For this purpose you need a processor with a different architecture than for training, with lower power consumption and cost, because it is at this stage that AI goes to the masses.

Naturally, the question of compatibility arises: will a software model designed for a GPU take advantage of all the optimizations of the processor it will eventually run on? Will you be able to update it dynamically with new data, and will you have to convert it back and forth between frameworks that are still under active development? In general, while this industry is still booming and everyone is contributing something of their own, it is better to play it safe.

Why Huawei?

First, Huawei is the only company that offers single-brand solutions across the entire application cycle, from machine learning systems to edge servers and end devices such as surveillance cameras. From top to bottom you have a single vendor, a single hardware architecture, and a single software framework.

Second, Huawei is always an option for diversifying away from American IT technologies, which is critical for state-owned companies. China does not impose sanctions and has no habit of prohibiting, restricting, or interfering with other countries' technologies.

Third, Huawei is traditionally cheaper than comparable solutions from American brands (HP, Nvidia, Cisco, Dell), usually by a significant margin of 30% or more.

Fourth, Huawei has long been at the forefront of technology, and its Ascend 910 platform for artificial intelligence systems is able to compete with players that have been in the AI market for more than 10 years. Let's take a look at the Ascend 910 processor family.

Huawei Ascend 910

Admittedly, the Chinese comrades have a bit of a problem with imagination: under the Ascend name they produce smartphones, smartphone processors, and chips for artificial intelligence, an entire line of them from small to large. We are naturally interested in the Ascend 910 itself, the one "Nvidia is afraid of".

First of all, the Ascend 910 is a chiplet design that combines compute dies and fast Samsung HBM2 Aquabolt memory in the same package. This is a very important step, because memory bandwidth is a key factor in the performance of machine learning systems; in the Ascend 910 it reaches 1.2 TB/s. The compute dies contain matrix multiplication blocks and ARM cores, as well as FP16 and INT8 units that speed up calculations that do not require high precision.
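To give a feel for what such reduced-precision units actually do, here is a minimal, framework-agnostic sketch in plain NumPy (not Ascend-specific code): the float matrices are quantized to INT8 with a simple per-tensor scale, the matrix product runs on the small integers, and the result is rescaled back to floating point.

```python
# A rough sketch of scale-based INT8 quantization for matrix multiplication.
# NumPy only; illustrates the general idea, not any Ascend API.
import numpy as np

def quantize(x, bits=8):
    """Map a float array onto signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1              # 127 for INT8
    scale = np.abs(x).max() / qmax          # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 32).astype(np.float32)

qa, sa = quantize(a)
qb, sb = quantize(b)

# Accumulate in int32, as dedicated INT8 units do, then rescale to float.
approx = (qa.astype(np.int32) @ qb.astype(np.int32)) * (sa * sb)
exact = a @ b
print("max relative error:", np.abs(approx - exact).max() / np.abs(exact).max())
```

The error stays small for well-behaved data, which is why inference workloads tolerate INT8 and gain roughly double the throughput compared with FP16.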


The computing capacity of the Ascend 910 is 256 TFLOPS FP16 and 512 TOPS INT8, which is enough to decode 128 streams of Full HD video. Since the chiplet structure of the processor allows its power and performance to be varied at the manufacturing stage, for less resource-intensive tasks the same architecture is available in a more economical package, the Ascend 310.

Parameter                                              Ascend 310     Ascend 910
Codename                                               Ascend Mini    Ascend Max
Architecture                                           DaVinci        DaVinci
FP16 performance                                       8 TFLOPS       256 TFLOPS
INT8 performance                                       16 TOPS        512 TOPS
Decoded video channels (H.264/H.265, 1080p, 30 FPS)    16             128
Power consumption, W                                   8              350
Lithography, nm                                        12             7

For example, if the Ascend 910 is meant for machine learning (training), then for inference there is the Ascend 310 processor, with a power consumption of only 8 W and a performance of 16 TOPS INT8 and 8 TFLOPS FP16.

Since AI systems scale horizontally, Huawei offers both these individual "bricks" and larger building blocks of the Atlas series for constructing an AI infrastructure.

Huawei Atlas 800 Family

For large-scale implementations, Huawei offers Atlas 800 servers for installation in the company's data center. Please note: the machine learning solution is delivered exclusively as a ready-made Atlas 800 server (Model 9000) using Huawei Ascend 910 neural accelerators, or as the even higher-performance Atlas 900 cluster.

The high power consumption of the Ascend 910, which is 310 W, limits the form factor of these processors: they come only as mezzanine-type accelerator boards fitted with massive heatsinks. Ascend 910 processors are not yet available as expansion cards, so you cannot install them in a regular server or workstation. Huawei Atlas 800 Model 9000 machine learning servers are configured at the time of ordering and can accommodate 8 Ascend 910 processors in a 4U enclosure. The platform uses 4 Huawei Kunpeng 920 processors with the ARM64 architecture (discussed in detail in our review).

The Atlas 800 Model 9000 server delivers 2 PFLOPS of FP16 performance, has 8 100-Gigabit network interfaces, is available with air or liquid cooling, and offers an efficiency of 2 PFLOPS per 5.5 kW. This model is intended for HPC environments and for building AI models in exploration, oil production, healthcare, and smart city applications.
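As a quick sanity check on these figures (just arithmetic on the numbers quoted above, not vendor benchmarks), the aggregate performance and efficiency line up with 8 Ascend 910 chips per server:

```python
# Back-of-the-envelope check of the quoted Atlas 800 (Model 9000) figures.
chips = 8                      # Ascend 910 processors per server
tflops_fp16 = 256              # FP16 performance per chip, TFLOPS

total_pflops = chips * tflops_fp16 / 1000
print(f"aggregate FP16: {total_pflops:.3f} PFLOPS")   # ~2.048, quoted as "2 PFLOPS"

power_w = 5500                 # 5.5 kW
print(f"efficiency: {total_pflops * 1e6 / power_w:.0f} GFLOPS per watt")  # ~372
```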

Again I will reproach our Chinese friends for a lack of elementary imagination, because they gave the same name to two completely different servers intended for other needs: the Huawei Atlas 800, Models 3000 and 3010. These two servers are intended for the next stage of AI use, inference with existing models, in areas such as video surveillance analytics, text recognition, and smart city systems. Physically, these machines are 2U dual-processor servers fitted with Atlas 300 expansion boards that perform the AI calculations.

The Huawei Atlas 800 Model 3010 is built on two Xeon Scalable processors and supports up to 7 Atlas 300 boards, while the Model 3000 is based on two Kunpeng 920 processors and takes one board more, 8 in total: another argument in favor of Huawei's ARM64 architecture. For integer INT8 operations, the performance of the Atlas 800 is 512 TOPS and 448 TOPS (that is, 8 and 7 boards at 64 TOPS each), which allows 512 and 448 HD streams to be analyzed on the Model 3000 and Model 3010, respectively.

The Huawei Atlas 500 Model 3000 is something unique for the modern world of artificial intelligence: an edge AI server. A device designed to be installed directly on site, whether a store, a drilling rig, or a factory floor, it has a fanless design with passive cooling. At the same time, the server has an amazing temperature range, from -40 to +70 degrees Celsius, so no special heated cabinets are required for its installation.

The main purpose of the Huawei Atlas 500 is to analyze streams from CCTV cameras. The server runs on an Ascend 310 processor that allows simultaneous decoding of 16 video streams at 1080p @ 30 FPS.

As befits an edge server, it has high-speed Wi-Fi and LTE for upstream connectivity and two Gigabit RJ45 ports for downstream connections to the video surveillance network. In other words, you can install the Huawei Atlas 500 at completely autonomous sites and connect it directly to a speaker and microphone, for example to play warnings or provide communication between staff and a virtual operator.

The server has a built-in 5 TB hard disk for storing fragments of the video archive. The typical power consumption of the Huawei Atlas 500 is between 25 and 40 watts, depending on whether the hard drive is installed.

By default, the Huawei Atlas 500 is powered by 24 V DC, and the power supply is not even included. The server seems to have been created specifically for installation in cabinets on telecommunications masts and poles, the ones that now house cellular repeaters and smart city components.

However, if your enterprise infrastructure is already built on solutions from another supplier but you want Huawei's capabilities as an AI platform, choose the Atlas 300 card. Each such board carries 4 Ascend 310 processors with a total performance of 64 TOPS INT8, which allows 64 streams to be decoded simultaneously. Yes, you are not mistaken: these are the same PCIe boards installed in the Atlas 800 (Model 3000) servers discussed above, and, interestingly, they are single-slot adapters with passive cooling and no additional power connector. What does this mean in practice?

It means that the Huawei Atlas 300 can be installed in almost any computer case, even a NAS (if necessary) or a 1U server: all you need is a PCI Express 3.0 x16 slot. Again, the Ascend 310 chip is very undemanding in terms of cooling, and the Atlas 300 board can operate at ambient temperatures of up to 55 degrees Celsius.

In total, the Atlas 300 board has 32 GB of RAM, but the memory is distributed across the four processors, so you can allocate no more than 8 GB per task. On the other hand, the card is seen by the system as 4 AI accelerators, which means you can run 4 independent tasks on each card.
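Since the card is exposed as four separate accelerators, one way to use it fully is to pin a separate process to each device. Below is a hypothetical sketch in MindSpore; it assumes a MindSpore build with Ascend support and device IDs 0 through 3, both of which depend on your driver and framework version.

```python
# Hypothetical sketch: pin this process to one of the four Ascend 310
# devices exposed by an Atlas 300 card. Device numbering (0-3) and
# MindSpore-on-310 support are assumptions; check your driver and docs.
import sys
from mindspore import context

device_id = int(sys.argv[1]) if len(sys.argv) > 1 else 0   # one ID per process

context.set_context(mode=context.GRAPH_MODE,
                    device_target="Ascend",
                    device_id=device_id)

# ...build and run an independent inference workload here.
# Launching four copies of this script, one per device ID, uses the whole card.
```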

Interestingly, not only an existing server but any other "smart" device can be made even smarter by installing the Atlas 200 module with an Ascend 310 processor. This small box can be fitted to surveillance cameras for facial recognition, to robots, or to drones. It has a serial interface for connecting to external controllers and a minimal power consumption of only 9.5 W, and at the same time it can decode the same 16 channels of H.264 video at 1080p.

For software developers, Huawei offers a developer version of the Atlas 200 with an Ascend 310 processor. This is a small box that consumes only 20 watts and has a basic set of input/output ports (USB, RJ45, I/O). You can use it to validate an algorithm on site before purchasing and installing equipment, and in principle to get a feel for how the AI performs on the same processor that will be used in production. Well, now that we have come to software, it is time to talk about Huawei's AI platform.

MindSpore

Huawei did not limit itself to simply releasing drivers for its Ascend products; such a modest step would not befit a giant of this size. Instead, Huawei released its own AI platform, MindSpore, joining Google (TensorFlow) and Facebook (PyTorch). It is a free framework for programming in Python (version 3.7.5), so, just think about it, Huawei effectively has its own "TensorFlow". At the time of writing, it was available in two versions: 0.1.0-alpha and 0.2.0-alpha.
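To give an idea of what MindSpore code looks like, here is a minimal sketch that defines a tiny network and runs one forward pass. It is written against the early 0.x API mentioned above, so exact module names may differ in later releases; the CPU backend is chosen only so the example works without Ascend hardware.

```python
# A minimal MindSpore sketch: define a tiny network and run a forward pass.
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, context

# Run on the CPU backend so the example does not require Ascend hardware.
context.set_context(mode=context.GRAPH_MODE, device_target="CPU")

class TinyNet(nn.Cell):
    """Two fully connected layers with a ReLU in between."""
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc1 = nn.Dense(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Dense(32, 2)

    def construct(self, x):
        return self.fc2(self.relu(self.fc1(x)))

net = TinyNet()
x = Tensor(np.random.randn(4, 16).astype(np.float32))
print(net(x).shape)   # (4, 2)
```

The structure will look familiar to anyone who has used PyTorch: a `Cell` plays the role of a module, and `construct` defines the forward pass.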

And the best part is that Huawei understands that in today's open world, building a developer community requires supporting common standards, so MindSpore runs on x86 CPUs, on Nvidia GPUs with the CUDA 9.2/10.1 libraries, and of course on the Ascend 910. You can install MindSpore via pip or in an Anaconda virtual environment. Currently, packages are available for Ubuntu x86, Windows x64 (CPU only), and EulerOS (aarch64 and x86).
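Once the appropriate package is installed (via pip, as noted above), switching between these backends is a one-line change. A minimal sketch, assuming the backend names follow the hardware list above and the matching package is present:

```python
# Select the MindSpore execution backend; availability depends on the
# package you installed (CPU, GPU with CUDA, or Ascend).
from mindspore import context

context.set_context(mode=context.GRAPH_MODE, device_target="CPU")       # x86 / aarch64 CPU
# context.set_context(mode=context.GRAPH_MODE, device_target="GPU")     # Nvidia GPU + CUDA
# context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")  # Huawei Ascend 910
```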

Of course, Huawei is only at the beginning of MindSpore's development, and much will depend on how the framework is adopted by the programming community. In terms of ambition, however, the Chinese plans are downright Napoleonic.

MindSpore development directions:

  • Support for a greater variety of models (classical models, generative adversarial networks (GAN), recurrent neural networks, transformers, reinforcement learning, probabilistic programming, AutoML, and others)
  • Extending the API and libraries to improve programming capabilities
  • Optimizing performance on Huawei Ascend processors
  • Evolution of the software stack and optimization of the computational graph (additional optimization passes, etc.)
  • Support for more programming languages (not just Python)
  • Improving distributed learning by optimizing automatic scheduling, data distribution, and so on
  • Improving the MindInsight tool to make the programmer's life easier
  • Improved functionality and security for edge devices

Note the point about supporting more programming languages: today it is only Python, but it is logical to expect support for R, LISP, Smalltalk, and Prolog.

EulerOS

If you have not heard of EulerOS, that is not surprising, because this operating system was announced quite recently, in April 2020. Naturally, Huawei needed its own Linux distribution to offer as a platform for the ARM64 architecture and for AI applications. They took CentOS as a basis, solved its main problem (delayed security updates) by organizing a round-the-clock support team, obtained the CC EAL4+ and CC EAL2+ security certificates, and ended up with a product that has the same architecture as Red Hat or Oracle Linux, which makes switching to it easy.


Huawei's repositories contain a whole array of open-source enterprise software, including tools for network security, big data, building storage networks, migration, cloud management, databases, and so on.

EulerOS is available as a container image in the official Docker repository and as an ISO image for download from the official website.

MindSpore: looking into the future

Today, Huawei has everything it needs to gain a solid foothold in the artificial intelligence market: an open platform that can compete with TensorFlow, and a full cycle of hardware solutions from machine learning systems down to inference devices. Plus, it has its own operating system and its own server platforms with a maximum level of in-house development, on which you can build single-brand solutions that are protected from political turbulence.

Recommendations for ordering

Huawei is an example of a company that rises to the top in any business and competes with those who set the de facto standards. Many of us could learn from its perseverance in expanding its presence into every technological area of the modern IT world. Today, the company is starting to build its own chip manufacturing capacity to avoid dependence on Taiwan's TSMC, which will make its products even more independent of US trade policy. At the same time, the company invests in the next generation of IT specialists, in its own software platforms, development environment, and hardware. There is no other company on the market that offers such a synergistic platform, so I have no doubt that Huawei MindSpore will become as much of a standard as TensorFlow/CUDA. Yes, it is already profitable to do projects on a platform that gives a price advantage when ordering a ready-made solution, but before ordering, even if you plan to use AI solutions exclusively from Huawei with MindSpore or another framework, I still recommend checking how well code from the other platforms, TensorFlow and Caffe, converts to your needs.