Threadripper vs EPYC: comparison of three 32-core AMD processors in server applications

After VMware's recent decision to change the licensing policy of its ESXi hypervisor and cap a license at 32 physical cores per processor socket, the 32-core mark has become the front line where AMD reigns supreme. If you are choosing a 32-core platform for virtualization, containers, or standalone applications today, you have three AMD server architectures to pick from at once:

  • First, the good old first-generation EPYC, code-named “Naples”, which you can now buy at a deep discount.
  • Second, the second-generation EPYC, code-named “Rome”, recently refreshed with models whose L3 cache has been increased to 256 MB.
  • And the gaming “monster”, the second-generation Ryzen Threadripper, which gets special attention in this article.

As is often the case, AMD offers a wide selection of 32-core processors for different tasks. We took the EPYC 7551P, EPYC 7532, and Threadripper 2990WX to test and compare them in popular server applications:

  • MySQL
  • REDIS
  • NGINX
  • TensorFlow
  • ElasticSearch 7.6.0

We will find out when it makes sense to choose the good old first EPYC, when to pick the newest EPYC Rome with its enlarged cache, and when to drop everything and put every gamer's dream, the Threadripper 2990WX, into a server.

Why do we give a special place to Threadripper?

Because Hetzner recently introduced dedicated servers based on Ryzen Threadripper processors. This is hardly the first use of gaming CPUs in cloud hosting, but as far as I can remember it is the first time a world-famous company has proudly offered a HEDT machine for rent. I would have walked right past this if it were some small local hoster, but Hetzner is one of the ten largest cloud providers in the world, with an excellent reputation.

And although the company has been caught using desktop components many times, Hetzner is a large, thriving business. If they take such a step, it is worth following their example and figuring out whether there is money to be saved here, and why you would buy a Threadripper instead of an EPYC, given that these processors, while sharing almost the same socket, require different motherboards.

Server platform for Threadripper?

There is exactly one purely server-grade motherboard in the world for the first two generations of Threadripper, and it is made by ASRock Rack. This manufacturer likes to experiment, producing, for example, a Mini-ITX board for LGA3647 and server boards for Socket AM4. It is ASRock Rack boards that power Hetzner's servers with Ryzen processors. Unfortunately, the deal between ASRock Rack and Hetzner is under NDA, but piecing together the available bits of information, I assume this very motherboard is used, with the minimal changes typical for a large customer.

The ASRock Rack X399D8A-2T is an ATX board with two 10-gigabit ports on the most modern 10GBase-T controller, the Intel X550-T2. It has IPMI monitoring based on the ASpeed AST2500 chip with a dedicated 1-gigabit network port, 8 DIMM slots with ECC support, two M.2 slots for NVMe/SATA, 8 SATA ports, and 5 physical PCI-E 3.0 x16 slots (3 wired as x16 and 2 as x8). In many ways the board mirrors the ASRock Rack EPYCD8-2T for EPYC 7000, which is understandable: the memory slots, PCI-E slots, and rear-panel connectors are laid out the same way.

The cooling options are also fairly slim: the lowest-profile coolers for Threadripper you can find on the open market are the Supermicro SNK-P0063AP4 and the Dynatron A26 at 2U height, while a bit taller and better is the Noctua NH-U9 TR4-SP3 for 4U chassis. Given that the Threadripper 2990WX has a 250 W TDP, it makes sense to look for a liquid-cooled platform, and that is not an empty phrase: to keep the Threadripper 2990WX in our test rig from overheating and to let it reach its maximum frequencies, I had to not only use a liquid cooling system with a 360 mm radiator, but also move the test rig into a room with an air temperature of +10 degrees Celsius.

The RAM situation is as follows: Ryzen Threadripper processors support only unbuffered memory. They have a 4-channel DDR4 controller, and on the ASRock Rack X399D8A-2T server board its frequency is limited to 2666 MHz. Registered (buffered) modules are not supported by these processors, so large amounts of RAM are simply not available in Threadripper servers. The list of compatible DIMMs includes 8 GB and 16 GB modules, for a total of up to 256 GB of RAM on the board. That is very little for virtualization, but enough for containers or individual applications that need raw CPU power. As a reference point, the servers CloudFlare uses also carry 256 GB of RAM.
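If you want to double-check what the board actually accepted, the module type (unbuffered vs. registered), negotiated speed, and ECC capability can be read straight from the SMBIOS tables on Linux; a minimal sketch (field names vary slightly between BIOS versions):

  # List installed DIMMs with size, speed, module type and ECC capability
  sudo dmidecode -t memory | grep -E 'Size|Speed|Type Detail|Error Correction'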

In every other respect, take a Threadripper, write “I am EPYC!” on the lid with a Sharpie, and treat it as an EPYC; we will sort out the differences below.

Difference between SoC and traditional CPU

If you close your eyes and hold a Threadripper 2990WX in your left hand and an EPYC in your right, you will not feel any difference: the processors have almost the same socket, size, and weight. The one real distinction is that EPYC is a full-fledged SoC that does not need a southbridge on the motherboard, while Threadripper is a CPU that still requires a chipset. The southbridge is responsible for part of the PCI Express lanes routed to some of the slots, for the SATA ports, and for the rest of the supporting I/O. Let's compare the topology of a typical EPYC motherboard with a Threadripper board:

On the ASRock Rack X399D8A-2T, the chipset is the same AMD X399 found on gaming motherboards; advanced gamers have already written it off, but in a server, as they say, it will still serve. In terms of supporting I/O, compared with the same EPYCD8-2T board there are no drawbacks: 2 SATA ports with SATA DOM support for booting a hypervisor, plus 8 more SATA ports, 4 of which are brought out through a Mini-SAS connector for hooking up a drive cage or backplane; two M-key M.2 slots, one of which can work as OCuLink; and even a USB port for a VMware ESXi flash drive is soldered onto the motherboard so that it does not get in the way of video cards. Yes, one of the M.2 slots in SATA mode uses a lane from the southbridge, but let's be objective: there is no point in occupying that slot with a SATA drive, since a 2280 NVMe drive belongs there, and in that case the drive talks to the processor directly. Incidentally, you can combine two M.2 drives into RAID 0/1 straight from the motherboard BIOS.

Naturally, the ASRock Rack X399D8A-2T has fewer x16 PCI Express slots than EPYC motherboards, where every slot has a full x16 link (PCI Express 3.0 for first-generation EPYC and PCI Express 4.0 for the second). In practical terms this means the Threadripper platform is clearly not meant for 3 or more GPUs such as the Nvidia Tesla V100, which uses PCI-E 3.0 x16, but you can still install network cards and HBAs that need a PCI Express x8 link.

The memory slots are placed along the airflow path for optimal cooling in a rack chassis. If you run out of network ports in the rack, you can reach the IPMI interface of the ASpeed AST2500 chip through either of the 10-gigabit ports in out-of-band mode. On the whole, this motherboard has everything you are used to in a traditional server, plus one USB Type-C port, which EPYC motherboards do not have and to which you can attach something as big as an entire QNAP TR-004U disk shelf. The board also carries a Realtek ALC892 audio codec, which is good enough for passing audio on an RDP server, but no more than that.

ThreadRipper and EPYC: differences in memory controllers

All three processors under consideration differ substantially in how they work with RAM. Second-generation Threadripper uses a 4-channel controller, while first-generation EPYC uses an 8-channel one. In both cases the memory controllers physically sit on each of the four dies that carry the cores and cache. As a result, every 8 cores in a first-generation EPYC have direct access to no more than two memory channels, and it is the hypervisor's job to map the memory allocated to a virtual machine so that the virtual CPUs and their memory are served by the same die.

The Threadripper 2990WX is in a much worse position: two of its dies have no memory controllers of their own at all and reach memory through their neighbors, which adds latency. The diagram also shows that if you plan to use all 32 cores, you should populate all 8 memory modules on a first-generation EPYC, while four are enough for the Threadripper 2990WX. Naturally, the more intensively your application exchanges data with RAM, the more noticeable the gap between the desktop and server AMD processors becomes, but in practice not every workload is sensitive to this.
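On Linux you can see this asymmetry directly and, for a bare-metal workload, pin it to a node that owns local memory; a minimal sketch assuming numactl is installed (the node number and the benchmark binary are illustrative placeholders):

  # Show NUMA nodes, their CPUs and how much local memory each node owns
  # (on a 2990WX two of the four nodes report no local memory at all)
  numactl --hardware

  # Run a memory-hungry process only on node 0's cores and node 0's memory
  numactl --cpunodebind=0 --membind=0 ./my_benchmark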

The radical change in the new EPYC Rome processors is that AMD moved the inter-die fabric, the PCI Express controllers, the memory controllers, and all the remaining uncore logic off the core dies and onto a separate central chip, the so-called I/O die. As a result, every core has access to all 8 memory channels, which not only simplifies the hypervisor's job but also improves performance. In theory this step alone should be enough for EPYC Rome to win every test against Zen 1 processors with the same core count, but let's not rush (spoiler: a lot depends on the architecture of the tested application and the size of its data).

What EPYC doesn’t have!

Interestingly, the ASRock Rack X399D8A-2T uses a traditional blue-and-gray text-mode BIOS which (I can barely contain my emotions) supports overclocking of the CPU and memory and even offers profiles for saving overclocking settings. The built-in watchdog timer will restart the server if it hangs because of an overclock. Of course, someone will say that overclocking has no place in a server, but do not rush to write the idea off: there is a whole class of servers where CPU frequency is the decisive selection criterion. Their job is to work 8 hours a day and then either crunch overtime or rest. These are HFT (High-Frequency Trading) servers, whose purpose is to make money on the exchange through high-frequency trading, that is, by placing orders within the window while the exchange server is still asking the client who submitted a buy/sell order to retransmit a network packet. Such machines are usually installed in the same data centers as the exchange's own servers, and from the hardware point of view their goal is the minimum possible latency when placing buy/sell orders. Overclocking, liquid cooling, CPUs running at 5 GHz and above, and even FPGA boards are all actively used in these servers.

On top of that, there are plenty of non-critical tasks, such as rendering or training models for neural networks, where after a failure you can simply resume from the same step. And in general, let's not forget that gamers' processors have been running overclocked for years without hangs, and some people cannot even imagine a computer without overclocking. The ASRock Rack X399D8A-2T is ready for all of these cases, but if you don't want to, don't overclock!

As for EPYC, if you have a new second-generation chip, it differs from the first in letting you configure the NUMA topology, binding memory either to separate groups of cores or to the whole socket at once. You may also get access to memory latency settings, but that depends on the motherboard vendor's goodwill and is extremely rare.

What Threadripper doesn’t have

Remember that EPYC is the most security-oriented of these processors: since the first generation on the Naples core it has offered memory encryption, and EPYC Rome added register encryption (!), significantly improving the isolation of virtual machines. These features amount to almost a mini-revolution, with a single but significant drawback: the most attractive ones, such as AMD SEV, work only under the Linux KVM hypervisor. In VMware their support is expected starting with version 7, and with Windows Server and Hyper-V nobody knows when to expect it at all.

[CPU-Z screenshots: EPYC 7551, EPYC 7532, Threadripper 2990WX]

In mundane matters such as SSE/AVX support, all three processors are exactly the same.

                        EPYC 7551P   Threadripper 2990WX   EPYC 7532
Year of release         2017         2018                  2020
Number of cores         32           32                    32
Number of threads       64           64                    64
L3 cache, MB            64           64                    256
AMD SEV                 Yes          No                    Yes
Memory channels         8            4                     8
Memory type             ECC RDIMM    UDIMM, ECC UDIMM      ECC RDIMM
Max. memory clock, MHz  2666         2666                  3200
Frequency, GHz          2.0-3.0      3.0-4.2               2.4-3.3
TDP, W                  180          250                   200

However, if you are a solid company with a serious approach to security, it is better to choose the new EPYC, if only for these extra memory- and register-encryption features.

Support by VMware

Threadripper processors are not officially supported by the ESXi hypervisor, and it is unlikely they ever will be, since VMware has not even certified the server-grade EPYC 3000 series. This does not mean that vSphere will not run on the Threadripper 2990WX: it will. I tested it, and even live migration of virtual machines between ESXi hosts on EPYC and Threadripper worked without problems. The ASRock Rack X399D8A-2T platform supports SR-IOV and PCI-E passthrough, so you can, for example, hand the SATA controller to a guest operating system for virtual storage.

Everything works, there is nothing to complain about, and turbo boost functions inside guest systems with default settings, but the nagging awareness that nobody guarantees anything on an uncertified configuration lodges itself firmly in the back of your mind.

Frequencies

The first EPYC is in a sad state when it comes to frequencies: when it was being designed, 180 W was considered too much for the server market, and because of engineering constraints all cores of the first EPYC fundamentally cannot boost at once. The maximum frequency of 2998 MHz appears only with 4 cores loaded; with 8 cores loaded it remains reachable for just 2 of them, while the rest run at 2543 MHz or lower if an aggressive power-saving mode is selected. With 16 active threads the maximum turbo frequency is available on only 1 core, and with all 32 cores loaded they all settle at 2543 MHz with no boost at all. There is no option in our motherboard's BIOS to raise the power limit for these processors, and a more powerful cooling system will not help either: the limit is the processor itself, not its temperature. Today it is normal to put a 250 W processor in a 1U case and give a third of the usable volume to the cooling system; three years ago that was considered deeply unfashionable, even vulgar.

The beauty of the Threadripper 2990WX is its frequency and its near-total lack of restraint at the wall socket: the base clock is 3000 MHz, rising to 4200 MHz in turbo mode. It was made for computers that cost as much as a three-year-old imported car, where the HEDT buyer picks either a top-end 360 mm liquid cooler with RGB lighting, or a massive Noctua air cooler, or builds a custom water loop, adding another thousand dollars to the price tag. How you remove the heat is not merely your problem; it is part of your hobby, your lifestyle, and the interests you share on forums. AMD simply hands you a devourer of any software code and leaves it to you to use it properly, even if that means disabling half the cores in the BIOS to reach the maximum frequency. Why would you do that? Because the top frequency of 4174 MHz is reachable only with 2 cores loaded; at 8 cores the clock drops to 3800 MHz, 16 fully loaded cores run at 3600-3700 MHz each, and all 32 cores together hold 3394 MHz. Some enthusiasts have found that disabling half the cores of the Threadripper 2990WX lets it hold its maximum frequencies consistently and deliver better performance in games.

Maximum number of cores running simultaneously at the maximum boost frequency

  EPYC 7551P   Threadripper 2990WX   EPYC 7532
  4            2                     32

Per-core frequency with all 32 cores loaded, MHz

  EPYC 7551P   Threadripper 2990WX   EPYC 7532
  2543         3394                  3296
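For reference, these per-core readings are easy to reproduce on your own hardware; a rough sketch assuming stress-ng is installed (sampling /proc/cpuinfo is coarser than turbostat, but good enough to see the plateau):

  # Load all 64 hardware threads for two minutes in the background
  stress-ng --cpu 64 --timeout 120s &

  # Watch the current clock of the busiest cores once per second
  watch -n 1 "grep 'cpu MHz' /proc/cpuinfo | sort -t: -k2 -rn | head -8"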

The recently announced EPYC 7532 is an attempt to squeeze maximum performance out of 32 cores. In essence it is a variant of the 64-core EPYC 76x2 with half the cores disabled, whose L3 cache is redistributed among the remaining cores and amounts to 256 MB per processor (8 MB per core). The price of that increase is power consumption inflated to 200 W. Note that one step below sits the EPYC 7452, which also has 32 cores and similar frequencies (2.35-3.35 GHz) but a strikingly lower TDP of just 155 W; the difference is explained by the chiplet configuration: four 8-core chiplets in the EPYC 7452 versus eight 4-core chiplets in the EPYC 7532. One step above sits the higher-clocked EPYC 7542 with a 2.9 GHz base frequency, which consumes 225 W.

Interestingly, the EPYC line imposes no restrictions tied to thread count or the use of AVX/AVX2 instructions: the frequencies of all 32 cores bounce independently between 1496 and 3296 MHz, which is the best boost behavior one could wish for. This holds for the whole modern EPYC Rome 7xx2 series: the turbo scheme has been completely reworked, and the entire new line can keep all cores above 3000 MHz.

NGINX Plus

NGINX is not limited to serving as a front end for web applications. Using it as a reverse proxy, for databases among other things, already puts a significant load on the host and demands high parallelism. That is why we use the commercial NGINX Plus rather than the free NGINX: it lets us spread the load across all cores much more aggressively. To saturate one of the fastest and lightest pieces of software around, the load-generating machine has to be several times faster than the configuration under test, which is impossible in our editorial office. The NGINX team published the results of their own tests with default settings on their blog, and even following their methodology I could not get anywhere near their numbers. After discussing the situation with colleagues, I decided that our test configuration is sufficient for comparing the processors with each other, so we will use relative results, taking the EPYC 7551P reading as 100%.

Hypervisor: VMware ESXi 6.7U3
Client VM: 32 vCPU, 12 GB RAM, Ubuntu Linux 18.04 LTS
Server VM: 16 vCPU, 16 GB RAM, Ubuntu Linux 18.04 LTS, NGINX Plus R20 (1.17.6)

I run the test with this command:

wrk -t 256 -c 1000 -d 120s http://server-ip/0kb.bin
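With 1000 open connections the client OS defaults can become the bottleneck before any of the processors do; a rough sketch of the tuning I would apply on the load generator before such a run (the specific values are illustrative, not taken from the test):

  # Let wrk open enough file descriptors for its sockets
  ulimit -n 65535

  # Widen the ephemeral port range and the connection backlog on the client
  sudo sysctl -w net.ipv4.ip_local_port_range="1024 65000"
  sudo sysctl -w net.core.somaxconn=65535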

[Chart: NGINX Plus results, relative to EPYC 7551P = 100%]

In terms of minimum latency, the EPYC 7532 shows an almost two-fold advantage over the previous-generation processors, which matters a lot if you need fast application response times. And again, returning to AMD's large corporate clients, I will note that CloudFlare chose the record-fast EPYC Rome for itself, not Threadripper. I don't like talking about hardware that has not been through our lab, but if the low latency comes from the Zen 2 architecture, then the high-frequency 8-core EPYC 7252/7232P/7262 might be the best budget choice for WAF/proxy/UTM appliances precisely because of this metric.

Redis

In previous tests, I found that the fastest versions of Redis and MariaDB are delivered in Oracle Linux repositories, so we will use this OS for databases.

Test configuration
Hypervisor: VMware ESXi 6.7U3
Test VM: 32 vCPU, 16 GB RAM, Oracle Linux Server 7.6, Redis 3.2.12, MariaDB 5.5.64

Non-relational databases have been optimized for years for the speed of simple queries, so it is no surprise that CPU frequency plays the decisive role here. In single-threaded mode we see the same picture as in the NGINX test: the lowest latency carries the second-generation EPYC to victory.
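The article does not name the load generator, but a comparable single-client versus many-client comparison can be reproduced with the stock redis-benchmark utility; a minimal sketch (the host address and request count are placeholders):

  # One connection: latency-bound, dominated by per-core clock speed
  redis-benchmark -h 192.168.1.10 -p 6379 -t set,get -n 1000000 -c 1

  # 64 parallel connections: throughput-bound, scales with core count
  redis-benchmark -h 192.168.1.10 -p 6379 -t set,get -n 1000000 -c 64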

[Chart: Redis results, single-threaded]

From the tests so far it is clear that for simple applications running in one or two threads the best choice is EPYC Rome, and I would very much like to believe that this holds for its lower-core-count siblings as well.

[Chart: Redis results, multi-threaded]

As the load increases on the same applications, the picture becomes ambiguous. However, let’s move on to more complex services.

MariaDB 10.3

Using MariaDB 10.3, the MySQL fork shipped with the CentOS and Debian distributions, we traditionally create an InnoDB database of 1 million rows, of which only 100,000 are used, and load the processor with SELECT queries served from a pool held in RAM. A very fast NVMe drive smooths out log-write delays, so storage does not affect the results.
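The exact load generator is not spelled out either; a test of roughly this shape can be driven with sysbench's oltp_read_only profile, shown here as a sketch under that assumption (host, credentials and thread counts are placeholders):

  # Create a single 1,000,000-row InnoDB table
  sysbench oltp_read_only --mysql-host=192.168.1.20 --mysql-user=sbtest --mysql-password=secret \
    --tables=1 --table-size=1000000 prepare

  # Read-only SELECT load, scaled from 1 to 64 client threads
  sysbench oltp_read_only --mysql-host=192.168.1.20 --mysql-user=sbtest --mysql-password=secret \
    --tables=1 --table-size=1000000 --threads=64 --time=120 run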

[Chart: MariaDB read test results (1)]

[Chart: MariaDB read test results (2)]

We see the pattern typical of the previous applications: a huge latency advantage for the EPYC Rome 7532 at 1 thread, which melts away as the load grows and gives way to the Threadripper 2990WX.

ElasticSearch 7.6.0

If we say that 32 cores are needed for processing Big Data, then Elastic is the best example. Written in Java, this stack for working with statistical data and application logs is one of the most popular tools among DevOps and Data Science specialists.

Test configuration
Hypervisor: VMware ESXi 6.7U3
Test VM: 64 vCPU, 30 GB RAM, Ubuntu 18.04 LTS, Java Runtime 11, ElasticSearch 7.6.0

From the built-in tests of the Rally package I chose http_logs, because it is fairly large: 32 GB of data when unpacked, and the test results take up about as much again. The measurement is based on two metrics, the first of which is the rate of adding documents to the index.
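For reference, the same track can be run against an already-deployed cluster using Rally's benchmark-only pipeline; a minimal sketch (the target address is a placeholder, and the exact invocation differs slightly between Rally versions, with newer ones using the esrally race subcommand):

  # Drive the http_logs track against an existing ElasticSearch node
  esrally --track=http_logs --target-hosts=192.168.1.30:9200 --pipeline=benchmark-only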

[Chart: ElasticSearch http_logs results]

When testing real applications, some results that break out of the general trend simply defy logical explanation. This is partly the fault of developers, whose goal is to write an application rather than an accurate benchmark. Partly the error comes from delays accumulated across the software stack, and if you look at the latencies, normalizing the results of one processor to 1, the spread becomes simply colossal.

AMD keeps emphasizing that the new EPYC Rome architecture delivers up to 40% more performance in Java applications than the first generation of EPYC, and in the range query test I have something to please them with: the advantage is nearly 10-fold. But in the battle between the new server CPU and the old gaming CPU there is no clear winner.

Tensorflow / Keras

I use the task of building a text-generation model from existing news articles. Using the Textgenrnn project, I feed in a 16-megabyte text file and start training the model, choosing the batch size parameter so that even its minimum value loads the CPU as much as possible. Yes, I know such computations are faster, cheaper, and more practical on a GPU, but building neural networks is not always only about GPUs: where computational accuracy matters or many models are trained at once, CPUs are still used.
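For context, here is a rough sketch of how such a CPU-only run can be pinned to a chosen set of cores to compare scaling (train_textgen.py is a hypothetical launcher script, and OMP_NUM_THREADS only matters for thread-pool builds of TensorFlow):

  # Train on all 32 physical cores
  OMP_NUM_THREADS=32 taskset -c 0-31 python3 train_textgen.py

  # Repeat on half the cores to see how the model-building step scales
  OMP_NUM_THREADS=16 taskset -c 0-15 python3 train_textgen.py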

[Chart: TensorFlow/Keras results]

Remember that with all 32 cores loaded, the Threadripper 2990WX and the EPYC 7532 run at almost the same frequency, and the other optimizations of the Zen 2 architecture play no role in long stretches of pure math.

Price

At the time of writing, the ASRock Rack X399D8A-2T motherboard cost about the same as the ASRock Rack EPYCD8-2T, around $550, so you cannot save on the platform itself with AMD Threadripper processors. The processors themselves are a much more interesting story:

  • EPYC 7551p - 1300$
  • Threadripper 2990WX - 1525$
  • EPYC 7532 - 3250$

Overall, at the time of writing AMD had 9 (!) variants of 32-core processors across the first and second generations: for single-socket servers, for dual-socket servers, on the first-generation core, on the second-generation core with 128 MB of L3 cache, and on the second-generation core with 256 MB of L3 cache. It is a real paradise for those who like to choose, spanning a price range from $1300 for the EPYC 7551P up to $3400 for the EPYC 7542, while the most affordable EPYC Rome, the 7452, costs only $2025.

Don’t forget that EPYC supports Registered memory, but Threadripper doesn’t.

  • 16 GB ECC Registered DDR4 2666 MHz - 80$
  • 16 GB ECC Unbuffered DDR4 2666 MHz - 110$
  • 16 GB non-ECC Unbuffered DDR4 2666 MHz - 70$

The simplest configuration of the processor, motherboard, and 256 GB of RAM will be as follows:

  • EPYC 7551p + ASRockRack EPYCD8-2T + 256Gb ECC RDIMM DDR4 = 2490$
  • Threadripper 2990WX + ASRockRack X399D8A-2T + 256Gb DDR4 ECC DIMM = 2955$
  • Threadripper 2990WX + ASRockRack X399D8A-2T + 256Gb Non-ECC DIMM DDR4 = 2660$
  • EPYC 7532 + ASRockRack EPYCD8-2T + 256Gb ECC RDIMM DDR4 = 4440$

Also keep in mind that for the EPYC 7532 it makes sense to buy the fastest DDR4-3200 memory, which will add even more to the cost of the machine.

Conclusions

Different applications by their nature show completely different results, and the “faster/more expensive” combination is not always better. We found out that:

  • for the simple low-thread applications we tested, the best choice is the EPYC Rome 7532, and for renting out VPSes for 1C it is the best option. Most likely the trend will hold for other software with the same type of load.

  • at the same time, in a multithreaded 1C + MS SQL bundle the first-generation EPYC 7551P shows the same speed as the EPYC Rome 7532, so there is real money to be saved here.

  • for databases with a large number of connections, the Threadripper 2990WX with its high frequencies is the better fit, and it is also faster in applications related to Big Data and Machine Learning.

The new EPYCs with L3 cache enlarged to 256 MB go into the same motherboards as the first "Naples" EPYCs; all you need is to flash the latest BIOS. If you already have a server on a first-generation EPYC and want to swap in an EPYC Rome, first make sure a BIOS with Zen 2 support can actually be flashed onto your motherboard: the earliest boards carried a 16-megabyte ROM chip, while the new processors need a 32-megabyte one. By buying the Threadripper platform for a server, you get a motherboard made on the same production lines as thoroughly server-grade boards, with ECC memory and NVMe RAID. For such builds the ASRock Rack X399D8A-2T is the only option and therefore the best purchase.

As for psychological barriers and prejudices, I would happily list everything you lose by choosing a gaming CPU instead of a server one, but I cannot think of anything beyond official VMware support. Anticipating questions like "what about the new generation of Threadripper": let's just say there are no server motherboards for them yet. As soon as they appear, we will test them.