sviko.com

How memory encryption in AMD EPYC 7000 works, or how it protects your cloud

Let’s start with a simple analogy. If your company’s data is money, then the physical server that used to store it was a guarded safe: a sealed piece of iron with a password, kept behind three locks and under guard. With the move to the cloud, the server is more like an ATM: it stands in a public place, and anyone can walk up to it, check a card balance, withdraw money, kick it or hit it with a hammer, or even carry it away, as has happened. You don’t worry about the ATM’s security for one reason only: its contents belong to the bank, and if the money is stolen, you are not affected. With information it is different. Today most data is stored in the cloud, not in “safes” but in “ATMs”, on servers shared with other customers. When you sign a service agreement, you often don’t know where the server is physically located, who its administrators are, or what security measures the cloud provider takes. Yes, you have an SLA, but your losses from information theft may not be covered by the provider’s liability.

Test bench configuration:

Today, as conventional cloud systems turn into hybrid clouds, you need the same level of security on the local server, kept under three locks in the company office, and on the remote virtual one, about which you may know nothing except its virtual configuration. At the same time, the number of virtualization-related vulnerabilities in enterprise software grows every year, and the overall number of new vulnerabilities recorded by NIST (https://www.nist.gov/) is growing exponentially:

We see a twofold increase in 2017-2018, and the trend continues in the first half of 2019. I once heard of a manager who, while renting out a virtual machine, told the client about the respectable and intelligent neighbors on the server. I still don’t know whether that was true, but it matters, because there are attacks in which the owner of one virtual machine can access the data of all the virtual machines on the same server.

Such privilege-escalation attacks can exploit not only software bugs in the OS code but also hardware bugs in the processor architecture, as confirmed by the sad experience of Intel, which is forced to close vulnerabilities in its Xeons at the cost of reduced performance. Even our own test servers based on Intel Xeon run an operating system frozen at its autumn 2018 state (before the Meltdown/Spectre patches), because the security patches slow the machine down so much that they interfere with normal testing.

But one company in the world greets every new vulnerability in Intel processors with clinking champagne glasses and applause: AMD. Not only is its CPU architecture not subject to the vulnerabilities associated with speculative execution, but its EPYC 7000 series processors were designed from the start for a higher level of security in isolated environments: conventional and container virtualization, that is, everything most in demand in the IT business today.


Broadly speaking, behind the big words “security technology” there are two main innovations:

  • The first is a separate security core, the AMD Secure Processor (ARM Cortex-A5), which handles key generation and storage and takes over the trusted boot of the operating system. It is integrated into every AMD EPYC 7000 series processor.
  • The second is RAM encryption with the AES-128 algorithm, performed entirely in hardware by the memory controllers themselves. We all know and use disk encryption, but this kind of RAM protection is still a novelty for many IT specialists.

To begin with, I studied what cloud providers were offering in June 2019. None of the companies I know advertised any particular security technology. Well, we’ll have to figure it out on our own. It will be all the more interesting to understand why the target audience of the EPYC 7000 processors has been so cool toward a technology designed to give them a competitive advantage in a very overheated market. As they say, let’s go!

OK, what does AMD have to protect memory?

Having copied the contents of memory, an attacker can extract not only the keys to a disk volume encrypted with LUKS or BitLocker (although this is harder to do today), but also the data itself, which may sit in a database cache or in another application. It is impossible to isolate your data completely and protect it from every kind of hacking, but you can make it unreadable to the attacker with 128-bit RAM encryption, for which AMD EPYC processors offer several methods. The simplest is TSME (Transparent Secure Memory Encryption), transparent encryption of all host RAM with a single key. With this method, neither the operating system nor the applications know that the data in RAM is encrypted, so you can use any application and any operating system.


TSME is switched on in the motherboard BIOS, but be prepared to find that the option is missing: it conflicts with other encryption methods, and manufacturers do not advertise it. On our test motherboard, the ASRock Rack EPYCD8-2T, it is enabled through a hidden BIOS menu opened by pressing CTRL+ALT+F3, yet neither the manual nor the company’s website says a word about its support.

What types of attacks does TSME protect against?

Against Cold Boot (https://en.wikipedia.org/wiki/Cold_boot_attack), in which the attacker uses a special spray to freeze the memory modules of a running machine to -50 degrees Celsius and then cuts the power, so that the operating system has no chance to clear the memory cells. At such a low temperature, de-energized memory can retain its contents for up to 6 minutes, and at -100 degrees for up to 10 minutes. The attacker only needs to install the DIMM in a machine with a modified BIOS and their own operating system and dump its contents, then calmly extract encryption keys, passwords, user data, and generally everything that was in the operating system from the memory dump. The video below is a good example of such an attack.

This attack has a simple analogue: theft of non-volatile memory modules. This type of RAM is promoted today by companies such as HPE and Dell, so as it grows in popularity, the attack will become more common.


Each time the server boots, the processor changes the memory encryption key, so the stolen data cannot be read even after booting the server from your own flash drive. In addition to forced transparent encryption, AMD EPYC 7000 processors support memory scrambling, which is enabled the same way through a hidden BIOS tab. The two modes work independently and defeat attempts to break into the server with a cold boot attack.

So if you use a server dedicated exclusively to your own applications (for example, a database or storage server), full memory encryption is a serious barrier against attacks that rely on physical access to DIMM/NVDIMM modules. Unfortunately, AMD TSME does not save you from other types of attacks.

AMD SME

Without the “Transparent” prefix, Secure Memory Encryption is driven by the operating system, which chooses which memory pages to encrypt and which to leave alone. SME has two modes of operation: encryption of all memory, similar to TSME, or of selected pages only. In both cases a single shared key is used, and it does not change for as long as the operating system stays booted.

In general, from my point of view, SME looks redundant next to TSME because it requires software support. At the time of writing, SME is supported only by operating systems based on Linux kernel 4.14 and above. However, in some enterprise distributions, such as Oracle Linux 7.6 on kernel 3.10, SME is supported and enabled by default. In other distributions, to enable SME you need to add the

mem_encrypt=on

parameter to the kernel boot line (GRUB configuration file). You can check that SME is working by running this command in the terminal:

dmesg | grep SME

You should see:

[ 0.000000] AMD Secure Memory Encryption (SME) active

Like TSME, this function protects data in memory against attacks that involve physical access to the memory.
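As a minimal sketch of the two steps above, the helper functions below add the parameter to a GRUB defaults file and check kernel log text for the SME message. The function names are illustrative assumptions and not part of any distribution’s tooling. GRUB_CMDLINE_LINUX is the usual variable for kernel parameters, but your distribution may keep them elsewhere.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, adapt paths to your distribution.

# Append mem_encrypt=on to GRUB_CMDLINE_LINUX in the given file,
# unless the parameter is already present somewhere in that file.
add_mem_encrypt() {
    grub_file="$1"
    if grep -q 'mem_encrypt=on' "$grub_file"; then
        return 0
    fi
    tmp="${grub_file}.tmp"
    # Insert the parameter just before the closing quote of the variable.
    sed 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 mem_encrypt=on"/' "$grub_file" > "$tmp" &&
        mv "$tmp" "$grub_file"
}

# Read kernel log text on stdin; succeed if the SME banner is present.
sme_active() {
    grep -qF 'AMD Secure Memory Encryption (SME) active'
}
```

A typical use might be to run add_mem_encrypt on a copy of /etc/default/grub, review the result, install it as root, regenerate the GRUB configuration with your distribution’s own tool, reboot, and then run `dmesg | sme_active && echo "SME is on"`.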

Why is SME better than TSME?

First of all, SME has a better update mechanism. For the fully hardware TSME, all updates arrive only through a BIOS flash, which the motherboard manufacturer may forget about, while for SME the vendor releases updates to the Linux kernel through the Git repository. This matters mostly to software developers, though: a system administrator just updates the operating system in the usual way and can be sure the system has all the encryption-related patches.

And of course, selective RAM encryption achieves better performance, so let’s test the speed. Let’s start with the non-relational Redis database used as an in-memory caching server.

I should note that on modern processors such as the AMD EPYC 7000 with active power saving, single-threaded Redis produces a huge measurement error, and in our case enabling SME always sped the system up. Fortunately, what matters more in this test is that there is no noticeable drop in either GET or SET operations when encryption is enabled.

With pipelined access, we see a maximum slowdown of about 5% in database write requests.
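To put a figure like this on your own measurements, the drop can be computed from two throughput values, for example the requests-per-second numbers that redis-benchmark reports with and without encryption. The helper name and the sample numbers below are hypothetical and are not results from this article.

```shell
#!/bin/sh
# Hypothetical helper: percentage drop in throughput when encryption is on.
# Usage: slowdown_pct BASELINE_RPS ENCRYPTED_RPS
slowdown_pct() {
    awk -v base="$1" -v enc="$2" 'BEGIN {
        if (base <= 0) { print "baseline must be positive"; exit 1 }
        # A positive result means the encrypted run was slower.
        printf "%.1f\n", (base - enc) / base * 100
    }'
}

# Example with made-up numbers (not measurements from the article):
# slowdown_pct 100000 95000
```

A negative result means the encrypted run was faster, which, given the measurement error described above, is entirely possible.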

For the MySQL 5.7.26 test, I switched from Oracle Linux 7.6 to Ubuntu 18.04 and used a read-only pattern on two tables of 10,000 records each.

Let’s just say I looked for a performance difference in RAM-intensive applications and did not find one. Small differences in the figures can safely be put down to measurement error, so if you set up a dedicated database server, whether a NoSQL cache or an SQL database, encryption will not cost you performance. There are studies showing that the slowdown grows with the size of the dataset. I ran a MySQL test with a 2.7 GB database (12 million records) and likewise did not notice any impact of TSME on speed.

In general, the EPYC 7551P configuration (32 cores at 2 GHz) is better suited not to a dedicated single-application server but to the cloud!

SEV: the pearl of all AMD protection

Say you rent a virtual server from a cloud provider, and I am that service’s sysadmin, with a lot of free time and root access to the server where your VM runs. Technically, I can read the contents of your virtual machine’s memory and pull out credit card numbers, passwords, or your customers’ e-mail addresses: anything your company values. And even if I respect my workplace and would never do that, the virtual environment is still exposed to the “VM Escape” attack, in which a client renting a VPS mounts a privilege-escalation attack. As a result, that client, just like the administrator, can read the memory of other virtual machines and obtain the data they want.

AMD has taken care of this type of attack too: the Secure Encrypted Virtualization (SEV) feature encrypts the region of RAM allocated to a virtual machine, independently of the hypervisor. Each VM uses its own key, and neither the hypervisor’s root admin nor hackers using a privilege-escalation attack can read the contents of a memory dump of your virtual machine. Moreover, AMD SEV works independently of TSME/SME: on one physical server you can run both conventional virtual machines and ones with individually encrypted memory, and any application running on the host can use SME encryption.


If you think SEV is limited to encryption, you are mistaken: when processing a virtual machine’s data, the processor marks all of it with tags that identify that specific virtual machine. The technology is very similar to virtual networks (VLANs) in network switches, the only difference being that instead of a network switch we have a 180-watt 32-core SoC, and instead of network ports, virtual machines. For as long as a VM’s data is received or generated inside the processor, it remains available only to that virtual machine, wherever it resides, including all cache levels. Even the hypervisor and the host operating system have no access to the memory pages of the VM the CPU is processing. So even while being computed on in the processor, the data stays shielded not only from other virtual machines but also from the server environment itself.

If an attacker tries to compromise the hypervisor by installing a driver that remaps physical memory to reach the VM’s RAM, AMD-Vi (IOMMU) will not let the attack succeed. The topic of protecting the boot and run area is too broad for this article, and if possible we’ll look at it another time.

Whether you are a customer or a cloud service provider, it is important to remember that with AMD SEV a virtual machine in the cloud is protected not only from outside hackers but also from the host server. In practice this means that attacks on the caches or prefetch mechanisms of the AMD EPYC architecture are pointless, since at best the hacker gets the data in encrypted form. However, as practice has shown, researchers did manage to break this barrier, but more on that later.

Now for the fly in the ointment

All of this is fine, but AMD SEV has a serious problem: the technology is supported only by QEMU under Linux, so software from VMware and Microsoft cannot use any of these delights (although they say work in this direction is under way and soon everything will be fine, just be patient). But that’s not all: for AMD SEV to work, you need not only support in the hypervisor, the QEMU libraries and libvirt, but also a modern version of Linux as the guest operating system. Full support for AMD SEV appeared only in kernel 4.14, but, as is usual in the Linux world, every distribution is different: for example, in Ubuntu 19.04 on kernel 5.1 all AMD SME/SEV functions are disabled by default, while in Oracle Linux 7.6 with kernel 3.10 it is the other way round, because the developer has applied all the necessary patches. AMD itself recommends SUSE Linux Enterprise Server 15 or Fedora 28+ as the hypervisor and Ubuntu 18.04 as the guest OS, but I believe Oracle Linux 7.6 makes a better hypervisor, and if necessary you can use the version 5 kernel from the UEK5 repository.
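Because distributions differ so much, it can help to check the host before trying to launch an encrypted guest. The sketch below rests on two assumptions that are not stated in this article: that the kernel lists the sme and sev CPU flags in /proc/cpuinfo, and that the kvm_amd module reports SEV support in /sys/module/kvm_amd/parameters/sev. The function names are illustrative.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Read cpuinfo-style text on stdin; succeed if the named flag is listed
# on a "flags" line.
cpu_has_flag() {
    awk -v want="$1" '
        /^flags/ { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }'
}

# Succeed if the kvm_amd "sev" parameter reads as enabled (1 or Y).
# The path can be overridden, which also makes the function easy to try out.
kvm_sev_enabled() {
    param_file="${1:-/sys/module/kvm_amd/parameters/sev}"
    [ -r "$param_file" ] || return 1
    case "$(cat "$param_file")" in
        1|Y|y) return 0 ;;
        *)     return 1 ;;
    esac
}
```

On a real host this might be used as `cpu_has_flag sev < /proc/cpuinfo && kvm_sev_enabled && echo "host looks ready for SEV"`. A failed check does not prove the hardware lacks SEV. It may only mean the distribution’s kernel has the feature switched off.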

AMD has a good repository on GitHub with scripts that install the SEV environment on SLES-15, RHEL 8, Fedora 28 (29) and Ubuntu 18.04, along with an example of launching an encrypted guest virtual machine. Note that since neither Virtual Machine Manager nor virsh supports AMD SEV, virtual machines are started and stopped with commands from the hypervisor’s GUI shell, which is why the desktop environment must be installed and running beforehand.
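For orientation, this is roughly what a hand-written QEMU launch of a SEV guest looks like. It is a sketch and not the AMD script: the function name and file paths are hypothetical, the values cbitpos=47 and reduced-phys-bits=1 are assumptions for first-generation EPYC, and the option names follow QEMU’s SEV documentation from the 2.12 to 3.x era, so newer releases may differ. The function only prints the command so that it can be reviewed before anything is started.

```shell
#!/bin/sh
# Sketch: print (do not run) a QEMU command line for a SEV guest.
# Usage: sev_qemu_cmd DISK_IMAGE OVMF_FIRMWARE [MEMORY]
sev_qemu_cmd() {
    disk="$1"        # guest disk image (hypothetical path)
    firmware="$2"    # OVMF UEFI firmware image (assumed: SEV guests boot via UEFI)
    mem="${3:-4G}"   # guest RAM, 4G if not given
    # Backslash-newline inside double quotes joins everything into one line.
    printf '%s\n' "qemu-system-x86_64 -enable-kvm -cpu EPYC -m $mem \
-machine q35,memory-encryption=sev0 \
-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
-drive if=pflash,format=raw,unit=0,file=$firmware,readonly=on \
-drive file=$disk,format=qcow2"
}
```

The sev-guest object carries the encryption settings, and the machine option ties the guest’s memory to it. Networking, display and virtio device options are left out here and would need to be added for a usable guest.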

Of course, AMD SEV can also be enabled for existing virtual machines if you add the secure boot parameters to their XML files. If you are interested, we recommend reading the instructions in the SUSE Linux Enterprise Server blog. When everything is set up and running, run the command

dmesg | grep SEV

in the guest operating system terminal. The output should confirm that SEV is active.

I want to add that starting with Linux 4.19 / QEMU 3.1, SEV guests support almost any device with direct memory access, including GPUs, and encryption is transparent to them. The AMD repository also has tools for migrating a SEV machine between hosts.

Is AMD SEV supported in containers?

Yes, AMD is proud that encrypted RAM for virtualized workloads is implemented in the open-source project Kata Containers (https://katacontainers.io), developed under the OpenStack Foundation. The platform is openly supported on Clear Linux, Fedora and CentOS 7, is used by Internet giants such as JD.com, and works with Docker and the Kubernetes orchestrator, while providing security at the same level as virtual machines.

In fact, the architecture of Kata is such that containers run inside SEV-encrypted virtual machines. This is, of course, neither as convenient nor as secure as individual encryption in the hypervisor, but in multi-tenant architectures that offer CaaS (Container as a Service), you can encrypt, for example, a virtual node dedicated to one client. Again, the same service can be offered both with an increased level of security and in the traditional form, with marketing adjusted to customer demand.

How AMD SEV affects performance

Let’s continue testing with Redis 4.9 on Ubuntu 18.04 with the QEMU hypervisor from the AMDSEV repository. The guest VM ran the same Ubuntu 18.04 with 8 GB of RAM and 64 cores.

!! Don’t compare these results with the previous ones: different versions of Linux and Redis differ in speed by almost a factor of two.

Yes, we can see QEMU devouring about 15% of the system’s performance, but turning on memory encryption has no significant impact on speed. Let’s move on to read tests in MySQL, using 2 tables of 10,000 records from a common database of 12 tables with 1 million records each, 2.7 GB in total.

And for the first time we saw a noticeable drop in speed when using encryption, with MySQL running inside the SEV machine. Note: there is no performance difference when encryption is disabled.

Time to draw a conclusion about performance: I did not see any weakness in the memory controllers or the encryption mechanism. On the transactional loads typical of databases, you would need a lantern in broad daylight to find the difference in speed, while the software, its versions and the quality of compilation affect performance more than anything else. So for the best speed, compile everything from source with your own parameters.

What does a competitor have?

By the way, Intel has a similar technology, SGX, but it works quite differently. First, it is a purely software technology that uses the processing power of the cores, and second, Intel encrypts data not at the virtual machine level but at the application level. This means that you can, for example, encrypt all the memory of the hypervisor and of all its guest operating systems and still not be saved from an attack like VM Escape, or you can encrypt just one database in some container. Let’s compare the capabilities of Intel and AMD:

AMD SME AMD SEV Intel SGX
Protection against physical access Yes Yes *
Protection of whole system memory Yes No
Protection from:
hypervisor compromise Yes *
guest OS compromise Yes *
Requires recompiling of application No No Yes
* - each application must be modified and recompiled to enable protection

I should add that few people will agree to modify and recompile an application themselves. For this, there are whole services on the net that offer containers built from software compiled for Intel SGX. One such repository is Scone, so if you trust the community, take a look: perhaps someone has already compiled and published the application you need.

What does the client need?

Digging into the technology and the descriptions of attack and defense algorithms, I completely forgot what it is all for. Maybe customers don’t need any of this?

I have never heard of anyone freezing RAM modules with a spray and carrying out a “cold boot attack” in a data center. Perhaps that is because such hackers, or the system administrators of such data centers (whichever of them was luckier), can no longer tell the tale. But I have tested with my own hands SAS disks discarded from a bank’s data center that never went through data disposal and still had some files on them.

Praise AMD:

What is most interesting is that AMD SEV is a technology for careless people, like those all around us: people who don’t install every operating system update and who save on professional security and software configuration and maintenance. It is an opportunity for low-cost cloud services to declare their virtual machines isolated and shielded even from the provider’s eyes. And all of this runs on free software and requires no license fee. It is a kind of zero-day protection against threats that have not yet appeared but will sooner or later hack the neighbor’s VPS.

As for encrypting the memory of the whole host, I would rely on transparent TSME as a simple and reliable technology. Does anyone really know what the operating system decides to encrypt and what it doesn’t? You can’t check it easily, and a software upgrade may switch SME off. So if your top-secret server holds a state secret, make it resistant to Cold Boot attacks with a single BIOS setting.

Criticize AMD:

In fairness, it should be said that in May 2018 researchers managed to bypass AMD Secure Encrypted Virtualization and gain access to the memory of a guest VM on a compromised hypervisor. It is a very strange story, and here is why: scripts for Meltdown/Spectre attacks, for example, are openly posted on GitHub, while for “SEVered” (as the AMD exploit is called) there is only one publication from a German laboratory along the lines of “we found a bug, try to figure out how it works”, AMD’s official reply, which sounded like “well, we’ll look into it”, and the Germans’ reply to AMD’s reply: “well, look into it”. That’s all: no discussions, no exploits - N O T H I N G! Yes, the bug was fixed quickly enough, a microcode update sufficed, but it was all done rather quietly, without the usual hype.

Don’t expect to install free Ubuntu, get a secure cloud and beat the pants off the large providers who foolishly spend millions of dollars on it. Nothing in this world comes for free, and the version of QEMU/libvirt for Ubuntu distributed through the AMDSEV repository is a pitiful sight: with 4 GB of RAM the virtual machine starts perfectly, with 16 GB every second time, and with 32 GB not at all. Of the corporate Linux distributions that loudly declare support for AMD SME and SEV, the only free one is Oracle Linux, with outdated (but damn fast) software in its repositories. On “non-production” distributions like Ubuntu, Fedora and even Debian, these features are disabled because of a conflict with GPU drivers. For the world of free software, such heterogeneity is commonplace.

The price and conclusions

Think of the protective functions of the EPYC 7000 processors as savings on security. You get hardware isolation of virtual environments out of the box, and if you build your cloud on Linux, you can only laugh at technologies such as Microsoft’s Shielded VM, which can turn a VM into a “brick” at any awkward move. You get protection from a host compromised by connecting strange devices and from booting an untrusted hypervisor, you get separation between VMs inside the CPU similar to VLANs within a network switch, and all of it has recently gained support for migration between hosts.

You get all this as part of the server, and for you the cost of protection is included in the price of the hardware, along with delivery, installation and extended warranty packages. But when you first configure a server with AMD EPYC 7000 under Linux, you realize that Microsoft’s VM shielding technology would cost you an additional $11,550, because that is what a Windows Server 2016 Datacenter Edition license costs for a 32-core processor.