
Using Nvidia GTX1660 Super for machine learning: test and review

In the fall of 2018, Nvidia released its RTX line of graphics cards built on the Turing architecture, with dedicated tensor and RT cores. The community of enthusiasts who use ordinary gaming graphics cards for machine learning and AI greeted the newcomers without much enthusiasm: on the one hand, half-precision (FP16) calculations give a serious speedup, sometimes 40-50% over FP32 calculations; on the other hand, the high price of the cards pushed people toward cloud services, which are not always convenient, not always secure, and not always easy to set up.
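
To give a sense of what enabling FP16 looks like in practice, here is a minimal sketch, assuming a recent TensorFlow (2.4+) with its mixed-precision API; it is an illustration, not something from the original review, and on cards without tensor cores the gain is smaller or absent:

    # Minimal sketch: enabling mixed-precision (FP16) training in TensorFlow/Keras 2.4+.
    # On GPUs with tensor cores this is where the 40-50% speedup typically comes from.
    import tensorflow as tf

    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        # Keep the output layer in float32 so the softmax stays numerically stable.
        tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")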

Fortunately, GPU performance in games and in machine learning tasks tends to go hand in hand, and in gaming hardware Nvidia faces a very strong competitor in AMD. To fight back, Nvidia released the GTX 1660 series: no tensor cores, but 6 GB of video memory. Until recently the series had two models: the GTX 1660 Ti with 1536 CUDA cores at 1635 MHz and the regular GTX 1660 with 1408 CUDA cores at 1785 MHz. The new GTX 1660 Super differs from the regular GTX 1660 only in its memory type, GDDR6 instead of GDDR5, and in this respect the newcomer sets the record for the GTX 1660 series, giving the user 336 GB/s of memory bandwidth: 75% more than the regular GTX 1660 and 16% more than the Ti model. GDDR6 delivers more bandwidth per pin than GDDR5, so even at a lower clock it achieves a higher data rate.
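
The 336 GB/s figure follows directly from the memory configuration (14 Gbps GDDR6 on a 192-bit bus); here is a quick back-of-the-envelope check, my own arithmetic rather than a vendor specification:

    # Back-of-the-envelope memory bandwidth check for the GTX 1660 Super.
    data_rate_gbps = 14            # effective GDDR6 transfer rate per pin, Gbps
    bus_width_bits = 192           # memory interface width
    bandwidth_gb_s = data_rate_gbps * bus_width_bits / 8
    print(bandwidth_gb_s)          # 336.0 GB/s; the regular 1660's 8 Gbps GDDR5 gives 192 GB/s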

For comparison, in terms of memory bandwidth the newcomer approaches the GTX 1080 and RTX 2060, at 352 GB/s and 332 GB/s respectively, though the RTX 2080 and 2080 Ti, with their 448 and 616 GB/s, remain far out of reach. For further comparison, the Nvidia Tesla T4 offers "only" about 320 GB/s, the Tesla K80 that you can try in Google Colab has 420 GB/s, the Tesla P100 has 720 GB/s, and the Tesla V100 has 900 GB/s.

According to many experts, it is memory bandwidth rather than raw compute that matters most when building neural networks. In facial recognition, for example, the number of high-quality images that must be presented during training is enormous, and the results constantly need to be validated against new data sets to reduce the error rate. Depending on the application, new data may arrive very often and demand constant retraining. When models involve many layers and nodes, high memory and interface bandwidth are needed to keep training and inference running at peak speed.

Simply put, today the GTX 1660 Super is the only CUDA-compatible solution that, at a price under 17 thousand rubles, gives you 336 GB/s of video memory bandwidth.

Our hero: Palit GTX 1660 Super StormX

Most graphics cards based on NVIDIA GTX 1660 chips use a two-fan cooling system, and our test Palit GTX 1660 Super stands out favorably here thanks to its compactness: want to install it in a QNAP TS-677 NAS? Please. In a short Mini-ITX case for 24x7 operation? Please: there are no length restrictions, and only the height of the board should concern you.


The cooling system uses a single heatsink not only for the GPU but also for the six memory chips and the VRM components. Thermal paste serves as the thermal interface for the GPU die, while thermal pads (the familiar "gum") are used for the memory modules and VRMs.

Despite the cooler's modest size, Palit managed to fit three symmetrical heat pipes, and thanks to this the video card does not overheat even in such a compact package.



Among the board's features, I would note the metal frame that protects the GPU die from chipping if the heatsink is tilted. If you remember the Athlon XP days, you will appreciate that on Palit cards you can safely remove the stock cooler, replace the thermal paste and put it back without the risk of chipping a corner off the GPU.

Judging by the empty pads for additional VRM MOSFETs and two more memory chips, the board itself is designed for a more power-hungry configuration. The regular, non-overclocked version of the card has a 3-phase power circuit, yet overclocking goes very well: 1980/4800 MHz.

Testing

The first part is synthetic tests, and we start by evaluating integer and floating-point operations.


Let's continue with Geekbench 5, which tests more practical algorithms such as face detection and applying graphic filters to images.

Next, in OctaneBench 4.0, which uses a new rendering engine, the newcomer shows very good results for a single GPU.

Let's move on to real tests and measure performance in the most popular framework, TensorFlow/Keras.

Let's start with the simple tests included in the Keras example suite. We will compare against the Tesla cards provided in Google Colab. And although Google undeniably splits GPU performance among users, it is important for us to understand how a local GPU compares with what you are given in the cloud.
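
A test of this kind is essentially the mnist_cnn script from the Keras examples; the sketch below is an approximation of such a run, timed per epoch, rather than the exact script used in the review:

    # A minimal MNIST CNN benchmark in the spirit of the Keras examples.
    # Timing one epoch gives a rough throughput figure for comparing a local
    # GPU against whatever Colab hands out.
    import time
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    start = time.time()
    model.fit(x_train, y_train, batch_size=128, epochs=1)
    print(f"Seconds per epoch: {time.time() - start:.1f}")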

This is exactly why I decided not to run the standard ResNet/CIFAR10 tests. Here you go: on simple workloads, a 280-dollar video card shows performance comparable to professional GPUs costing more than $5K.

Let's take a real task for real training, for example a recurrent-network text generation project known as textgenrnn, and run training on a small 2.59 MB text file with the parameter batch_size=256.
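
Such a run boils down to a few lines with the textgenrnn library; the sketch below approximates the setup described above, with "corpus.txt" standing in for the 2.59 MB file used in the review:

    # Approximate reproduction of the textgenrnn training run described above.
    from textgenrnn import textgenrnn

    textgen = textgenrnn()
    textgen.train_from_file(
        "corpus.txt",        # placeholder for the 2.59 MB text file
        new_model=True,      # train from scratch rather than fine-tune the bundled weights
        num_epochs=10,
        batch_size=256,      # the parameter noted in the review
    )
    textgen.generate(5)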


Well, in practice the real speed of this video card is comparable to what you get in a free Google Colab account. You can argue as much as you like about application tuning and about the extra video memory that lets you raise batch_size, but on small models that gives no advantage other than faster overfitting.

To finish our testing, here is the card's result in Ethereum mining with default settings.

Power consumption and heat output

A gaming graphics card does not need any additional airflow, and of course its cooling system is not designed for 24x7 operation at maximum load. Fortunately, you still have to try hard to load the GPU to its maximum, and in an ordinary ATX case the board showed the following results (a sketch for logging the same readings follows the list):

  • Idle mode: 17W, 38 degrees Celsius, 1000 RPM
  • TensorFlow Keras Textgenrnn: 95W, 60 degrees Celsius, 1753 RPM
  • FurMark: 124.8 W, 69 degrees Celsius, 2271 RPM
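
These figures come from the card's own sensors; a simple, generic way to log the same values on any Nvidia GPU is to poll nvidia-smi, as in this sketch (not the exact tooling used for the review):

    # Polls power draw, GPU temperature and fan speed once per second via nvidia-smi.
    import subprocess
    import time

    QUERY = ["nvidia-smi",
             "--query-gpu=power.draw,temperature.gpu,fan.speed",
             "--format=csv,noheader"]

    while True:
        print(subprocess.check_output(QUERY, text=True).strip())
        time.sleep(1)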

The fan speed adjustment is smooth: the card is pointedly reluctant to spin its fan up, but just as reluctant to spin it back down. Overall, in terms of acoustic comfort, it is fine.

Conclusions

Here is a fine video card for a "micro-cloud" that you can build in a Mini-ITX form factor, put on a cabinet or in the pantry, and periodically load with your projects' computations through Jupyter. At the time of our review there were no official Linux drivers for the GTX 1660 Super, but their appearance is only a matter of time. A good cooling system with three heat pipes and a compact form factor make the card nearly ideal for a DIY build, where the only thing you need to check is that the board fits the case in height. Today that means any modern case except 3U telecom chassis and some horizontal HTPC models meant to sit under a TV. But what are we even talking about? There are no low-profile gaming graphics cards, and the Palit GTX 1660 Super is no exception here.

Of course, the fact that the Palit GTX 1660 Super does not stop its fan at idle is a drawback, but the noise level of the card at that point is below the background and cannot even be measured; as long as the cooler is new, this board will not bother you with buzzing. And three heat pipes mean the card is ready to work in any conditions, even in compact, poorly ventilated cases.