In the fall of 2018, Nvidia released its RTX line of graphics cards built on chips codenamed Turing, with built-in tensor and RT cores. The community of enthusiasts using conventional gaming graphics cards for machine learning greeted the new products without much enthusiasm. On the one hand, half-precision (FP16) computation gives a serious speedup, sometimes 40-50% over FP32. On the other hand, the high price of the cards pushed users toward cloud services, which are not always convenient, not always secure, and not always straightforward to configure.
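Part of the FP16 speedup comes simply from moving half as many bytes through video memory per parameter. A minimal sketch of that accounting (the 10M-parameter layer is a made-up illustrative size, not a measured model):

```python
import numpy as np

# Hypothetical layer with 10 million weights, stored in FP32 vs FP16.
n_params = 10_000_000

bytes_fp32 = n_params * np.dtype(np.float32).itemsize  # 4 bytes per weight
bytes_fp16 = n_params * np.dtype(np.float16).itemsize  # 2 bytes per weight

print(bytes_fp32 // 2**20, "MiB in FP32")  # 38 MiB
print(bytes_fp16 // 2**20, "MiB in FP16")  # 19 MiB
print(bytes_fp32 // bytes_fp16)            # 2: half the memory traffic per tensor
```

Halved traffic is why FP16 gains are often closer to the 40-50% mentioned above than to a clean 2x: compute, not just bandwidth, still has to keep up.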
Fortunately, GPU performance in games and in machine learning tasks often goes hand in hand, and in gaming hardware the "greens" have a very strong competitor in AMD. To compete, Nvidia released the GTX 1660 series, without tensor cores but with 6 GB of video memory, which until recently had two models: the GTX 1660 Ti with 1536 stream processors at 1635 MHz, and the regular GTX 1660 with 1408 CUDA cores at 1785 MHz. The new GTX 1660 Super differs from the regular GTX 1660 only in memory type (GDDR6 instead of GDDR5), and by this parameter the newcomer sets a record for the GTX 1660 series, giving the user a bandwidth of 336 GB/s: 75% higher than the regular GTX 1660 and 16% more than the "Ti" model. GDDR6 has a higher data rate per pin than GDDR5, so even at a lower clock frequency it delivers higher throughput.
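Those percentages are easy to check: peak bandwidth is just the per-pin data rate times the bus width. A quick sketch, assuming the commonly quoted data rates for these cards (8 Gbps GDDR5 for the GTX 1660, 12 and 14 Gbps GDDR6 for the Ti and Super, all on a 192-bit bus):

```python
def bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width in bytes."""
    return data_rate_gbps * bus_width_bits / 8

gtx_1660       = bandwidth_gbs( 8.0, 192)  # GDDR5 -> 192 GB/s
gtx_1660_ti    = bandwidth_gbs(12.0, 192)  # GDDR6 -> 288 GB/s
gtx_1660_super = bandwidth_gbs(14.0, 192)  # GDDR6 -> 336 GB/s

print(gtx_1660_super / gtx_1660 - 1)     # 0.75: the 75% gain over the GTX 1660
print(gtx_1660_super / gtx_1660_ti - 1)  # ~0.167: roughly 16% over the Ti
```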
For comparison, in memory bandwidth the newcomer approaches the GTX 1080 and the RTX 2060, at 352 GB/s and 336 GB/s respectively, but the RTX 2080 and 2080 Ti, with their record 448 and 616 GB/s, are still far ahead. Among datacenter cards, the Nvidia Tesla T4 offers "only" about 320 GB/s, the Tesla K80 (which you can try out on Google Colab) about 480 GB/s, the Tesla P100 720 GB/s, and the Tesla V100 900 GB/s.
According to many experts, it is video memory bandwidth, not raw compute power, that is of key importance in training neural networks. For example, in facial recognition, the number of high-quality images that must be presented for training is extremely large. The results also have to be constantly checked against new data sets to reduce the error rate. Depending on the application, new data can arrive very often and require continual retraining. When models involve many layers and nodes, high memory and interface bandwidth are needed to sustain neural network training and inference at peak speed.
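A back-of-the-envelope roofline check illustrates why bandwidth dominates for many neural network operations. The sketch below uses a hypothetical FP32 matrix-vector product (typical of inference) and the vendor ballpark figures of ~5 TFLOPS FP32 and 336 GB/s for the GTX 1660 Super; the numbers are assumptions for illustration, not a measurement:

```python
# Hypothetical layer: y = W @ x with W of shape (n, n) in FP32.
n = 8192
bytes_moved = (n * n + 2 * n) * 4  # read W and x, write y (4 bytes each)
flops = 2 * n * n                  # one multiply-add per weight

# GTX 1660 Super ballpark: ~5.0 TFLOPS FP32, 336 GB/s memory bandwidth.
compute_time = flops / 5.0e12
memory_time = bytes_moved / 336e9

print(memory_time > compute_time)  # True: this op is bandwidth-bound
```

With roughly 0.5 FLOPs per byte, this operation spends far longer waiting on memory than on arithmetic, which is exactly the regime where the Super's GDDR6 pays off.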
Simply put, today the GTX 1660 Super is the only CUDA-compatible solution that, at a price below 17 thousand rubles, gives you 336 GB/s of video memory bandwidth.