Apache MXNet Release Adds Support for New NVIDIA Volta GPUs and Sparse Tensor
We are excited about the availability of Apache MXNet version 0.12. This release adds two important new features: support for NVIDIA Volta GPUs and support for sparse tensors.
Support for NVIDIA Volta GPU Architecture
The MXNet v0.12 release adds support for NVIDIA Volta V100 GPUs, enabling customers to train convolutional neural networks up to 3.5 times faster than on Pascal GPUs. The trillions of floating-point (FP) multiplications and additions needed to train a neural network have typically been done in single precision (FP32) to achieve high accuracy. However, recent research has shown that the same accuracy can be achieved using half-precision (FP16) data types.
The Volta GPU architecture introduces Tensor Cores. Each Tensor Core can execute 64 fused multiply-add (FMA) operations per clock, which roughly quadruples the CUDA core FLOPS per clock per core. Each Tensor Core performs D = A x B + C, where A and B are half-precision matrices, while C and D can be either half-precision or single-precision matrices; this is what makes mixed-precision training possible. Mixed-precision training allows users to achieve optimal training performance without sacrificing accuracy by using FP16 for most of the layers of a network and higher-precision data types only when necessary.
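To make the D = A x B + C description more concrete, here is a minimal pure-Python sketch of why the precision of the accumulator matters. It is only an illustration of the arithmetic, not of how Tensor Cores are programmed: the helper names (to_fp16, to_fp32, matmul_add) are hypothetical, and rounding is emulated with the standard-library struct module rather than real GPU hardware.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def matmul_add(a, b, c, accumulate):
    """Compute D = A x B + C on nested lists.

    A and B are rounded to FP16 before multiplying. `accumulate` is the
    rounding function applied after every addition (to_fp16 or to_fp32),
    which emulates a half- or single-precision accumulator.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = accumulate(c[i][j])
            for p in range(inner):
                # The product of two FP16 values is computed without
                # additional rounding here (Python floats are 64-bit).
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                acc = accumulate(acc + prod)
            d[i][j] = acc
    return d

# Hypothetical toy data: four small products added to an accumulator of 1.0.
A = [[0.01, 0.01, 0.01, 0.01]]
B = [[0.01], [0.01], [0.01], [0.01]]
C = [[1.0]]

d_fp16 = matmul_add(A, B, C, to_fp16)  # half-precision accumulator
d_fp32 = matmul_add(A, B, C, to_fp32)  # single-precision accumulator

print("FP16 accumulate:", d_fp16[0][0])
print("FP32 accumulate:", d_fp32[0][0])
```

In this toy case each product is about 0.0001, which is smaller than half the gap between adjacent FP16 values near 1.0, so the half-precision accumulator never moves away from 1.0, while the single-precision accumulator keeps the contribution. This is one reason to keep C and D in higher precision while A and B stay in FP16.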
You can take advantage of Volta Tensor Cores to enable FP16 training in MXNet by passing a single option, --dtype float16, to the train_imagenet.py script.
Sparse Tensor Support
MXNet v0.12 adds support for sparse tensors, allowing developers to perform sparse matrix operations in a storage-efficient and compute-efficient manner and to train deep learning models faster. MXNet v0.12 supports two major sparse data formats: Compressed Sparse Row (CSR) and Row Sparse (RSP). The CSR format is optimized to represent matrices with a large number of columns where each row has only a few non-zero elements. The RSP format is optimized to represent matrices with a huge number of rows where most of the row slices are entirely zero. For example, the CSR format can be used to encode the feature vectors of input data for a recommendation engine, whereas the RSP format can be used to perform sparse gradient updates during training. This release enables sparse support on CPU for the most commonly used operators, such as matrix dot product and element-wise operators. Sparse support for more operators will be added in future releases.
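The two storage layouts can be sketched in plain Python. This is an illustration of the general CSR and row-sparse ideas, not MXNet's implementation or API: the function names are hypothetical, and the field names (data, indices, indptr) follow the common CSR convention.

```python
def dense_to_csr(matrix):
    """Store only non-zero values, their column indices, and row offsets."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        # indptr[r]:indptr[r + 1] is the slice of `data` belonging to row r.
        indptr.append(len(data))
    return data, indices, indptr

def dense_to_row_sparse(matrix):
    """Store only the rows that contain at least one non-zero value."""
    data, indices = [], []
    for row_id, row in enumerate(matrix):
        if any(value != 0 for value in row):
            data.append(list(row))
            indices.append(row_id)
    return data, indices

def csr_dot_vector(csr, vector):
    """Multiply a CSR matrix by a dense vector, touching only non-zeros."""
    data, indices, indptr = csr
    result = []
    for r in range(len(indptr) - 1):
        start, end = indptr[r], indptr[r + 1]
        result.append(sum(data[k] * vector[indices[k]] for k in range(start, end)))
    return result

# Hypothetical feature matrix: many columns, few non-zeros per row (CSR case).
features = [
    [0, 0, 3, 0, 0],
    [1, 0, 0, 0, 2],
    [0, 0, 0, 0, 0],
]
csr = dense_to_csr(features)
print(csr)  # ([3, 1, 2], [2, 0, 4], [0, 1, 3, 3])

# Hypothetical gradient: many rows, most of them entirely zero (RSP case).
gradient = [
    [0.0, 0.0],
    [0.5, -1.0],
    [0.0, 0.0],
    [0.0, 0.0],
]
rsp = dense_to_row_sparse(gradient)
print(rsp)  # ([[0.5, -1.0]], [1])

product = csr_dot_vector(csr, [1, 1, 1, 1, 1])
print(product)  # [3, 3, 0]
```

The CSR layout pays off when each row has only a few non-zero entries, because the dot product loops over three stored values instead of fifteen. The row-sparse layout pays off when whole rows are zero, because an update only needs to touch the rows listed in its indices.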
Follow these tutorials to learn how to use the new sparse operators in MXNet.