
Introduction

This page details benchmark results comparing MXNet 1.3.0 with and without MKL-DNN (integration proposal). In the results below, MKL-DNN improves inference throughput by about 6.8x to 38.9x and reduces latency by about 3.7x to 37.9x, while top-1 and top-5 accuracy are identical to eight decimal places.

Inference Performance

These performance results were gathered on an AWS EC2 c5.18xlarge instance using a single socket (one processor).

Using both sockets can provide roughly 2x the throughput, while latency stays about the same.
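The two metrics in the tables measure different things: latency is the time for one request at batch size 1, and throughput is how many images are processed per second at a large batch size. The framework-agnostic sketch below shows one common way to measure both; it is not the benchmark script used for this page, and `dummy_infer` is a hypothetical stand-in for a model's forward pass. With MXNet, operators run asynchronously, so a real measurement must block on the outputs (for example with `mx.nd.waitall()`) before stopping the clock.

```python
import time
from typing import Callable, List


def measure_latency_ms(infer: Callable[[List], object], sample,
                       warmup: int = 5, iters: int = 50) -> float:
    """Mean wall-clock time per call, in milliseconds, at batch size 1."""
    batch = [sample]
    for _ in range(warmup):  # warm-up calls are not timed
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    return (time.perf_counter() - start) / iters * 1000.0


def measure_throughput_fps(infer: Callable[[List], object], sample,
                           batch_size: int = 128, warmup: int = 2,
                           iters: int = 10) -> float:
    """Samples processed per second at a fixed batch size."""
    batch = [sample] * batch_size
    for _ in range(warmup):
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    return batch_size * iters / (time.perf_counter() - start)


def dummy_infer(batch: List) -> List[int]:
    """Hypothetical stand-in for a model's forward pass: fixed CPU work per item."""
    return [sum(i * i for i in range(2000)) for _ in batch]
```

For example, `measure_latency_ms(dummy_infer, 0)` and `measure_throughput_fps(dummy_infer, 0, batch_size=128)`. Because extra sockets add parallel capacity but do not make a single request finish sooner, adding a socket mainly raises the second number, not the first.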

Performance on Intel CPU with Intel MKL-DNN backend in release 1.3

The c5.18xlarge instance has two Intel Xeon Platinum processors (2 sockets) with 72 vCPUs in total. The single-socket runs used the following settings:

$ export KMP_AFFINITY=granularity=fine,compact,1,0

$ export OMP_NUM_THREADS=18

$ numactl --physcpubind=0-17 --membind=0 python …
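If the runs are driven from Python rather than a shell, the same environment and pinning can be assembled programmatically. The sketch below only builds the environment and argument list; `build_pinned_command` and the script name `my_benchmark.py` are hypothetical placeholders (the actual script is elided above). It assumes, as the commands above imply, that cores 0-17 are the physical cores of socket 0; check the layout with `lscpu` or `numactl --hardware` on your machine.

```python
import os
from typing import Dict, List, Tuple


def build_pinned_command(script: str, cores: int = 18, first_core: int = 0,
                         node: int = 0) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: return (env, argv) that run `script` with one
    OpenMP thread per core, bound to cores first_core..first_core+cores-1
    and to memory on NUMA node `node`."""
    env = dict(os.environ)
    # Same thread-affinity settings as the shell commands above.
    env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    env["OMP_NUM_THREADS"] = str(cores)
    last_core = first_core + cores - 1
    argv = [
        "numactl",
        f"--physcpubind={first_core}-{last_core}",
        f"--membind={node}",
        "python",
        script,
    ]
    return env, argv
```

Usage: `env, argv = build_pinned_command("my_benchmark.py")`, then `subprocess.run(argv, env=env, check=True)`. Keeping `OMP_NUM_THREADS` equal to the number of bound cores avoids oversubscribing the pinned CPUs.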


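Latency: batch size 1, in ms, lower is better. Throughput: batch size 128 unless noted, in fps, higher is better.

| Category | Model | Latency w/o MKL-DNN (ms) | Latency w/ MKL-DNN (ms) | Latency speedup | Throughput w/o MKL-DNN (fps) | Throughput w/ MKL-DNN (fps) | Throughput speedup |
|---|---|---|---|---|---|---|---|
| CNN/classification | ResNet-50 v1 | 97.19 | 13.04 | 7.45 | 10.29 | 163.52 | 15.90 |
| | ResNet-50 v2 | 98.69 | 13.02 | 7.58 | 9.94 | 154.17 | 15.51 |
| | Inception v3 | 175.17 | 16.77 | 10.44 | 5.74 | 135.33 | 23.57 |
| | Inception v4 | 330.93 | 31.40 | 10.54 | 3.04 | 69.60 | 22.87 |
| | DenseNet | 111.66 | 18.90 | 5.91 | 8.52 | 149.88 | 17.60 |
| | MobileNet | 38.56 | 4.42 | 8.73 | 24.87 | 512.25 | 20.60 |
| | VGG16 | 406.50 | 20.07 | 20.25 | 2.91 | 70.84 | 24.31 |
| | AlexNet | 64.60 | 3.80 | 17.00 | 26.58 | 965.20 | 36.32 |
| | inception-resnet v2 | 181.10 | 49.40 | 3.67 | 5.48 | 82.97 | 15.14 |
| CNN/object detection | Faster R-CNN | 1175.74 | 118.62 | 9.91 | 0.85 | 8.57 | 10.08 |
| | SSD-VGG16 | 721.03 | 47.62 | 15.14 | 1.43 (batch size 224) | 28.90 (batch size 224) | 19.13 |
| | SSD-MobileNet | 239.40 | 28.33 | 8.45 | 4.07 (batch size 256) | 69.97 (batch size 256) | 14.18 |
| RNN | GNMT | 683.43 | 94.00 | 7.27 | 1.46 (batch size 64) | 10.63 (batch size 64) | 6.83 |
| GAN | DCGAN | 8.94 | 0.24 | 37.85 | 109.13 | 4249.36 | 38.94 |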
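The speedup columns are ratios whose direction depends on the metric: for latency (lower is better) speedup is old divided by new, and for throughput (higher is better) it is new divided by old. The sketch below recomputes both for a few rows copied from the Intel table; for these rows the recomputed values match the speedup column to within about 0.04, which is consistent with rounding of the listed measurements.

```python
# (model, latency w/o ms, latency w/ ms, throughput w/o fps, throughput w/ fps)
# Values copied from the Intel c5.18xlarge table.
ROWS = [
    ("ResNet-50 v1", 97.19, 13.04, 10.29, 163.52),
    ("Inception v3", 175.17, 16.77, 5.74, 135.33),
    ("VGG16", 406.50, 20.07, 2.91, 70.84),
    ("AlexNet", 64.60, 3.80, 26.58, 965.20),
]


def latency_speedup(without_ms: float, with_ms: float) -> float:
    """Lower latency is better, so speedup = old / new."""
    return without_ms / with_ms


def throughput_speedup(without_fps: float, with_fps: float) -> float:
    """Higher throughput is better, so speedup = new / old."""
    return with_fps / without_fps


for model, lat_wo, lat_w, thr_wo, thr_w in ROWS:
    print(f"{model:14s} latency x{latency_speedup(lat_wo, lat_w):.2f}  "
          f"throughput x{throughput_speedup(thr_wo, thr_w):.2f}")
```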

Performance on AMD CPU with Intel MKL-DNN backend in release 1.3

The m5a.24xlarge instance offers 96 vCPUs on AMD EPYC processors (AVX2).


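Throughput: batch size 32, in fps, higher is better.

| Category | Model | Throughput w/o MKL-DNN (fps) | Throughput w/ MKL-DNN (fps) | Speedup |
|---|---|---|---|---|
| CNN/classification | ResNet-50 v1 | 2.44 | 38.57 | 15.8x |
| | MobileNet | 5.03 | 194.7 | 38.7x |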

Inference Accuracy

Accuracy was also measured on a c5.18xlarge instance (two Intel Xeon Platinum processors, 72 vCPUs).

The models are taken from the Gluon model zoo with their pre-trained parameters. Top-1 and top-5 accuracy were measured with and without the MKL-DNN backend.

As the table below shows, MXNet 1.3 produces exactly the same top-1 and top-5 accuracy with and without MKL-DNN: every delta is zero to eight decimal places (below 1e-8).

Note: The dataset is the ImageNet1k validation set (valdata/), generated by imagenet1k-val.sh.
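Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes, so top-1 is ordinary accuracy and top-5 is more lenient. The pure-Python sketch below illustrates the definition; it is not the code of the benchmark script. MXNet also ships a metric for this (`mx.metric.TopKAccuracy`), though we have not confirmed which implementation produced the numbers in the table.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; keep the first k.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)
```

With 1000 ImageNet classes, `scores` would be the per-image output vector of the network and `k` would be 1 or 5.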

Inference Accuracy Comparison
| Alias | Network | Top-1, CPU w/o MKL-DNN | Top-5, CPU w/o MKL-DNN | Top-1, CPU w/ MKL-DNN backend | Top-5, CPU w/ MKL-DNN backend | Delta top-1 | Delta top-5 |
|---|---|---|---|---|---|---|---|
| alexnet | AlexNet | 0.56312500 | 0.78992188 | 0.56312500 | 0.78992188 | 0.00000000 | 0.00000000 |
| densenet121 | DenseNet-121 | 0.74203125 | 0.91929688 | 0.74203125 | 0.91929688 | 0.00000000 | 0.00000000 |
| densenet161 | DenseNet-161 | 0.77195313 | 0.93390625 | 0.77195313 | 0.93390625 | 0.00000000 | 0.00000000 |
| densenet169 | DenseNet-169 | 0.75710938 | 0.92828125 | 0.75710938 | 0.92828125 | 0.00000000 | 0.00000000 |
| densenet201 | DenseNet-201 | 0.76906250 | 0.93093750 | 0.76906250 | 0.93093750 | 0.00000000 | 0.00000000 |
| inceptionv3 | Inception V3 299x299 | 0.77609375 | 0.93664063 | 0.77609375 | 0.93664063 | 0.00000000 | 0.00000000 |
| mobilenet0.25 | MobileNet 0.25 | 0.51039063 | 0.75687500 | 0.51039063 | 0.75687500 | 0.00000000 | 0.00000000 |
| mobilenet0.5 | MobileNet 0.5 | 0.61851563 | 0.83789063 | 0.61851563 | 0.83789063 | 0.00000000 | 0.00000000 |
| mobilenet0.75 | MobileNet 0.75 | 0.66546875 | 0.87070313 | 0.66546875 | 0.87070313 | 0.00000000 | 0.00000000 |
| mobilenet1.0 | MobileNet 1.0 | 0.70093750 | 0.89109375 | 0.70093750 | 0.89109375 | 0.00000000 | 0.00000000 |
| mobilenetv2_1.0 | MobileNetV2 1.0 | 0.69976563 | 0.89281250 | 0.69976563 | 0.89281250 | 0.00000000 | 0.00000000 |
| mobilenetv2_0.75 | MobileNetV2 0.75 | 0.68210938 | 0.88007813 | 0.68210938 | 0.88007813 | 0.00000000 | 0.00000000 |
| mobilenetv2_0.5 | MobileNetV2 0.5 | 0.64453125 | 0.84929688 | 0.64453125 | 0.84929688 | 0.00000000 | 0.00000000 |
| mobilenetv2_0.25 | MobileNetV2 0.25 | 0.50890625 | 0.74546875 | 0.50890625 | 0.74546875 | 0.00000000 | 0.00000000 |
| resnet18_v1 | ResNet-18 V1 | 0.70812500 | 0.89453125 | 0.70812500 | 0.89453125 | 0.00000000 | 0.00000000 |
| resnet34_v1 | ResNet-34 V1 | 0.73960938 | 0.91609375 | 0.73960938 | 0.91609375 | 0.00000000 | 0.00000000 |
| resnet50_v1 | ResNet-50 V1 | 0.76062500 | 0.93046875 | 0.76062500 | 0.93046875 | 0.00000000 | 0.00000000 |
| resnet101_v1 | ResNet-101 V1 | 0.77937500 | 0.93617188 | 0.77937500 | 0.93617188 | 0.00000000 | 0.00000000 |
| resnet152_v1 | ResNet-152 V1 | 0.78320313 | 0.93867188 | 0.78320313 | 0.93867188 | 0.00000000 | 0.00000000 |
| resnet18_v2 | ResNet-18 V2 | 0.71046875 | 0.89671875 | 0.71046875 | 0.89671875 | 0.00000000 | 0.00000000 |
| resnet34_v2 | ResNet-34 V2 | 0.74085938 | 0.91578125 | 0.74085938 | 0.91578125 | 0.00000000 | 0.00000000 |
| resnet50_v2 | ResNet-50 V2 | 0.76750000 | 0.93187500 | 0.76750000 | 0.93187500 | 0.00000000 | 0.00000000 |
| resnet101_v2 | ResNet-101 V2 | 0.78125000 | 0.94015625 | 0.78125000 | 0.94015625 | 0.00000000 | 0.00000000 |
| resnet152_v2 | ResNet-152 V2 | 0.78554688 | 0.94140625 | 0.78554688 | 0.94140625 | 0.00000000 | 0.00000000 |
| squeezenet1.0 | SqueezeNet 1.0 | 0.57273438 | 0.79554688 | 0.57273438 | 0.79554688 | 0.00000000 | 0.00000000 |
| squeezenet1.1 | SqueezeNet 1.1 | 0.57023438 | 0.79601563 | 0.57023438 | 0.79601563 | 0.00000000 | 0.00000000 |
| vgg11 | VGG-11 | 0.67062500 | 0.87531250 | 0.67062500 | 0.87531250 | 0.00000000 | 0.00000000 |
| vgg13 | VGG-13 | 0.68132813 | 0.87984375 | 0.68132813 | 0.87984375 | 0.00000000 | 0.00000000 |
| vgg16 | VGG-16 | 0.72062500 | 0.90585938 | 0.72062500 | 0.90585938 | 0.00000000 | 0.00000000 |
| vgg19 | VGG-19 | 0.73468750 | 0.91000000 | 0.73468750 | 0.91000000 | 0.00000000 | 0.00000000 |
| vgg11_bn | VGG-11 with batch normalization | 0.68953125 | 0.88882813 | 0.68953125 | 0.88882813 | 0.00000000 | 0.00000000 |
| vgg13_bn | VGG-13 with batch normalization | 0.69835938 | 0.88953125 | 0.69835938 | 0.88953125 | 0.00000000 | 0.00000000 |
| vgg16_bn | VGG-16 with batch normalization | 0.72226563 | 0.90390625 | 0.72226563 | 0.90390625 | 0.00000000 | 0.00000000 |
| vgg19_bn | VGG-19 with batch normalization | 0.72992188 | 0.90992188 | 0.72992188 | 0.90992188 | 0.00000000 | 0.00000000 |
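The Delta columns are the per-model absolute differences between the two backends. A check of the "equivalent up to 1e-8" claim can be sketched as below, using three rows copied from the accuracy table; the dictionaries and function names are illustrative, not part of the benchmark tooling.

```python
# (top-1, top-5) per model alias, copied from three rows of the accuracy table.
WITHOUT_MKLDNN = {
    "resnet50_v1": (0.76062500, 0.93046875),
    "mobilenet1.0": (0.70093750, 0.89109375),
    "vgg16": (0.72062500, 0.90585938),
}
WITH_MKLDNN = {
    "resnet50_v1": (0.76062500, 0.93046875),
    "mobilenet1.0": (0.70093750, 0.89109375),
    "vgg16": (0.72062500, 0.90585938),
}


def max_delta(a, b) -> float:
    """Largest absolute top-1/top-5 difference over the models both dicts share."""
    return max(
        abs(x - y)
        for name in a.keys() & b.keys()
        for x, y in zip(a[name], b[name])
    )


def backends_agree(a, b, tol: float = 1e-8) -> bool:
    """True if every shared top-1/top-5 value differs by at most `tol`."""
    return max_delta(a, b) <= tol
```

Comparing against a tolerance rather than testing for exact equality keeps the check meaningful if a backend ever changes the order of floating-point reductions.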


Commands for Reproducing the Results

The scripts and models are available at the link below.

https://drive.google.com/open?id=17JenLnZKsmPoZIIyktINFfMjZtDY2Ehc 

(Note: select the parent folder and click download in the drop-down menu)

See launch_benchmark_aws.sh for the commands used to reproduce the results.


3 Comments

  1. Patric Zhao could you add information about the tooling you were using?

    1. Anton Chernov sure, I have added a section for reproducing the results.