...
The above points ensure that we find a compromise between the flexibility of MXNet and fast inference in TensorRT. We do this with no additional burden to the user. Users do not need to learn how the TensorRT APIs work, and do not need to write their own client application or data pipeline.
How do I
...
use TensorRT integration?
You can optionally build the container yourself as follows, but we recommend you skip this step to use the DockerHub hosted version of the MXNet TensorRT containers:
Code Block |
---|
docker build -f ci/docker/Dockerfile.build.ubuntu_gpu_tensorrt -t mxnet_with_tensorrt . |
...
Code Block |
---|
nvidia-docker run -ti --rm mxnet/tensorrt bash
|
After starting the container, you will find yourself in the /opt/mxnet directory by default.
Running a "hello, world" model / unit test (LeNet-5 on MNIST)
You can then run the LeNet-5 unit test, which will train LeNet-5 on MNIST using the symbolic API. The test will then run inference in MXNet both with and without MXNet-TensorRT runtime integration. Finally, the test will display a comparison of both runtimes' accuracy scores. The test can be run as follows:
Code Block |
---|
python ${MXNET_HOME}/tests/python/tensorrt/test_tensorrt_lenet5.py |
...
Running inference in MXNet
[03:31:18] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
Running inference in MXNet-TensorRT
MXNet accuracy: 98.680000
MXNet-TensorRT accuracy: 98.680000
Running more complex models
The unit test directory also provides a way to run models from the Gluon model zoo after slight modifications. The models that are tested are CNN classification models from the Gluon zoo. They are mostly based on ResNet, but include ResNeXt as well:
- cifar_resnet20_v1
- cifar_resnet56_v1
- cifar_resnet110_v1
- cifar_resnet20_v2
- cifar_resnet56_v2
- cifar_resnet110_v2
- cifar_wideresnet16_10
- cifar_wideresnet28_10
- cifar_wideresnet40_8
- cifar_resnext29_16x64d
Please note that even though these examples are based on CIFAR-10 (chosen because the dataset can be accessed without formal registration and preprocessing), everything should work fine with models trained on ImageNet, using MXNet's ImageNet iterators, which are based on the RecordIO representation of the ImageNet dataset.
The script can be run simply as:
Code Block |
---|
python ${MXNET_HOME}/tests/python/tensorrt/test_tensorrt_resnet_resnext.py |
...
===========================================
Model: cifar_resnet56_v1
===========================================
*** Running inference using pure MXNet ***
MXNet: time elapsed: 2.463s, accuracy: 94.19%
*** Running inference using MXNet + TensorRT ***
TensorRT: time elapsed: 1.652s, accuracy: 94.19%
TensorRT speed-up (not counting compilation): 1.49x
Absolute accuracy difference: 0.000000
===========================================
Model: cifar_resnet110_v1
===========================================
*** Running inference using pure MXNet ***
MXNet: time elapsed: 4.000s, accuracy: 95.20%
*** Running inference using MXNet + TensorRT ***
TensorRT: time elapsed: 2.085s, accuracy: 95.20%
TensorRT speed-up (not counting compilation): 1.92x
Absolute accuracy difference: 0.000000
As you can see, the speed-up varies by model. ResNet-110 has more layers that can be fused than ResNet-56, hence the speed-up is greater.
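The speed-up figures in the log above are consistent with dividing the pure MXNet time by the MXNet + TensorRT time. The following is a minimal sketch of that arithmetic; the helper name `speedup` is hypothetical and this is not necessarily how the test script computes the value:

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds


# Timings (in seconds) copied from the log output above:
# (pure MXNet, MXNet + TensorRT).
timings = {
    "cifar_resnet56_v1": (2.463, 1.652),
    "cifar_resnet110_v1": (4.000, 2.085),
}

for model, (mxnet_s, trt_s) in timings.items():
    # Prints 1.49x for ResNet-56 and 1.92x for ResNet-110.
    print("%s: %.2fx" % (model, speedup(mxnet_s, trt_s)))
```

As the log line notes, these figures do not count the TensorRT compilation time, so they describe steady-state inference only.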
Running TensorRT with your own models with the symbolic API
When building your own models, feel free to use the above ResNet models as an example. Here, we highlight a small number of issues that need to be taken into account.
1. When loading a pre-trained model, the inference will be handled using the Symbol API, rather than the Module API.
2. In order to provide the weights from MXNet (NNVM) to the TensorRT graph converter before the symbol is fully bound (before the memory is allocated, etc.), the arg_params and aux_params need to be provided to the symbol's simple_bind method. The weights and other values (e.g. moments learned from data by batch normalization, provided via aux_params) will be provided via the shared_buffer argument to simple_bind as follows:
Code Block |
---|
executor = sym.simple_bind(ctx=ctx, data=data_shape, softmax_label=sm_shape,
                           grad_req='null', shared_buffer=all_params, force_rebind=True) |
...
Code Block |
---|
def merge_dicts(*dict_args):
    # Combine any number of dicts into one; later dicts win on key collisions.
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result
sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, epoch)
all_params = merge_dicts(arg_params, aux_params) |
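To illustrate what merge_dicts produces, here is a small self-contained sketch that runs without MXNet. The parameter names and string values are hypothetical placeholders; a real checkpoint maps names to NDArrays. It also shows that a later dictionary silently overrides an earlier one on a key collision, which is why it is worth checking that the two dictionaries do not share names:

```python
def merge_dicts(*dict_args):
    """Combine dictionaries left to right; later ones win on key collisions."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result


# Hypothetical parameter names; real values would be NDArrays.
arg_params = {"conv0_weight": "w", "bn0_gamma": "g"}
aux_params = {"bn0_moving_mean": "m", "bn0_moving_var": "v"}

all_params = merge_dicts(arg_params, aux_params)

# Names present in both dictionaries would be overwritten by aux_params.
overlap = set(arg_params) & set(aux_params)

print(sorted(all_params))
print("colliding names:", sorted(overlap))
```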
...
Code Block |
---|
for idx, dbatch in enumerate(test_iter):
    data = dbatch.data[0]
    # Copy the batch into the executor's input buffer, then run a forward pass.
    executor.arg_dict["data"][:] = data
    executor.forward(is_train=False)
    preds = executor.outputs[0].asnumpy()
    top1 = np.argmax(preds, axis=1) |
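The loop above stops at the per-batch top-1 predictions. One way to turn them into an accuracy score is to compare them with the labels and accumulate the counts over all batches. The sketch below does this in plain Python on toy data so that it runs without MXNet or NumPy; `argmax_rows`, `top1_accuracy` and the toy batches are hypothetical names and values, not part of the test scripts:

```python
def argmax_rows(preds):
    """Index of the largest score in each row (like np.argmax(preds, axis=1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in preds]


def top1_accuracy(batches):
    """batches yields (preds, labels) pairs; returns accuracy as a percentage."""
    correct = 0
    total = 0
    for preds, labels in batches:
        top1 = argmax_rows(preds)
        correct += sum(int(p == l) for p, l in zip(top1, labels))
        total += len(labels)
    return 100.0 * correct / total if total else 0.0


# Toy data: two batches of two samples, two classes each.
toy_batches = [
    ([[0.1, 0.9], [0.8, 0.2]], [1, 0]),  # both predictions correct
    ([[0.3, 0.7], [0.6, 0.4]], [0, 0]),  # first wrong, second correct
]

accuracy = top1_accuracy(toy_batches)
print("accuracy: %.2f%%" % accuracy)
```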
...
Code Block |
---|
def set_use_tensorrt(status = False):
    # "1" enables the TensorRT runtime integration, "0" disables it.
    os.environ["MXNET_USE_TENSORRT"] = str(int(status)) |
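Because set_use_tensorrt changes a process-wide environment variable, the setting stays in effect until it is changed again. If you prefer the setting to be scoped to a block of code, a context manager is one option. The following is a sketch only; `use_tensorrt` is a hypothetical helper that does not exist in MXNet, and it assumes the variable is read when the executor is bound inside the `with` block:

```python
import os
from contextlib import contextmanager


@contextmanager
def use_tensorrt(status):
    """Temporarily set MXNET_USE_TENSORRT, restoring the old value on exit."""
    previous = os.environ.get("MXNET_USE_TENSORRT")
    os.environ["MXNET_USE_TENSORRT"] = str(int(status))
    try:
        yield
    finally:
        if previous is None:
            # The variable was not set before, so remove it again.
            os.environ.pop("MXNET_USE_TENSORRT", None)
        else:
            os.environ["MXNET_USE_TENSORRT"] = previous


before = os.environ.get("MXNET_USE_TENSORRT")

with use_tensorrt(True):
    # Binding the executor and running inference would go here.
    inside = os.environ["MXNET_USE_TENSORRT"]

after = os.environ.get("MXNET_USE_TENSORRT")
print(inside)
```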
...
Code Block |
---|
print("Running inference in MXNet")
set_use_tensorrt(False)
mx_pct = run_inference(sym, arg_params, aux_params, mnist, all_test_labels, batch_size=batch_size)
print("Running inference in MXNet-TensorRT")
set_use_tensorrt(True)
trt_pct = run_inference(sym, arg_params, aux_params, mnist, all_test_labels, batch_size=batch_size) |
...
Running TensorRT with your own models with the Gluon API
Note: Please first read the previous section titled "Running TensorRT with your own models with the symbolic API". It contains information that will also be useful for Gluon users.
Note: If the user wishes to use the Gluon vision models, it's necessary to install the gluoncv pip package:
Code Block |
---|
pip install gluoncv |
The above package is based on a separate repository.
For Gluon models specifically, we need to add a data symbol to the model to load the data, and apply a softmax layer, because the Gluon models output only the logits that are meant to be fed to softmax. This is shown in ${MXNET_HOME}/tests/python/tensorrt/test_tensorrt_resnet_resnext.py. Here's the relevant code:
Code Block |
---|
net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
data = mx.sym.var('data')
out = net(data)
softmax = mx.sym.SoftmaxOutput(out, name='softmax') |
...
Code Block |
---|
net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
all_params = dict([(k, v.data()) for k, v in net.collect_params().items()])
executor = softmax.simple_bind(ctx=ctx, data=(batch_size, 3, 32, 32), softmax_label=(batch_size,), grad_req='null',
shared_buffer=all_params, force_rebind=True) |
...
Code Block |
---|
gluon.data.DataLoader(
gluon.data.vision.CIFAR10(train=False).transform_first(transform_test),
batch_size=batch_size, shuffle=False, num_workers=num_workers) |
...
A full tutorial is provided here, but we'll summarize a simple use case below.
Installation
Installing MXNet with TensorRT integration is an easy process. First ensure that you are running Ubuntu 16.04, that you have updated your video drivers, and that you have installed CUDA 9.0 or 9.2. You’ll need a Pascal or newer generation NVIDIA GPU. You’ll also have to download and install the TensorRT libraries (instructions here). Once you have these prerequisites installed and up to date, you can install a special build of MXNet with TensorRT support enabled via PyPI and pip. Install the appropriate version by running:
To install with CUDA 9.0:
Code Block |
---|
pip install mxnet-tensorrt-cu90 |
To install with CUDA 9.2:
Code Block |
---|
pip install mxnet-tensorrt-cu92 |
If you are running an operating system other than Ubuntu 16.04, or just prefer to use a Docker image with all prerequisites installed, you can instead run:
Code Block |
---|
nvidia-docker run -ti mxnet/tensorrt bash |
Roadmap
Examples
Image classification examples are available here.
Benchmarks
TensorRT is still an experimental feature, so benchmarks are likely to improve over time. As of Oct 11, 2018, we've measured the following improvements, all of them on networks with FP32 weights.
...