Link to dev list discussion
Community members currently involved in this work: Jake Lee (https://github.com/stu1130), Zhi Zhang (https://github.com/zhreshold), Naveen (https://github.com/nswamy), Karan (https://github.com/karan6181), Sina (https://github.com/safrooze)
Data pre-processing and post-processing are commonly used when training a deep learning model. Collectively, this processing is called data transformation. Data transformations are generally applied to the train, validation, and test datasets. In most cases, the transformations used on validation data during the training phase also apply in the inference phase. However, MXNet models do not contain information about data transformations, creating a disconnect and a barrier to easy movement of models from training to production inference deployment. Below are the problems we aim to solve in this work:
- Input/output data transformations are not part of the MXNet model.
- Input/output data transformations currently support CPU only.
- Some input/output data transformations are Python specific.
- Not all data transformations are HybridBlocks in Gluon and hence cannot be exported as a symbol graph.
- Input/output data transformations take a single input only (e.g., Normalize takes a 3D tensor, i.e., one image, as input).
Other notable problems for production model deployment are listed below. However, these problems are not addressed in this work and will be taken up in the next iteration:
- Input/output signature: The saved model is missing information about the input/output descriptions, such as name and shape, making the saved model unusable out of the box.
- File name, multiple files: Managing multiple files representing one model, and knowing the epoch number.

With this release, MXNet addresses these limitations by providing easy-to-use end-to-end model APIs for saving the model along with the input/output data transformations and input/output data descriptions.
- As a data scientist, I should be able to concatenate data transformations with the neural network in MXNet Gluon and export the end-to-end MXNet model using the Gluon export APIs.
- As a user, I want to be able to load the end-to-end MXNet model and run inference (single/batch) with the Gluon (Python), Module (Python), Scala, and Java inference APIs. I should not be required to rewrite data transformations; I expect them to be part of the model.
- As a user, I should be able to run inference with these end-to-end models on a CPU or GPU machine.
- As a user, I should be able to run single or batch inference requests with different shapes against these end-to-end models on a CPU or GPU machine.
- As a user, when I run inference with end-to-end models on a CPU or GPU machine, I should not lose performance compared to running data transformations separately on the CPU followed by inference on the CPU/GPU.
An end-to-end model is simply an MXNet model (sym, params) with the data transformation operators included as part of the graph. We use the term end-to-end to indicate that the data transformation operators are part of the same network graph.
- Can we have a list of NDArrays as input? How does it work when bound to a module for batch inference?
    - For example, before the resize transformation operator, users may have images of different shapes, so they cannot form a batch NDArray of shape (N, c, h, w). Instead, they have a list of NDArrays with shapes (c, h1, w1), (c, h2, w2), and so on. How does this work?
- Can we have an operator in MXNet that takes a buffer (e.g., an image) or a string (e.g., a file path) as input?
    - If we can achieve this, we can make image decode the first node in the graph, enabling users to feed raw data directly and get output predictions, greatly simplifying model deployment and inference code.
- Can we have an operator in MXNet that outputs a string (e.g., a class name) or a vector (e.g., bounding box coordinates), i.e., non-NDArray output?
    - If we can achieve this, we can include post-processing transformation operators in the graph, enabling users to get ready-to-consume output predictions. For example, a model could take a raw input image and produce a class name as output, completely hiding framework specifics such as NDArray from production inference code.
- Make data transformation operators behave like any other MXNet operator, i.e., make them available via the `nd` and `sym` packages.
- Implement CPU and GPU support for data transformation operators.
- Support single input (3D tensor), batch input (4D tensor), and list input (list of 3D tensors) for data transformation operators.
- Create a new data transformation operator, `list_to_batch`, to convert a list of NDArrays into batch data (a 4D tensor) before pushing it to the neural network for inference.
- Make all data transformations HybridBlocks. This enables users to concatenate data transformation blocks with the neural network block and export the end-to-end model.
User experience - Model Training and Export
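A hedged sketch of the intended workflow (model choice, file names, and normalization constants are illustrative; `pretrained=False` is used here only to keep the example self-contained):

```python
import mxnet as mx
from mxnet import gluon
from mxnet.gluon.data.vision import transforms
from mxnet.gluon.model_zoo import vision

# Concatenate pre-processing transformations with the network so that
# everything lives in a single symbol graph.
net = gluon.nn.HybridSequential()
net.add(transforms.ToTensor(),                 # NHWC [0, 255] -> NCHW float32 [0, 1]
        transforms.Normalize(mean=(0.485, 0.456, 0.406),
                             std=(0.229, 0.224, 0.225)),
        vision.resnet18_v1(pretrained=False))  # pretrained=True in real use
net.initialize()                               # not needed with pretrained weights
net.hybridize()

# One forward pass builds the cached graph, which export() then serializes.
y = net(mx.nd.random.uniform(0, 255, shape=(1, 224, 224, 3)))
net.export("end_to_end_resnet18", epoch=0)     # -symbol.json and -0000.params
print(y.shape)  # (1, 1000)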
The symbolic graph of the exported model above looks like the following:
User experience - Model Inference in Python Module
User experience - Model Inference in Java Inference API
Addition of New APIs
There are NO NEW APIs introduced as part of this work in phase 1.
- All API changes and functionality additions are backward compatible. No existing functionality should be broken.
- No performance impact for existing use cases.
- For training jobs, there is no change in existing behavior, hence we do not expect any performance changes.
- For single/batch inference (on CPU), since the number of operators (transformations + network) is the same, we do not expect any performance changes.
- For single/batch inference (on GPU), with the addition of transformation operator support on GPU, we expect increased throughput and reduced latency (TBD: benchmarks and POC in progress).
    - However, if the input data is small, the number of data transformations is minimal, and inference runs on a GPU, then running all transformations on the GPU may be slower than a multi-threaded CPU data transformation pipeline.
- GPU implementation of data transformation operators.
- Handling single, batch, and list inputs for data transformation operators.
- Handling the backward pass (very rarely used, if at all) for data transformation operators.
Milestones (Development Plan)
Phase 1 - Image Classification
In Phase 1, we target image classification use cases with the following transformations, which are commonly used during inference.
Exists => Functionality already exists currently
TODO/PR/Done => Tasks of this current work
| # | Transformer | Gluon Hybrid Block | CPU Operator | GPU Operator | Single Input (3D) | Batch Input (4D) | List Input |
|---|---|---|---|---|---|---|---|
| 1 | Normalize | Exists | Exists | WIP - PR #13802 | Exists | WIP - PR #13802 | TODO |
| 2 | ToTensor | Exists | Exists | WIP - PR #13837 | Exists | WIP - PR #13837 | TODO |
| 3 | Resize | WIP - PR #13611 | Exists | TODO | Exists | WIP - PR | TODO |
| 4 | list_to_batch | WIP | Exists as concat operator; needs some update | | | | |
| 5 | RandomResizedCrop | TODO | Exists; requires restructuring | N/A | Exists | N/A | N/A |
| 6 | CenterCrop | PR #13694 | | | | | |
| 7 | Crop | PR #13679 | PR #13679 | N/A | PR #13679 | N/A | N/A |
| 8 | Java API image decoding - https://github.com/apache/incubator-mxnet/pull/13807 (by Qing Lan) | | | | | | |
| 9 | End-to-end example - image classification - train in Gluon, inference in Java (TODO) | | | | | | |
| 10 | Benchmarks - the above benchmarks with the Java Predictor API on CPU and GPU, comparing existing numbers with a fused single graph (TODO) | | | | | | |
| 11 | Blog post and other user documentation (TODO) | | | | | | |
Phase 2 - Object Detection
Data transformation operators specifically for object detection use cases.
E.g., SSDRandomCrop, BoundingBoxFlip
Phase 3 - Image Segmentation
Data transformation operators specifically for image segmentation use cases.
Phase 4 - NLP use cases
Data transformation operators specifically for NLP use cases.
Other ideas and future work items
- Fused transformation operators for common combinations.
    - ResizeCropNormalize: perform a fused resize, crop, and normalization.
    - CropMirrorNormalize: perform fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting.
    - FastResizeCropMirror: perform a fused resize, crop, and mirror operation. Handles both fixed and random resizing and cropping. Backprojects the desired crop through the resize operation to reduce the amount of work performed.
    - RandomResizedCrop: perform a crop with randomly chosen area and aspect ratio, then resize it to the given size.
    - ResizeCropMirror: perform a fused resize, crop, and mirror operation. Handles both fixed and random resizing and cropping.
- Integration with NVIDIA DALI.
- Integration with other data processing engines, such as RAPIDS and Apache Arrow, would bring multiple advantages: accelerated non-image transformations on columnar data, out-of-the-box support for stable data loaders for formats such as Parquet and protobuf, and the ability for users to use Pandas, Spark, and other cross-framework data processing libraries to feed data to MXNet.
- Test that models trained with previous versions of MXNet can still be loaded (backward compatibility).
- Test that transformations can be concatenated to the network and exported as an end-to-end model.
- Test inference on the saved end-to-end model with the Python Module and Gluon SymbolBlock APIs, on CPU and GPU, with single, batch, and list inputs.
- Test inference on the saved end-to-end model with the Java and Scala inference APIs, on CPU and GPU, with single, batch, and list inputs.
- Benchmark and verify the performance of inference with the end-to-end model versus transformations run separately on the CPU followed by network prediction.
- All of these tests will be added to CI.
Alternative Approach - 1
Create a new end-to-end model export API. Under this approach, an end-to-end model is not just a sym and a params file; it is an archive containing the network graph and params, the transformations graph and params, the input/output signatures, and auxiliary resources such as a synset file, sample input/output, and more. This approach involves two main changes:
- Extend the Hybrid Block export API: in the Gluon Hybrid Block export API, provide additional options for users to specify input/output signatures and any other graphs (Hybrid Blocks) to be exported (e.g., a transformations HybridSequential block). The export API saves the symbol graph for each of these Hybrid Blocks, tagged with a pre-defined or user-provided name. See an example symbol file below.
- Extend the model import APIs: in the Gluon/Module/Java/Scala inference APIs, create new APIs for importing end-to-end models.
See below for code samples:
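As an illustration, the proposed extended export/import APIs might look like the following. This is pseudocode: none of these signatures exist in MXNet today, and the names (`transformations`, `input_signature`, `output_signature`, `resources`, `import_end_to_end`) are hypothetical.

```python
# Hypothetical extended export (pseudocode): save the network, the
# transformation graph, input/output signatures, and auxiliary resources
# together as one end-to-end model archive.
net.export("resnet18_e2e",
           epoch=0,
           transformations=transform_block,          # HybridSequential of transforms
           input_signature={"data": (1, 3, 224, 224)},
           output_signature={"softmax": (1, 1000)},
           resources=["synset.txt"])

# Hypothetical matching import on the inference side.
model = gluon.SymbolBlock.import_end_to_end("resnet18_e2e", epoch=0)
prediction = model(raw_image)                        # transforms applied inside
```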
Cons of this Approach
- Introduces a new concept of end-to-end models, which may create confusion and regressions between existing models and the new end-to-end models.
- Sharing data transformations from one training job to another is already fairly straightforward, as simple as sharing the code files; in the majority of cases, all model-building experiments happen in Python.
- For inference there is typically a single, simple transformation graph; this solution saves more information than necessary.
- New export/import APIs and model formats require more understanding of the problem domain.
Q1) Can I implement my own Image Transformation Block and later use it in inference?
A1) Yes, as long as your Image Transformation Block is a HybridBlock that uses existing MXNet operators.
If it is not, you will unfortunately need to implement the backend operator first. However, the current transforms cover most use cases for image tasks.
- ResNet-18 model pre-trained with ImageNet. https://s3.us-east-2.amazonaws.com/mxnet-public/end_to_end_models
- Pre-processing - Resize(224, 224), ToTensor, Normalize(mean=(
- We take the average of 500 runs.
- Single request inference - input data: synthetic (random.uniform(0, 255, shape=(1, 300, 300, 3)))
- Batch inference - input data: synthetic (random.uniform(0, 255, shape=(25, 300, 300, 3)))
- The table below gives the average prediction time per sample.
| Hardware | Inference Mode | API | Non End-to-End Models (ms) | End-to-End Models (ms) | Boost % |
|---|---|---|---|---|---|
| CPU | Single Request Inference | Python (Module API) | 17 | 14 | 17.65% |
| CPU | Single Request Inference | Java Inference API | 17.09 | 14.16 | 17.14% |
| CPU | Single Request Inference | Scala Inference API | 17.93 | 13.19 | 26.44% |
| CPU | Batch Inference (Batch size = 25) | Python (Module API) | 15.18 | 12.57 | 17.19% |
| CPU | Batch Inference (Batch size = 25) | Java Inference API | 18.54 | 13 | 29.88% |
| CPU | Batch Inference (Batch size = 25) | Scala Inference API | 17 | 13.26 | 22.00% |
| GPU | Single Request Inference | Python (Module API) | 5.78 | 3.14 | 45.67% |
| GPU | Single Request Inference | Java Inference API | 8.95 | 4.26 | 52.40% |
| GPU | Single Request Inference | Scala Inference API | 9.14 | 4.42 | 51.64% |
| GPU | Batch Inference (Batch size = 25) | Python (Module API) | 2.61 | 1.31 | 49.81% |
| GPU | Batch Inference (Batch size = 25) | Java Inference API | 8.03 | 5.53 | 31.13% |
| GPU | Batch Inference (Batch size = 25) | Scala Inference API | 7.86 | 5.52 | 29.77% |
- Gluon-CV export helper https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/export_helper.py