Enhanced the performance of `` operator.
MXNet now automatically sets OpenMP to use all available CPU cores to maximize CPU utilization when `OMP_NUM_THREADS` is not set.
Unary and binary operators now avoid using OpenMP on small arrays if using OpenMP actually hurts performance due to multithreading overhead.
Significantly improved performance of `broadcast_add`, `broadcast_mul`, etc. on CPU.
Added bulk execution to imperative mode. You can control segment size with `mxnet.engine.bulk`. As a result, the speed of Gluon in hybrid mode is improved, especially on small networks and multiple GPUs.
Improved speed for `ctypes` invocation from the Python frontend.
New Features - Gradient Compression [Experimental]
Speed up multi-GPU and distributed training by compressing communication of gradients. This is especially effective when training networks with large fully-connected layers. In Gluon this can be activated with `compression_params` in Trainer.
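To make the idea concrete, here is a minimal pure-Python sketch of threshold-based 2-bit quantization with a residual, which is the scheme the MXNet gradient compression documentation describes. It is an illustration under that assumption, not MXNet's implementation: the function name `quantize_2bit` and the sample values are made up for this example.

```python
def quantize_2bit(grad, residual, threshold=0.5):
    """Quantize each value to -threshold, 0, or +threshold.

    Whatever is not sent stays in `residual` (updated in place) and is
    added back on the next call, so small gradients are delayed, not lost.
    """
    sent = []
    for i, g in enumerate(grad):
        acc = residual[i] + g          # carry over what was left unsent
        if acc >= threshold:
            q = threshold
        elif acc <= -threshold:
            q = -threshold
        else:
            q = 0.0
        residual[i] = acc - q          # remember the quantization error
        sent.append(q)
    return sent


residual = [0.0, 0.0, 0.0]
step1 = quantize_2bit([0.25, -0.75, 0.125], residual)
step2 = quantize_2bit([0.25, -0.125, 0.125], residual)
print(step1)     # [0.0, -0.5, 0.0]
print(step2)     # [0.5, 0.0, 0.0]
print(residual)  # [0.0, -0.375, 0.25]
```

Each transmitted value takes one of three states, so it fits in two bits. In Gluon, the documented way to request this scheme is a dict such as `{'type': '2bit', 'threshold': 0.5}` passed as `compression_params`.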
New Features - Support of NVIDIA Collective Communication Library (NCCL) [Experimental]
Use `kvstore='nccl'` for (in some cases) faster training on multiple GPUs.
Significantly faster than `kvstore='device'` when the batch size is small.
It is recommended to set the environment variable `NCCL_LAUNCH_MODE` to `PARALLEL` when using NCCL version 2.1 or newer.
New Features - Advanced Indexing [General Availability]
NDArray now supports advanced indexing (both slice and assign) as specified by the NumPy standard, with the following restrictions:
If the key is a list, only a flat list of integers is supported: `key=[1, 2]` works, but `key=[[1, 2]]` does not.
Ellipsis (`...`) and `np.newaxis` are not supported.
Boolean array indexing is not supported.
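The restrictions above can be written down as a small check. The helper below is hypothetical and not part of MXNet; it only encodes the three listed restrictions for plain Python keys (it does not inspect NumPy or NDArray boolean arrays) and says nothing about what else MXNet accepts.

```python
def unsupported_reason(key):
    """Return why `key` falls under a listed restriction, or None if it does not.

    Hypothetical helper for illustration only; not an MXNet API.
    """
    parts = key if isinstance(key, tuple) else (key,)
    for part in parts:
        if part is Ellipsis:
            return "Ellipsis is not supported"
        if part is None:  # np.newaxis is an alias for None
            return "np.newaxis is not supported"
        if isinstance(part, bool):
            return "boolean indexing is not supported"
        if isinstance(part, list):
            # bool is a subclass of int, so check for it first
            if any(isinstance(x, bool) for x in part):
                return "boolean indexing is not supported"
            if not all(isinstance(x, int) for x in part):
                return "only a flat list of integers is supported"
    return None


print(unsupported_reason([1, 2]))         # None
print(unsupported_reason([[1, 2]]))       # only a flat list of integers is supported
print(unsupported_reason((Ellipsis, 0)))  # Ellipsis is not supported
```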
New Features - Gluon [General Availability]
Performance optimizations discussed above
Added support for loading data in parallel with multiple processes. The number of workers can be set with `num_workers`. Windows is not supported yet.
Added `Block.cast` to support networks with different data types, e.g. `float16`.
Added `Lambda` block for wrapping a user-defined function as a block.
Generalized `` to support an arbitrary number of arrays.
New Features - ARM / Raspberry Pi support [Experimental]
MXNet now compiles and runs on ARMv6, ARMv7, and ARMv64, including Raspberry Pi devices.
New Features - NVIDIA Jetson support [Experimental]
MXNet now compiles and runs on NVIDIA Jetson TX2 boards with GPU acceleration.
You can install the Python MXNet package on a Jetson board by running `$ pip install mxnet-jetson-tx2`
New Features - Sparse Tensor Support [General Availability]
Added more sparse operators: `contrib.SparseEmbedding`, `sparse.sum` and `sparse.mean`
Added `asscipy()` for easier conversion to SciPy sparse matrices
Added `check_format()` for sparse ndarrays to check if the array format is valid
Bug Fixes
Fixed `a[-1]` indexing not working on `NDArray`.
Fixed `expand_dims` for `axis < 0`.
Fixed a bug that caused `topk` to produce incorrect results on large arrays.
Improved numerical precision of unary and binary operators for `float64` data.
Fixed derivatives of `log2` and `log10`. They were previously computed as the derivative of `log`.
Fixed a bug that causes MXNet to hang after fork. Note that you still cannot use GPU in child processes after fork due to limitations of CUDA.
Fixed a bug that causes `CustomOp` to fail when using auxiliary states.
Fixed a security bug that caused MXNet to listen on all available interfaces when running training in distributed mode.
Doc Updates
Added a security best practices document under FAQ section.
Fixed License Headers including restoring copyright attributions.
Documentation updates
Links for viewing source
Known Issues
Attempting to run distributed training with FP16 leads to failure.
PR-8904: Fix multiprocessing too many open files
- Problem: The data-loader has a file-descriptor leak when the number of files being opened exceeds the ulimit of the user session.
- Workaround: Increase the ulimit of the user session or at the OS level using any of the following steps:
- `$ ulimit -n unlimited` (removes the fd limit), OR
- `$ ulimit -n 60000` (sets the fd limit to 60,000 files, assuming you need to open somewhat fewer than 60,000 files), OR
- `$ echo 65535 > /proc/sys/fs/file-max` (raises the system-wide file limit), used together with the `ulimit` command.
- Further details:
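As an illustration of the workaround above, the same kind of limit can also be raised from inside a Python process before the data loader starts. This is a minimal sketch using the standard-library `resource` module (Unix only); the function name `raise_fd_limit` is made up for this example, and the default of 60000 simply mirrors the value used in the workaround.

```python
import resource


def raise_fd_limit(desired=60000):
    """Raise the soft open-file limit toward `desired`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # Cap the request at the hard limit unless the hard limit is unlimited.
    target = desired if hard == unlimited else min(desired, hard)
    # Only ever raise the soft limit; leave it alone if it is already high enough.
    if soft != unlimited and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft_limit, hard_limit = raise_fd_limit()
print("open-file limits (soft, hard):", soft_limit, hard_limit)
```

Unlike `ulimit` in the shell, this affects only the current process and the children it spawns afterwards, and it cannot exceed the hard limit without elevated privileges.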
How to build MXNet
Please follow the instructions at
List of submodules used by Apache MXNet (Incubating) and when they were updated last
Submodule :: Last updated by MXNet :: Last update in submodule
cub@ :: 31-Jul :: 28-Aug
dlpack@ :: 08-Sep :: 06-Oct
dmlc-core@ :: 08-Sep :: 06-Oct
mshadow@ :: 03-Oct :: 09-Oct
nnvm@ :: 10-Sep :: 10-Oct
ps-lite@ :: 22-Nov :: 21-Nov