- New features
- Feature improvements
- Front end API
- Language Bindings
- Performance improvements
- Examples and tutorials
- Website and documentation
- CI/CD
- License
- Miscellaneous changes
- How to build MXNet
## New features
### MXNet Extensions: custom operators, partitioning, and graph passes
Adds support for extending MXNet with custom operators, partitioning strategies, and graph passes, all implemented in a library that is compiled separately from the MXNet codebase and loaded dynamically at runtime into any prebuilt installation of MXNet.
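For example, once such a library is built, it can be loaded into a stock installation at runtime; a minimal sketch (the library and operator names here are hypothetical):

```python
import mxnet as mx

# Dynamically load a compiled extension library; its custom operators,
# partitioning strategies, and graph passes are registered on load.
mx.library.load('libcustomop_lib.so')  # hypothetical library name

# An operator registered by the library then becomes available under mx.nd,
# e.g. mx.nd.my_gemm(...) if the library registers "my_gemm".
```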
- fix for number of inputs/outputs for backward custom ops (#17069)
- Enhancements for custom subgraph op (#17194)
- Disable flaky test_custom_op_fork (#17481)
- fix custom op makefile (#17516)
- Update CustomOp doc with changes for GPU support (#17486)
- [WIP] MXNet Extensions enhancements (#17885) (#18128)
- Dynamic subgraph property (#17034)
- Dynamic subgraph property doc (#17585)
- [1.7] Backport MXNet Extension PRs (#17623, #17569, #17762) #18063 (#18069)
### OpPerf utility enabled in the binary distribution
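With OpPerf included in the binary distribution, individual operators can be benchmarked directly from a pip install; a sketch using the opperf utilities (shapes, warmup, and run counts are illustrative):

```python
import mxnet as mx
from mxnet.benchmark.opperf.utils.benchmark_utils import run_performance_test

# Time forward and backward of elementwise add on CPU with the native profiler.
results = run_performance_test(mx.nd.add,
                               run_backward=True,
                               dtype='float32',
                               ctx=mx.cpu(),
                               inputs=[{'lhs': (1024, 1024), 'rhs': (1024, 1024)}],
                               warmup=10, runs=25,
                               profiler='native')
print(results)
```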
- [OpPerf] Add Neural network loss ops (#17482)
- [OpPerf] Fixes the issue when you pass NDArray to run_perf_test (#17508)
- [OpPerf] Fix markdown for native profile and add profile param in function desc (#17494)
- [OpPerf] Add Indexing ops (#16253)
- [OpPerf] Implement remaining random sampling ops (#17502)
- [OpPerf] Implement remaining GEMM ops (#17501)
- [OpPerf] Implement all linalg ops (#17528)
- [OpPerf] Fixed native output ordering, added warmup & runs command line args (#17571)
- [OpPerf] Add norm, cast ops, remaining optimizer ops (#17542)
- [Large Tensor] Fixed Embedding op (#17599)
- [OpPerf] Fixed Python profiler bug (#17642)
### MKL-DNN
#### MKL-DNN as the default CPU backend in binary distribution
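A quick way to confirm that an installed binary was built with MKL-DNN is the runtime feature list; a minimal check, assuming a default pip build of this release:

```python
import mxnet as mx

# Every compile-time feature of the installed binary is reported here;
# MKLDNN should show as enabled in the default pip builds of this release.
print([f for f in mx.runtime.feature_list() if f.name == 'MKLDNN'])
```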
#### Branding change to DNNL
- Upgrade MKL-DNN dependency to v1.1 (#16823)
#### Support bfloat16 datatype
- Add bfloat16 floating-point format support based on AMP (#17265)
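Since bfloat16 support is layered on AMP, opting in follows the usual AMP flow; a hedged sketch, assuming this build's `amp.init` accepts `target_dtype='bfloat16'`:

```python
from mxnet.contrib import amp

# Cast eligible operators to bfloat16 automatically; operators without
# bfloat16 kernels are kept in float32 by the AMP lists.
amp.init(target_dtype='bfloat16')
```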
### New operators
- [New Op] Add deformable conv v2 (#16341)
- Add MXNet Ops for fast multihead attention (#16408)
- Support boolean elemwise/broadcast binary add, multiply and true_divide (#16728)
- add gammaln, erf, erfinv (#16811)
- add aligned roi introduced in Detectron2 (#16619)
- Implement atleast_1d/2d/3d (#17099)
- Interleaved MHA for CPU path (#17138)
- Lamb optimizer update (#16715)
- Quantized Embedding (#16691)
- Add gelu fuse ops (#18082) (#18092)
## Feature improvements
### NumPy-compatible interface (experimental)
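The experimental interface exposes NumPy-style arrays and operators through the `mx.np` and `mx.npx` namespaces; a minimal example of switching it on:

```python
from mxnet import np, npx

# Enable NumPy-compatible semantics (shape/dtype behavior) globally.
npx.set_np()

a = np.arange(6).reshape(2, 3)
b = np.ones((2, 3))
print(np.matmul(a, b.T))  # np.matmul was added in this release (#16990)
```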
- [NumPy] NumPy support for linalg.inv (#16730)
- add numpy op nan_to_num (#16717)
- [Numpy] Add sampling method for bernoulli (#16638)
- Fix numpy-compatible mean output type for integer inputs (#16792)
- [Numpy] Fix collect_params().zero_grad() in gluon numpy interface (#16716)
- [Numpy][Operator] 'where' Implementation in MXNet (#16829)
- [Numpy] Random.normal() with backward (#16330)
- Add OP diag [numpy] (#16786)
- Mixed precision binary op backward (use in) for numpy (#16791)
- add numpy op diagflat [numpy] (#16813)
- add op bitwise_or [numpy] (#16801)
- [Numpy] Implementation npx.{sample}_n (#16876)
- [Numpy] Add NumPy support for np.linalg.det and np.linalg.slogdet (#16800)
- Op Unravel_index PR [Numpy] (#16862)
- [Numpy] Fix imperative basic indexing in numpy (#16902)
- [Numpy] Basic indexing in symbolic interface of DeepNumpy (#16621)
- [Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (#16804)
- [Numpy] Implement numpy operator 'average' (#16720)
- [Bugfix] [Numpy] Add `kAddTo` and kNullOp to Transpose (#16979)
- set rtol = 1e-2 and atol = 1e-4 when dtype == np.float32 in test_numpy_op.py:test_np_linalg_solve (#17025)
- Op_Diagonal [Numpy] (#16989)
- numpy bincount (#16965)
- [numpy] add op bitwise_not (#16947)
- [Numpy] Modify np.random.shuffle to enable inplace by default (#17133)
- [numpy] fix argsort typo (#17150)
- [numpy] add op round (#17175)
- [numpy]Add op delete (#17023)
- [numpy] add op flipud, fliplr (#17192)
- [CI] Re-enable testing with numpy 1.18 (#17200)
- [Numpy] Add broadcast_to scalar case (#17233)
- [Numpy] Random.gamma() implemented (#16152)
- [Numpy] add row_stack (=vstack) (#17171)
- [Numpy] Add infra for performing constraint check (#17272)
- porting numpy-compatible hstack to master and add dstack for interoperability (#17030)
- adding asnumpy() to output of gather (implicitly called) to fix gather test in large vector and tensor tests (#17290)
- [numpy] add op random.exponential (#17280)
- [NumPy] Add NumPy support for norm (#17014)
- [numpy]add op random.lognormal (#17415)
- Add numpy random weibull operator (#17505)
- [numpy] Add np.random.pareto and np.random.power (#17517)
- [Numpy] Add sort op (#17393)
- [numpy]implement exponential backward (#17401)
- [Numpy] Where operator scalar version (#17249)
- [numpy] add op matmul (#16990)
- [numpy]add op random.logistic, random.gumbel (#17302)
- [numpy][Do Not Review] add op insert (#16865)
- [numpy] add op random.rayleigh (#17541)
- [numpy] add fallback ops (#17609)
- [numpy] add op pad (#17328)
- [numpy] add op fabs, sometrue, round_ (#17619)
- Add arange_like to npx (#16883)
- try to move shape_array to npx (#16897)
- support np.argsort (#16949)
- np.broadcast_to extension (#17358)
- support bitwise_and (#16861)
- fix np.argmax/argmin output data type (#17476)
- add op random.beta (#17390)
- add op isnan isinf (#17535)
- array_split pr (#17032)
- Mixed data type binary ops (#16699)
- randn implemented (#17141)
- refactor and reduce float types for some functions, also add bitwise_xor (#16827)
- any/all (#17087)
- amax (#17176)
- fix format (#17100)
- add op empty_like, add nan_to_num to dispatch (#17169)
- handle array_like fill_value for np.full; add unit test coverage (#17245)
- add np.amin (#17538)
- add npx.gather_nd (#17477)
- add np.random.chisquare (#17524)
- add polyval (#17416)
- add isposinf isneginf isfinite (#17563)
- Support broadcast assign for `npi_boolean_mask_assign_tensor` (#17131)
- Implement Weibull backward (#17590)
- support np.dsplit, fix some error msgs and corner cases for hsplit and vsplit, add interoperability tests for h/v/dsplit (#17478)
- add np.product (#17489)
- Implement np.random.pareto backward (#17607)
- add np.ediff1d (#17624)
- more support for boolean indexing and assign (#18352)
- Fix einsum gradient (#18482)
- [v1.7.x] Backport PRs of numpy features (#18653)
- [v1.7.x] backport mixed type binary ops to v1.7.x (#18649)
- revise activations (#18700)
### Large tensor support
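Large tensor (int64) indexing remains a compile-time option, so code that depends on it can first check the installed binary; a small sketch using the runtime features API:

```python
from mxnet.runtime import Features

# INT64_TENSOR_SIZE is set for builds compiled with large tensor support
# (e.g. the large-tensor nightly builds); standard releases may report False.
print(Features().is_enabled('INT64_TENSOR_SIZE'))
```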
- [Large Tensor] Add support to Random Sample & Pdf ops (#17445)
- [Large Tensor] Add LT support for NN optimizers and 1 activation function (#17444)
- [Large Tensor] Fixed SoftmaxActivation op (#17634)
- [Large Tensor] Fixed col2im op (#17622)
- [Large Tensor] Fixed Spatial Transformer op (#17617)
- [Large Tensor] Fix ravel_multi_index op (#17644)
- Sparse int64 Large tensor support (#16898)
- Re-Enabling Large Tensor Nightly on GPU (#16164)
- enabling build stage gpu_int64 to enable large tensor nightly runs (#17546)
### MKL-DNN enhancement
- MKLDNN FC: Add error info when mkldnn fc bias dimension is wrong (#16692)
- [MKLDNN] support mkldnn gelu (#16710)
- [MKLDNN] Fix int8 convolution/fc bias overflow (#16734)
- [MKLDNN] use dim_t instead of int in slice/transpose operators (#16737)
- Mkldnn fullyConnect bwd bug fix (#16890)
- Revert Mkldnn fullyConnect bwd bug fix (#16890) (#16907)
- [MKLDNN] Use MKLDNNRun (#16772)
- [MKLDNN] mkldnn RNN operator enhancement (#17075)
- [MKLDNN] enable MaxPooling with full pooling convention (#16860)
- update mkldnn to v1.1.2 (#17165)
- improve mkldnn doc (#17198)
- [MKLDNN] Fix _copyto (#17173)
- [MKLDNN] Support channel wise quantization for FullyConnected (#17187)
- fixed seed for mkldnn test (#17386)
- add mkldnn softmax backward (#17170)
- cmake: copy dnnl headers to include/mkldnn (#17647)
- [mkldnn]Mkldnn bn opt backport from master to 1.7x (#18009)
- [v1.x] Update 3rdparty/mkldnn remote URL and pin to v1.3 (#17972) (#18033)
- [v1.x] backport #17900 [MKLDNN] support using any format in pooling backward (#18067)
- Static link MKL-DNN library (#16731)
- Add large tensor nightly tests for MKL-DNN operators (#16184)
- [MKL-DNN] Enable and Optimization for s8 eltwise_add (#16931)
- [MKL-DNN] Enhance Quantization Method (#17161)
- Static Build and CD for mxnet-cu102/mxnet-cu102mkl (#17074)
- MKL-DNN RNN backward path enhancement (#17183)
- cmake: check USE_OPENMP and pass proper MKL-DNN build flags (#17356)
- update mkl to 2020.0 (#17355)
- Enable MKL-DNN by default in pip packages (#16899)
- Enable MKL-DNN FullyConnected backward (#17318)
- Softmax primitive cache and in-place computation (#17152)
- boolean_mask_assign with start_axis (#16886)
- use identity_with_cast (#16913)
- change error tolerance for bf16 bn (#18110)
- [v1.x] Backport #17689 and #17884 to v1.x branch (#18064)
- refactor code and add an option to skip/check weight's version to reduce overhead (#17707) (#18039)
- [v1.x] Backport #17702 and #17872 to v1.x branch (#18038)
### TensorRT integration
- Update TensorRT tutorial to build-from-source. (#14860)
- Minor fix, use RAII for TensorRT builder and network object (#17189)
### Quantization
- Add silent option to quantization script (#17094)
### Profiler
- Implemented final two binary ops, added default params for functionality (#17407)
- Implement remaining nn_activation ops in opperf (#17475)
- Implement all miscellaneous ops (#17511)
- Implement remaining nn_basic ops in opperf (#17456)
### ONNX
- Fix memory leak reported by ASAN in NNVM to ONNX conversion (#15516)
- ONNX export: Gather (#15995)
- ONNX export: Slice op - Handle None value for ends (#14942)
### New models
- [Model] Implement Neural Collaborative Filtering with MXNet (#16689)
- Further optimization for NCF model (#17148)
- HMM Model (#17120)
### Operator improvements
- Faster GPU NMS operator (#16542)
- [MXNET-1421] Added (CuDNN)BatchNorm operator to the list of mirrored operators (#16022)
- dynamic custom operator support (#15921)
- Multi Precision Lamb Update operator (#16885)
- Add im2col and col2im operator (#16502)
- Quantized Elemwise Mul Operator (#17147)
- Enhancements for MXTensor for custom operators (#17204)
- Enabling large tensor support for binary broadcast operators (#16755)
- Fix operators lying about their number of inputs (#17049)
- [WIP] Fallback mechanism for mx.np operators (#16923)
- Dynamic custom operator GPU support (#17270)
- Fix flaky - test_operator_gpu.test_np_insert (#17620)
- MXNet FFI for Operator Imperative Invocation (#17510)
- [MXNET-978] Higher Order Gradient Support `logp1`, `expm1`, `square`. (#15416)
- [MXNET-978] Higher Order Gradient Support `arcsin`, `arccos`. (#15515)
- [MXNET-978] Higher Order Gradient Support `rsqrt`, `rcbrt`. (#15476)
- gather_nd: check bound and wrap negative indices (#17208)
- Remove dilation restriction for conv3d (#17491)
- Fix storage type infer of softmax backward (#17576)
- Fix and optimize handling of vectorized memory accesses (#17767) (#18113)
- Cherry-pick of #17995 and #17937 to 1.x branch (#18041)
- No tensor cores for fp32 interleaved attention, remove div by 8 restriction (#17994) (#18085)
- GPU gemms true fp16 (#17466) (#18023)
- Add support for boolean inputs to FusedOp (#16796)
### Bug fixes
- [BUG FIX] Always preserve batch dimension in batches returned from dataloader (#16233)
- Fix SliceChannel Type inference (#16748)
- change _generate_op_module_signature get_module_file to open with encoding=utf-8, fixing encoding errors on Chinese Windows systems (#16738)
- Fix rtrue_divide grad (#16769)
- fix inv test flakiness using random matrices generated by SVD (#16782)
- [MXNET-1426] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (#16234)
- Fix (#16781)
- fix expand_dims fall back when input's ndim is 0 (#16837)
- [fix] missing input log higher order. (#15331)
- Fix IndentationError in setup.py (#16857)
- Fix a few np issues (#16849)
- Fix InferAttr/InferShapeAttr not calling inference for all nodes in a graph (#16836)
- fix for enable model parallelism for non-fp32 data (#16683)
- Fix NDArrayIter iteration bug when last_batch_handle='pad' (#16166)
- Fix crashing on Windows in ObjectPool ~ctor (#16941)
- Fix NDArrayIter can't pad when size is large (#17001)
- fix axis=-1 bug (#17016)
- Fix CUDNN detection for CMake build (#17019)
- Fix omp assert issue (#17039)
- mshadow: fix vector access (#17021)
- [BUGFIX] Fix race condition in kvstore.pushpull (#17007)
- [BUGFIX] Fix trainer param order (#17068)
- [BugFix] fix filter channel calculation in ModulatedDeformableConvV2 (#17070)
- Fix reshape interoperability test (#17155)
- fix norm sparse fallback (#17149)
- fix py27 quantization (#17153)
- fix int8 add ut (#17166)
- Fix and clean up Ubuntu build from source instructions (#17229)
- fix lstm layer with projection save params (#17266)
- Fix rendering of ubuntu_setup.md codeblocks (#17294)
- Fix #17267, add expected and got datatype for concat error msgs (#17271)
- [BUGFIX] fix model zoo parallel download (#17372)
- fix use int8, uint8, int32, int64 (#17188)
- [Fix] Add ctx to the original ndarray and revise the usage of context to ctx (#16819)
- Fix ndarray indexing bug (#16895)
- fix requantize flaky test (#16709)
- Initial checkin (#16856)
- Fix flaky test_ndarray.py:test_reduce (#17312)
- fix flaky test: boolean index and fix bugs (#17222)
- Fix IOT Devices section of Get Started page (#17326)
- add logic for no batch size while getting data arrays from executors (#17772) (#18122)
- Fix reverse shape inference in LayerNorm (#17683)
- fix full and full_like when input is boolean (#17668)
- Fix MBCC inference (#17660)
- Additional fix for vector access. (#17230)
- Cherrypick Fix nightly large_vector test caused by incorrect with_seed path (#18178) (#18220)
- [1.7] Pass args fix3 (#18237)
- fixing batch_norm and layer_norm for large tensors (#17805) (#18261)
- [1.7.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) (#18316)
- [v1.7.x] backport #18500 - [Bug Fixed] Fix batch norm when grad_req is `add` (#18517)
- Fix the monitor_callback invalid issue during calibration with variable input shapes (#18632) (#18703)
## Front end API
- Fix the problem in printing feature in c++ API examples : feature_extract (#15686)
- updating MXNet version to 1.6.0 in base.h for C APIs (#16905)
- [API] unified API for custom kvstores (#17010)
- fix parameter names in the estimator api (#17051)
- adding docs for 64bit C APIs of large tensor (#17309)
- Add API docs to INT64 APIs (#16617)
### Gluon
- [Quantization] Enhance gluon quantization API (#16695)
- [Gluon] Improve estimator usability and fix logging logic (#16810)
- Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (#16834)
- [Gluon] Update contrib.Estimator LoggingHandler to support logging per batch interval (#16922)
- Include eval_net, the validation model, in the Gluon estimator API (#16957)
- Fix Gluon Estimator nightly test (#17042)
- [MXNET-1431] Multiple channel support in Gluon PReLU (#16262)
- Fix gluon.Trainer regression if no kvstore is used with sparse gradients (#17199)
- refactor gluon.utils.split_data() following np.array_split() (#17123)
- Add RandomApply in gluon's transforms (#17242)
- Partitioning Gluon HybridBlocks (#15969)
- Random rotation (#16794)
- bump up atol for gradient check (#16843)
- Extend estimator.evaluate() to support event handlers (#16971)
- [MXNET-1438] Adding SDML loss function (#17298)
### Symbol
- Add unoptimized symbol to executor for sharing (#16798)
- Enforces NDArray type in get_symbol (#16871)
- Fix #17164 symbolblock with BatchNorm inside during cast to fp16 (#17212)
- autograd video and image link fixes and removing symbol tutorials (#17227)
- Fix CosineEmbeddingLoss when symbol API is used (#17308)
- Fix Horovod build error due to missing exported symbols (#17348)
- Update symbol.py (#17408)
- update symbol to json (#16948)
## Language Bindings
### Python
- Python 2 compatibility fix in base.py
- adding stacktrace in Jenkinsfile_utils.groovy to inspect Python2 failure cause in CI (#17065)
- Fix image display in python autograd tutorial (#17243)
- Fix Python 3 compatibility in example/speech_recognition (#17354)
- Stop testing Python 2 on CI (#15990)
- Docs: Python tutorials doc fixes (#17435)
- pin python dependencies (#17556)
- Python 2 cleanup (#17583)
### C/C++
- Simplify C++ flags (#17413)
### R
- fix R docs (#16733)
- [R package] Make R package compilation support opencv 4.0 (#16934)
- Support R-package with cmake build and fix installation instructions (#17228)
- Fix R-package/src/Makevars for OpenCV 4 (#17404)
- Fix typo in Install the MXNet Package for R (#17340)
### Clojure
### Julia
- [MXNET-1440] julia: porting `current_context` (#17142)
- julia: porting `context.empty_cache` (#17172)
- pin Markdown version to 3.1 in Julia doc build (#17549)
### Perl
- [Perl] - ndarray operator overloading enhancements (#16779)
- MXNET-1447 [Perl] Runtime features and large tensor support. (#17610)
### Scala
- Fix scala publish & nvidia-docker cublas issue (#16968)
- Fix publishing scala gpu with cpu instance (#16987)
- swap wget to curl in Scala scripts (#17041)
- [Scala/Java] Remove unnecessary data slicing (#17544)
- quantile_scalar (#17572)
- Fix get_started scala gpu (#17434)
- Fix MBCC & scala publish pipeline (#17643)
- Bump up additional scala 1.x branch to 1.7.0 (#17765)
## Performance improvements
- Build.py improvement (#16976)
- Improvements to config.cmake (#17639)
- [Done] BilinearResize2D optimized (#16292)
- Speed up fused_op compilation by caching ptx and jit-compiled functions (#16783)
- Improve the speed of the pointwise fusion graph pass (#17114)
- broadcast_axis optimization (#17091)
- Optimize AddTakeGrad Tensor Sum (#17906) (#18045)
## Examples and tutorials
- Add CustomOp tutorial doc (#17241)
- Correct the grammar in 1-ndarray tutorial (#17513)
## Website and documentation
- Website edits (#17050)
- [Website 2.0] Nightly Build for v1.x (#17956)
- [docs] Fix runtime feature detection documentation (#16746)
- Adding user guidelines for using MXNet built with Large Tensor Support (#16894)
- fix typo and doc (#16921)
- large tensor faq doc fix (#16953)
- [DOC] Add a few tips for running horovod (#17235)
- Update NOTICE to fix copyright years (#17330)
- [DOC] Fix tutorial link, and better error msg (#17057)
- doc fix for argmax & argmin (#17604)
## CI/CD
- support mixed-precision true_divide (#16711)
- Try to fix CI (#16908)
- mixed precision for power (#16859)
- Fix desired precision for test_ndarray.py:test_reduce (#16992)
- [reproducibility] multi_sum_sq review, AtomicAdd removal (#17002)
- fix precision problem in linalg_solve, linalg_tensorinv, linalg_cholesky op test (#16981)
- grouping large array tests based on type and updating nightly CI function (#17305)
- [LICENSE] fix cpp predict license (#17377)
- [CI] Fix static build pipeline (#17474)
- skipping tests that cannot fit in nightly CI machine; corrected imports (#17450)
- Update Windows CI scripts to use syntax compatible with Win 2019 server powershell. (#17526)
- Fix Non-ASCII character in docstring (#17600)
- [CI] Follow redirects when downloading apache-maven-3.3.9-bin.tar.gz (#17608)
- [CI] Upgrade sphinx and autodocsumm (#17594)
- Reduce load on CI due to excessive log flood (#17629)
- Enable users to specify BLAS (#17648)
- [CI] Add AMI id to instance info on builds (#17649)
- [v1.7.x] Backport staggered CI builds (#17999 & #18119) (#18142)
- [v1.7.x] Backport #17177 to 1.7.x (Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator) (#18147)
- Fix formatting and typos in CD README.md (#16703)
- [CD] dynamic libmxnet pipeline fix + small fixes (#16966)
- [CD] enable s3 publish for nightly builds in cd (#17112)
- [CD] fix CD pipeline (#17259)
- [CD] update publish path (#17453)
- fix CD and remove leftover from #15990 (#17551)
- Fix nightly build (#16773)
- Update pypi_publish.py to disable nightly build upload to PyPI (#17082)
- [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339)
- Remove manually created symbolic link to ninja-build (#18437) (#18456)
- Increase staggered build timeout to 180 min (#18568) (#18585)
## License
- Don't relicense FindCUDAToolkit.cmake (#17334)
- fix license and copyright issues (#17364)
- Update ps-lite LICENSE (#17351)
- remove unused file with license issue (#17371)
- Update LICENSE for fonts (#17365)
- license np_einsum file under bsd (#17367)
- Update Apache License for mshadow (#18109) (#18134)
- Julia: remove downloading of the non-ASF binary build (#18489) (#18502)
- Add missing license header for md files (#18541)
- [v1.7.x]License checker enhancement (#18478)
## Miscellaneous changes
- Link fixes4 (#16764)
- Refactoring names for mxnet version of nnvm to avoid conflicting with the original tvm/nnvm. (#15303)
- minor typo fix (#17008)
- Add micro averaging strategy to pearsonr metric (#16878)
- introduce gradient update handler to the base estimator (#16900)
- fix latency calculation and print issue (#17217)
- add inference benchmark script (#16978)
- change the wording and log level to be more in line with the general use (#16626)
- Updated logos. (#16719)
- Pinning rvm version to satisfy Jekyll build (#18016)
- Workaround gnu_tls handshake error on Ubuntu 14.04 Nvidia Docker (#18044)
## How to build MXNet
Please follow the instructions at https://mxnet.incubator.apache.org/get_started
List of submodules used by Apache MXNet (Incubating) and when they were last updated:
| name | commit-id | last updated in MXNet | last update in module |
|------|-----------|-----------------------|-----------------------|
| dlpack | 3efc489 | Jan 20, 2020 | Feb 16, 2020 |
| dmlc-core | b3a4c71 | Dec 10, 2019 | Apr 25, 2020 |
| googletest | eb9225c | Jan 14, 2019 | Apr 16, 2020 |
| mkldnn | 07579e6 | Mar 31, 2020 | Apr 24, 2020 |
| nvidia_cub | c3cceac | Feb 16, 2018 | Jul 17, 2019 |
| onnx-tensorrt | f4745fc | Jul 12, 2019 | Apr 23, 2020 |
| openmp | b76842e | Jul 18, 2019 | Oct 15, 2019 |
| ps-lite | f601054 | Jan 24, 2020 | Feb 28, 2020 |
| tvm | 9bd2c7b | Jan 23, 2020 | Apr 26, 2020 |