...

MKLDNN support
CuDNN support

Challenges

How to address this problem across all submodules
- mixed data types are used in several submodules such as mshadow, dlpack etc.
How to address this problem across all language bindings
- C APIs are used by many language bindings
CUDNN and MKLDNN support
- Need to make sure CUDNN and MKLDNN operators also support large indices
Performance impact
- there are known performance differences between int32 and int64 operations. We need to make sure these differences do not cause severe performance regressions in model training and inference
- memory footprint
Backward compatibility

Proposed Approach

Due to the challenges mentioned above, we plan to develop this feature in stages.

Stage 1: Use a compiler flag to enable/disable int64 support. By default this flag is off to prevent any performance impact.
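A minimal sketch of how such a switch could select the index type at compile time. The macro name EXAMPLE_INT64_TENSOR_SIZE and the helpers MaxIndex() and CheckedIndex() are hypothetical names chosen for illustration, not the actual build option or backend functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical flag name for illustration. A build would pass
// -DEXAMPLE_INT64_TENSOR_SIZE=1 to opt in; it is off by default.
#ifndef EXAMPLE_INT64_TENSOR_SIZE
#define EXAMPLE_INT64_TENSOR_SIZE 0
#endif

#if EXAMPLE_INT64_TENSOR_SIZE == 1
typedef std::int64_t index_t;
#else
typedef std::int32_t index_t;
#endif

// Largest element index representable under the current build setting.
constexpr std::int64_t MaxIndex() {
  return static_cast<std::int64_t>(std::numeric_limits<index_t>::max());
}

// Narrow a 64-bit value to index_t, asserting that it fits.
inline index_t CheckedIndex(std::int64_t i) {
  assert(i >= 0 && i <= MaxIndex());
  return static_cast<index_t>(i);
}
```

With the flag left off, index_t stays 32-bit and existing code paths are unchanged, which is the point of keeping the default off in Stage 1.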

Stage 2: Benchmark int64 against int32 and fix the performance differences that severely impact training and inference. Then turn on the compiler flag by default.

To support large tensor operations in the MXNet backend, we need to update the following:
1) Support large tensor size in the NDArray data structure. We need to make sure the data structure of a tensor can hold a sufficiently large number of elements.
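As a rough illustration of the requirement, the sketch below computes the total element count of a tensor from its shape using a 64-bit accumulator, even though each dimension is stored as a 32-bit value. The function name NumElements is hypothetical and is not taken from the NDArray implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Total number of elements for a shape whose dimensions are each 32-bit.
// The accumulator is 64-bit so the product can exceed 2^32.
std::int64_t NumElements(const std::vector<std::uint32_t>& shape) {
  std::int64_t total = 1;
  for (std::uint32_t d : shape) {
    // Guard against overflowing the 64-bit accumulator itself.
    assert(d == 0 ||
           total <= std::numeric_limits<std::int64_t>::max() /
                        static_cast<std::int64_t>(d));
    total *= static_cast<std::int64_t>(d);
  }
  return total;
}
```

For example, a shape of 50000 x 100000 yields 5,000,000,000 elements, which fits in no 32-bit integer type.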

2) Allow index loops to go beyond 2^31:
In the CPU operator implementation, the kernel always uses a Map() function to process each data element. The indexing variable needs to be int64_t.
A PR has been submitted to address a subset of the operators:
https://github.com/apache/incubator-mxnet/pull/13418
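The idea can be sketched with a simplified element-wise kernel. The AddOne operator, the Launch loop, and FlatIndex below are hypothetical stand-ins written for illustration and are much simpler than the real backend kernels; the only point is that the loop variable and any derived offsets are 64-bit.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int64_t index_t;

// Hypothetical element-wise operator in the style of a Map() function.
struct AddOne {
  static void Map(index_t i, float* out, const float* in) {
    out[i] = in[i] + 1.0f;
  }
};

// Simplified launch loop. With a 32-bit loop variable, an element count
// above 2^31 - 1 could never be reached without overflow.
template <typename OP>
void Launch(index_t n, float* out, const float* in) {
  for (index_t i = 0; i < n; ++i) {
    OP::Map(i, out, in);
  }
}

// Flat offset of (row, col) in a row-major matrix, computed in 64 bits.
index_t FlatIndex(index_t row, index_t col, index_t num_cols) {
  assert(row >= 0 && col >= 0 && col < num_cols);
  return row * num_cols + col;
}
```

Computing row * num_cols in a 32-bit type would overflow for row 50000 of a matrix with 100000 columns, since the offset is already 5,000,000,000.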

3) Update different API interfaces
This involves the API interface between the MXNet backend and the different front-end languages.

There are two defined data types used in the MXNet backend in addition to the native integer types: index_t and dim_t. Earlier PRs have been submitted to use int64_t for index_t and dim_t:
https://github.com/apache/incubator-mxnet/pull/11742
https://github.com/dmlc/mshadow/pull/348
The shape size currently also uses uint32_t. Although the total number of elements can go above 4 billion, it is very rare for a single dimension to go above that number. Therefore, we may be able to keep the shape size in uint32_t.
Since int32 is used by many language bindings, we will add extra 64-bit C APIs and use them in the Python bindings first. Other language bindings can choose to use the 64-bit APIs if they also plan to support large indices.
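One possible shape for this is to keep the existing 32-bit entry point and add a 64-bit sibling that it forwards to. The function names ExampleNDArrayCreate and ExampleNDArrayCreate64 are hypothetical, and for brevity the sketch returns an element count instead of creating an array handle. The 0 / -1 return convention is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

extern "C" {

// Hypothetical 64-bit variant: dimensions are passed as int64_t.
// Returns 0 on success and -1 on failure.
int ExampleNDArrayCreate64(const std::int64_t* shape, int ndim,
                           std::int64_t* out_size) {
  if (shape == nullptr || out_size == nullptr || ndim < 0) return -1;
  std::int64_t total = 1;
  for (int i = 0; i < ndim; ++i) {
    if (shape[i] < 0) return -1;  // negative dimensions are invalid
    total *= shape[i];
  }
  assert(total >= 0);
  *out_size = total;
  return 0;
}

// Hypothetical existing-style entry point with 32-bit dimensions.
// It widens its arguments and forwards to the 64-bit variant, so
// bindings that still use it keep working unchanged.
int ExampleNDArrayCreate(const std::uint32_t* shape, std::uint32_t ndim,
                         std::int64_t* out_size) {
  if (shape == nullptr) return -1;
  std::vector<std::int64_t> wide(shape, shape + ndim);
  return ExampleNDArrayCreate64(wide.data(), static_cast<int>(ndim), out_size);
}

}  // extern "C"
```

Keeping both symbols means a binding can move to the 64-bit call when it is ready, while the older signature stays available.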

Future Development Guideline

...

Use index_t for indexing elements
Use dim_t for dimension size
Never use unsigned types; in particular, avoid using uint32_t to declare a non-negative number that does not exceed 4 billion (see: https://google.github.io/styleguide/cppguide.html#Integer_Types)
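The guideline can be illustrated with a short sketch. Here index_t and dim_t are assumed to be 64-bit signed typedefs, and the helper names are hypothetical. The first function shows the wraparound that makes unsigned arithmetic error-prone; the others show the signed alternative.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int64_t index_t;  // assumed: for indexing elements
typedef std::int64_t dim_t;    // assumed: for dimension sizes

// Unsigned subtraction wraps around instead of going negative.
std::uint32_t UnsignedDiff(std::uint32_t a, std::uint32_t b) {
  return a - b;
}

// The signed version produces the mathematically expected result.
index_t SignedDiff(index_t a, index_t b) {
  return a - b;
}

// Reverse traversal: the condition i >= 0 terminates only because
// index_t is signed. With an unsigned index it would always be true.
double SumReversed(const double* data, dim_t size) {
  assert(size >= 0);
  double sum = 0.0;
  for (index_t i = size - 1; i >= 0; --i) {
    sum += data[i];
  }
  return sum;
}
```

UnsignedDiff(2, 3) returns 4294967295 while SignedDiff(2, 3) returns -1, which is why a signed type is preferred even for values that are never expected to be negative.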

Backward compatibility

We should support all existing operators with uint32_t data types.

...
