To support large tensor operations in the MXNet backend, we need to update the following:
1) Support large tensor size in the NDArray data structure. We need to make sure the data structure of a tensor can hold a sufficiently large number of elements.
2) Allow index loops to go beyond 2^31:
In the CPU operator implementations, the kernel always uses a Map() function to process each data element. The indexing variable needs to be an int64_t so that it does not overflow once the element index exceeds the 32-bit signed range.
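The overflow described above can be illustrated with a small sketch. The actual kernels are C++; this Python snippet only simulates the arithmetic of a 32-bit versus a 64-bit signed index using `ctypes`. The `flat_index` helper and the example shape are hypothetical and are not part of MXNet.

```python
import ctypes

def flat_index(coords, shape):
    """Row-major flat offset of `coords` in a tensor with the given shape.

    Hypothetical helper for illustration only; not an MXNet function.
    """
    idx = 0
    for c, dim in zip(coords, shape):
        idx = idx * dim + c
    return idx

# Assumed example shape: 3 * 2^30 elements, which is more than 2^31.
shape = (3, 2**30)

# Flat offset of the last element: 3 * 2^30 - 1.
last = flat_index((2, 2**30 - 1), shape)

# ctypes integer types do no overflow checking, so a 32-bit signed
# index silently wraps around, while a 64-bit index keeps the value.
as_int32 = ctypes.c_int32(last).value
as_int64 = ctypes.c_int64(last).value

print(last)      # 3221225471
print(as_int32)  # -1073741825
print(as_int64)  # 3221225471
```

A loop counter that wraps to a negative value in this way would index outside the tensor, which is why the index type has to be widened rather than only the tensor size.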
A PR has been submitted to address a subset of the operators:
3) Update different API interfaces
This involves the API interface between the MXNet backend and the different front-end languages.
E.g., on the language binding side (a front-end language such as Python), int64_t values need to be passed to the C++ backend (in Python: `ctypes.c_int64`). Otherwise, the value gets truncated at the language binding side.
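A minimal sketch of the truncation on the Python side, assuming a value just past the int32 range. `to_c_index` is a hypothetical helper introduced here for illustration; it is not an existing MXNet function, and the real bindings may wrap arguments differently.

```python
import ctypes

def to_c_index(value):
    """Wrap a Python int as a 64-bit C argument.

    Hypothetical helper, not part of MXNet. It rejects values that do
    not fit in int64_t, because ctypes itself would silently wrap them.
    """
    if not -(2**63) <= value < 2**63:
        raise OverflowError("value does not fit in int64_t: %d" % value)
    return ctypes.c_int64(value)

requested = 2**31 + 5  # assumed example: an index just past the int32 range

# What a 32-bit C parameter would receive: the value wraps silently.
narrow = ctypes.c_int32(requested)

# What a 64-bit C parameter receives: the value is preserved.
wide = to_c_index(requested)

print(narrow.value)  # -2147483643
print(wide.value)    # 2147483653
```

Because `ctypes` does not raise on overflow for its integer types, a binding that still uses a 32-bit argument type fails silently, which is why the argument types on the binding side have to be updated together with the C API.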
There are two defined data types used in the MXNet backend in addition to the native integer types: index_t and dim_t. An earlier PR has been submitted to use int64_t for index_t and dim_t:
Since int32 is used by many language bindings, we will add extra 64-bit C APIs and use them in the Python bindings first. Other language bindings can choose to use the 64-bit APIs if they also plan to support large indices.