Credit to Zhennan for this proposal.

Problem

Although data parallelism is used in MXNet, its performance is not good enough for less computationally intensive operators in the inference stage, especially at small batch sizes. This phenomenon is widespread in many popular models, such as GoogLeNet, Wide & Deep, and Inception-V3. For example, in the Wide & Deep model, 26 embedding ops are executed in sequence, and each one consumes very little computing resource. As a result, model-level performance is sub-optimal because of the long execution path through these low-parallelism operators.
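To make the problem concrete, here is a minimal Python sketch (not MXNet code; the table sizes and worker count are illustrative assumptions) showing why the 26 embedding lookups are a good target for operator-level parallelism: each lookup is a cheap, independent row gather, so running them one after another serializes the whole path, while running them concurrently produces identical results.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
# 26 independent embedding tables, as in the Wide & Deep example
# (vocabulary 100, embedding dim 8 are hypothetical sizes)
tables = [rng.standard_normal((100, 8)) for _ in range(26)]
indices = [rng.integers(0, 100, size=4) for _ in range(26)]

def lookup(table, idx):
    # an embedding op is just a row gather; very little compute per op
    return table[idx]

# sequential execution: one op after another (the long execution path)
seq = [lookup(t, i) for t, i in zip(tables, indices)]

# the ops have no data dependencies, so they can run concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    par = list(pool.map(lookup, tables, indices))

# concurrent execution yields the same results as the sequential path
assert all(np.array_equal(a, b) for a, b in zip(seq, par))
```

The sketch only demonstrates independence, not a realistic speedup; in a real engine, dispatching these ops to separate threads removes per-op scheduling latency from the critical path.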

...