Credit to Zhennan for this proposal.

Problem

Although data parallelism is used in MXNet, its performance is not good enough for less computationally intensive operators in the inference stage, especially at small batch sizes. This phenomenon is widespread in many popular models, such as GoogLeNet, Wide & Deep, and Inception-V3. For example, in the Wide & Deep model, 26 embedding ops are executed in sequence, and each one consumes very little computing resource. As a result, model-level performance is sub-optimal because of the long execution path through these low-parallelism operators.
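To make the problem concrete, here is a minimal Python sketch (not MXNet code; the table sizes and worker count are illustrative assumptions) showing why the 26 embedding lookups are a good target for operator-level parallelism: each lookup is a cheap, independent row gather, so running them one after another serializes the whole path, while running them concurrently produces identical results.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
# 26 independent embedding tables, as in the Wide & Deep example
# (vocabulary 100, embedding dim 8 are hypothetical sizes)
tables = [rng.standard_normal((100, 8)) for _ in range(26)]
indices = [rng.integers(0, 100, size=4) for _ in range(26)]

def lookup(table, idx):
    # an embedding op is just a row gather; very little compute per op
    return table[idx]

# sequential execution: one op after another (the long execution path)
seq = [lookup(t, i) for t, i in zip(tables, indices)]

# the ops have no data dependencies, so they can run concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    par = list(pool.map(lookup, tables, indices))

# concurrent execution yields the same results as the sequential path
assert all(np.array_equal(a, b) for a, b in zip(seq, par))
```

The sketch only demonstrates independence, not a realistic speedup; in a real engine, dispatching these ops to separate threads removes per-op scheduling latency from the critical path.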

...