Softmax-based training is widely used in face recognition and object re-identification (re-ID).
However, it is hard to train with limited GPU memory and large-scale data, because the final fully connected layer grows with the number of classes.
Partial FC is one method for solving this problem.
I have implemented a prototype, and it has already been used in training.
It is based on mxnet_v0.8.0, and I would like to port it to the master branch of MXNet.
The GitHub branch for single-machine training is: https://github.com/starimpact/mxnet_v0.8.0/tree/bLocalReset
The GitHub branch for distributed training is: https://github.com/starimpact/mxnet_v0.8.0/tree/bProxy_Weight
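
For readers unfamiliar with the idea, below is a minimal pure-Python sketch of one common reading of partial FC: compute the softmax loss over only a subset of classes, namely every class present in the batch plus randomly sampled negatives. It is not the code in the branches above and may differ from it. The names `sample_class_subset`, `partial_softmax_loss`, and `sample_rate` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_class_subset(labels, num_classes, sample_rate, rng):
    """Return sorted class ids: every class in the batch plus random negatives."""
    positives = set(labels)
    # Keep at least the positive classes, even if the sample rate is tiny.
    target = max(int(num_classes * sample_rate), len(positives))
    negatives = [c for c in range(num_classes) if c not in positives]
    extra = rng.sample(negatives, target - len(positives))
    return sorted(positives | set(extra))


def partial_softmax_loss(features, labels, weight, sample_rate=0.1, rng=None):
    """Mean softmax cross-entropy restricted to a sampled subset of classes.

    features: list of embedding vectors, one per sample
    labels:   list of integer class ids, one per sample
    weight:   list of per-class weight vectors (the full FC layer)
    """
    rng = rng or random.Random()
    sampled = sample_class_subset(labels, len(weight), sample_rate, rng)
    # Map each original class id to its column in the reduced logit vector.
    position = {c: i for i, c in enumerate(sampled)}
    total = 0.0
    for x, y in zip(features, labels):
        # Only the sampled rows of the FC weight are touched here.
        logits = [sum(a * b for a, b in zip(x, weight[c])) for c in sampled]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[position[y]]
    return total / len(labels), sampled
```

With `sample_rate=1.0` this reduces to the ordinary full softmax loss. With a smaller rate, only the sampled rows of the weight matrix take part in the forward pass, which is where the memory and compute savings would come from.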