Agenda:
- Introduction: name/email/GitHub handle/affiliation; interest in or contributions to MXNet
- 1.3 open issues
- Requests and suggestions for next release
- Tech talks
  - User focused
  - Architecture/code focused
- Suggestions & Questions
Hangout notes:
- Introductions:
  - Stephen Bull - learning MXNet, using CNNs, learning DL
  - Istvan Fehervari (Amazon) - started in February, contributed 2 operators
  - Rohit Srivastava (AWS) - contributed 2 operators
  - Ming Zhang (DeepGlint) - using MXNet in production
  - Sina Afrooza (AWS) - supporting MXNet users
- Stephen:
  - tutorials really helpful (the Straight Dope - really dope)
  - interested in contributing
  - initially found it difficult to start contributing - could this be made simpler?
  - figuring out how to contribute
  - communication channels: forum, dev@, Slack, GitHub
  - EgoHands dataset - format not readily usable; wrote some code to massage the data for use in MXNet - how to contribute it, maybe as a tutorial?
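Stephen's data-massaging question is a common one. As a hypothetical illustration (numpy only; the function name, image sizes, and zero-filled arrays are invented for the sketch, and the real EgoHands annotations are MATLAB polygon files, which this does not parse), converting loaded images into the NCHW float32 layout that MXNet's convolution operators expect looks roughly like:

```python
import numpy as np

def to_mxnet_batch(images):
    """Convert a list of HWC uint8 images (as loaded from disk) into the
    NCHW float32 batch layout that MXNet's convolution operators expect."""
    batch = np.stack(images).astype(np.float32) / 255.0  # N x H x W x C, scaled to [0, 1]
    batch = batch.transpose(0, 3, 1, 2)                  # -> N x C x H x W
    return batch

# Example: two fake 720x1280 RGB frames (EgoHands frames are 720p video stills)
frames = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(2)]
print(to_mxnet_batch(frames).shape)  # (2, 3, 720, 1280)
```

In an actual tutorial, such a batch would then be wrapped in `mx.nd.array` or fed through `mx.io.NDArrayIter` for training.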
- Istvan: a lot of issues with 1.3 (data loader issue, difficulty running in Docker)
  - used 1.1, found 1.2 bug, 1.2.1
  - difficult to replicate - a shared-memory problem?
- Ming Zhang from DeepGlint (GitHub id starimpact) - uses MXNet for training - face recognition, object re-identification (ReID)
  - handles very big data (billions of samples)
  - found many issues in MXNet related to big data - filed feedback on GitHub
  - 6 months ago - NDArray could not handle big data on GPU
  - distributed training: needs a distributed big fully connected (FC) layer - an FC layer with ~100 million output nodes, so the softmax has to be split across different GPUs (model parallelism)
  - sparse tensors - not applicable (image input)
  - implemented a kv_store change to handle the big FC layer against 1.0; it still needs to be merged and debugged
  - asked for help with the kv_store implementation - suggestion: ask on discuss.mxnet.io or, better, open a "WIP" PR and ask for feedback
  - wants MXNet to support end-to-end fixed-point (int8) fine-tuning - the quantization layer in 1.3 is good, but fine-tuning during training is needed
  - has forked and changed it - see "Ristretto" on GitHub (Caffe-based) - not hard to move the functions into MXNet; suggests a discussion about the basic approach
  - see https://arxiv.org/abs/1605.06402 and https://github.com/starimpact/ristretto_caffe (forked from https://github.com/pmgysel/caffe)
  - wants MXNet to become more powerful and asks the community for help
  - uses MXNet-trained models in production (cloud architecture; mobile on Android; a HiSilicon chip running Caffe models)
  - uses Tencent's ncnn to run models on embedded devices; TVM might be interesting in the future, but currently ncnn is more helpful
  - compared MXNet vs. Caffe and TensorFlow about 2 years ago; a few days ago compared again on Alibaba Cloud - Alibaba's optimized TensorFlow (faster than the official version) is still a little slower than MXNet
  - plans for improving distributed training? Yes!
  - Alibaba added many techniques to distributed TensorFlow (GPU direct access, all MPI)
- continue the discussion
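The model-parallel big-FC softmax Ming describes can be illustrated with a numpy-only sketch (sizes and names are invented; a real MXNet implementation would place each weight shard on its own GPU context and synchronize via kv_store rather than plain arrays): each "device" holds a column slice of the output weight matrix, computes partial logits, and the softmax is combined through a global max and a global normalizer.

```python
import numpy as np

def model_parallel_softmax(x, weight_shards):
    """Softmax over an output layer whose weight matrix is split
    column-wise across devices (modeled here as plain numpy arrays)."""
    # Each shard computes its slice of the logits: (batch, out_slice)
    logit_shards = [x @ w for w in weight_shards]
    # Numerically stable softmax needs the GLOBAL max across all shards
    global_max = np.max([s.max(axis=1) for s in logit_shards], axis=0)   # (batch,)
    exp_shards = [np.exp(s - global_max[:, None]) for s in logit_shards]
    # ...and the GLOBAL normalizer (one all-reduce in a real distributed setup)
    denom = np.sum([e.sum(axis=1) for e in exp_shards], axis=0)          # (batch,)
    return np.concatenate([e / denom[:, None] for e in exp_shards], axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))        # batch of 4, 16 input features
full_w = rng.standard_normal((16, 12))  # pretend 12 output classes
shards = np.split(full_w, 3, axis=1)    # 3 "GPUs", 4 output classes each
probs = model_parallel_softmax(x, shards)
print(np.allclose(probs.sum(axis=1), 1.0))  # True: each row is a valid distribution
```

The same two reductions (global max, global sum) are what the kv_store / MPI layer would have to provide for a 100-million-class FC layer.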
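The Ristretto-style fixed-point fine-tuning idea can likewise be sketched in numpy: during fine-tuning, weights are rounded to an int8-representable fixed-point grid in the forward pass ("fake quantization") so the network learns to tolerate int8 inference. This is a hypothetical illustration of the dynamic-fixed-point scheme from the paper above, not MXNet's quantization API:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Round weights to a dynamic fixed-point grid representable in
    `num_bits` bits; the returned array is still float32, so it can be
    used directly in an ordinary forward pass during fine-tuning."""
    # Choose a power-of-two scale that covers the weight range
    max_abs = max(np.abs(w).max(), 1e-8)              # guard against all-zero weights
    frac_bits = num_bits - 1 - int(np.ceil(np.log2(max_abs)))
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax)      # int8-representable integers
    return (q / scale).astype(np.float32)

w = np.array([0.731, -0.214, 0.05, -1.9], dtype=np.float32)
print(fake_quantize(w))  # values snapped to the nearest fixed-point step
```

The backward pass in such schemes typically ignores the rounding (straight-through estimator), which is the "fine tuning for training" part that 1.3's inference-only quantization layer lacks.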