Agenda:

  • Introductions: name/email/GitHub handle/affiliation; interest in or contributions to MXNet
  • 1.3 open issues
  • Requests and suggestions for next release
  • Tech talks
    • User focused
    • Architecture/code focused
  • Suggestions & Questions


Hangout notes:

  • Introduction:
    • Stephen Bull - learning MXNet, using CNN, learning DL
    • Istvan Fehervari (Amazon) - started in Feb, contributed 2 operators
    • Rohit Srivastava (AWS) - contributed 2 operators
    • Ming Zhang (DeepGlint) - using MXNet in production
    • Sina Afrooza (AWS) - supporting MXNet users
  • Stephen:
    • found the tutorials (Straight Dope) really helpful - really dope
    • interested in contributing
    • initially found it difficult to start contributing - could this be simpler?
    • wants to figure out how to contribute
    • communication channels: forum, dev@, Slack, GitHub
    • egohands data set - format not readily usable; wrote some code to massage the data for use in MXNet - how to contribute that code, perhaps as a tutorial?
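The data-massaging code mentioned above typically fits naturally into a Gluon-style dataset wrapper. A minimal pure-Python sketch, with a hypothetical annotation format (list of image path plus bounding boxes) - `mxnet.gluon.data.Dataset` uses the same `__len__`/`__getitem__` protocol, so a class like this can be fed to a Gluon `DataLoader`:

```python
# Sketch of wrapping a non-standard annotation format (hypothetical here) so
# MXNet/Gluon data loaders can consume it. No mxnet import needed for the
# sketch itself; mxnet.gluon.data.Dataset expects the same two methods.

class HandDataset:
    def __init__(self, samples):
        # samples: list of (image_path, [(x1, y1, x2, y2), ...]) pairs
        self._samples = samples

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        path, boxes = self._samples[idx]
        # Flatten each box into [x1, y1, x2, y2]; a real loader would also
        # read and decode the image at `path` here.
        label = [list(box) for box in boxes]
        return path, label


samples = [("img0001.jpg", [(10, 20, 110, 140)])]
ds = HandDataset(samples)
print(len(ds))    # 1
print(ds[0][1])   # [[10, 20, 110, 140]]
```

A converted dataset like this could also be serialized to RecordIO via MXNet's tooling, which would be the natural shape for a contributed tutorial.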
  • Istvan: hit a lot of issues with 1.3 (a data loader issue, difficulty running in Docker)
    • used 1.1; found a bug in 1.2, also in 1.2.1
    • difficult to replicate - possibly a shared-memory problem?
  • Ming Zhang from DeepGlint (GitHub id starimpact) - uses MXNet for training: face recognition, object re-identification (ReID)
    • handles very big data (billions of data points)
    • found many issues in MXNet related to big data - feedback filed on GitHub
    • 6 months ago: NDArray could not handle big data on GPU
    • distributed training: needs a distributed, very large fully connected (FC) layer - an FC layer with 100 million output nodes; the softmax has to be split across multiple GPUs (model parallelism)
    • sparse tensors not applicable (dense image input)
    • implemented a kv_store change to handle the big FC layer on 1.0; still needs to be merged and debugged
    • asked for help with the kv_store implementation - suggestion: ask on discuss.mxnet.io, or better, open a "WIP" PR and ask for feedback there
    • wants MXNet to support end-to-end fixed-point (int8) fine-tuning - the quantization layer in 1.3 is good, but fine-tuning during training is needed
    • has forked and changed this - see "ristretto" on GitHub (Caffe-based); moving the functions into MXNet would not be hard, suggests a discussion about the basic approach
    • see https://arxiv.org/abs/1605.06402 and https://github.com/starimpact/ristretto_caffe (forked from https://github.com/pmgysel/caffe)
    • wants MXNet to become more powerful, asking the community for help
    • using MXNet-trained models in production (cloud architecture; mobile for Android; HiSilicon chips running Caffe models)
    • using ncnn from Tencent for models on embedded devices; TVM might be interesting in the future, but currently ncnn is more helpful
    • compared MXNet vs. Caffe and TensorFlow about 2 years ago; a few days ago compared again on Alibaba Cloud - Alibaba-optimized TensorFlow (faster than the official version) is still a little slower than MXNet
    • plans for improving distributed training? Yes!
    • Alibaba added many techniques to distributed TF (GPU direct access, all MPI)
    • continue discussion
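The model-parallel softmax Ming described (an FC layer with ~100 million output nodes, split across GPUs) can be sketched with per-shard statistics: each device computes a local max and a local exp-sum, and only those scalars are exchanged. A NumPy stand-in - the "devices" here are just array slices, not Ming's actual kv_store implementation:

```python
import numpy as np

# Sketch of a model-parallel softmax over sharded logits. Each shard would
# live on its own GPU; only the scalar max and sum cross device boundaries.

def sharded_softmax(logit_shards):
    # 1. Each "device" reports its local max; combine for numerical stability.
    global_max = max(s.max() for s in logit_shards)
    # 2. Each device computes exp(logits - global_max) and its local sum.
    exps = [np.exp(s - global_max) for s in logit_shards]
    total = sum(e.sum() for e in exps)
    # 3. Each device normalizes locally with the global sum.
    return np.concatenate([e / total for e in exps])

logits = np.array([2.0, 1.0, 0.5, 3.0])
shards = np.array_split(logits, 2)          # pretend two GPUs
full = np.exp(logits - logits.max())
full /= full.sum()
assert np.allclose(sharded_softmax(shards), full)
```

A single logit vector is used for simplicity; a real big-FC layer would do this per row of a batch, and the two reductions would be all-reduce operations.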
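The fixed-point (int8) fine-tuning Ming asked about rests on a simple primitive: quantize weights to int8, dequantize, and train against the dequantized values so the network adapts to the rounding error (the approach taken by Ristretto). A minimal symmetric linear quantization sketch, not MXNet's actual quantization layer:

```python
import numpy as np

# Sketch of symmetric linear int8 quantization: the largest weight magnitude
# maps to 127, everything else is rounded to the nearest step of size `scale`.

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Fine-tuning would run the forward pass with `w_hat` while accumulating gradient updates in the float32 `w` (straight-through estimation), which is the part Ming suggests discussing before moving the Ristretto functions into MXNet.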