Introduction
This document details the current state of our CI/CD and where we want to be. Namely, the items we'll need to complete include:
- Migration to Apache (target date: June 30th), defined by:
- Apache Jenkins can run our PRs, merges, and a bare-bones nightly job
- We transfer ownership of the MXNet repo to the Apache organization
- Docs are on mxnet.apache.org
- Transfer the mxnet.io domain to Apache and have them redirect it to mxnet.apache.org
- Migration is tracked at https://github.com/dmlc/mxnet/projects/6
- Nightly builds - provide builds to the community every night so we can confidently pick release candidates
- Source package
- Pip
- Docker
- Adopting the Apache release process (see below; next release: July 19th)
- Release automation - we should be able to confidently automate the release when tagged
Current State
Our release process is still very manual. We tag a commit as a release, then test it manually or manually trigger the tests. We use two build tools, Jenkins and Travis, and both are containerized.
Jenkins is our build solution for Linux (AML, Ubuntu 14.04)x(CPU, GPU) and Windows server (CPU).
Travis is our build solution for macOS (CPU only).
PRs/Merges
PRs & merges trigger the classic build, language (Python, R, Scala, Julia) unit tests, and installation guide tests in Jenkins (http://ec2-52-25-96-65.us-west-2.compute.amazonaws.com/job/mxnet/) and Travis (https://travis-ci.org/dmlc/mxnet).
Jenkins builds are triggered via configuration https://github.com/dmlc/mxnet/blob/master/Jenkinsfile and Travis builds via configuration https://github.com/dmlc/mxnet/blob/master/.travis.yml.
Nightly Tests
Nightly tests run in Jenkins (http://jenkins-master-elb-1979848568.us-east-1.elb.amazonaws.com/); they only check that compilation and tests pass. They don't provide build artifacts for our community to use. NOTE: this is a different set of servers from the one used for PRs. These jobs are configured via the web app and should be moved into a Jenkinsfile, similar to the PR builds. What's currently running:
- Core
- ARM
- Amalgamation
- Javascript
- Notebooks
- Tutorials
- Pip (installs and tests what's currently in PyPI)
- Docker (pulls and tests what's in DockerHub)
- Installation Guide
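As a sketch of what the "Pip" nightly job above does (hedged, since the real job config lives in the Jenkins web app): install whatever is currently on PyPI and smoke-test the import. The interpreter versions are assumptions, and the commands are only echoed here rather than executed.

```shell
# Hedged sketch of the nightly "Pip" job: install the published package and
# smoke-test the import. Commands are echoed; the real job would run them.
pkg=mxnet
for py in 2.7 3.5; do     # assumed Python versions, not confirmed by the doc
  echo "python${py} -m pip install --upgrade ${pkg}"
  echo "python${py} -c 'import mxnet; print(mxnet.__version__)'"
done
```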
Apache Release Process
In the abstract, the process to release Apache software during the incubation period starts with tagging a release candidate, proposing the release on the dev mailing list, and collecting votes. At least three +1 votes, and a majority of +1 votes, are required. A vote is then held by the Incubator PMC, which must also produce three +1 votes in order to move forward with the release.
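The tallying rule above can be illustrated with a tiny sketch; the vote counts are invented for the example.

```shell
# Toy illustration of the incubator vote rule described above: a release
# needs at least three +1 votes and more +1s than -1s. Counts are made up.
up=4
down=1
if [ "$up" -ge 3 ] && [ "$up" -gt "$down" ]; then
  echo "vote passes"
else
  echo "vote fails"
fi
```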
The release manager then tags the release, packages the source (suffixed with "incubator") into a tarball, and moves the source package onto Apache's incubator distribution site. Any built binaries and environments (pip wheels, Docker images) beyond the source packages are not controlled by a formal release process. "The public may also obtain Apache software from any number of downstream channels which redistribute our releases in either original or derived form (rpm, deb, homebrew, etc.). The vast majority of such downstream channels operate independently of Apache."
Read more here: http://www.apache.org/dev/#releases
A good example to follow here: https://cwiki.apache.org/confluence/display/HAWQ/Release+Process%3A+Step+by+step+guide#ReleaseProcess:Stepbystepguide-PublishingandDistributingRelease
Release Items
In addition to the other special builds listed in the "nightly" section, we should build and test the following every night. Any run that passes its builds/tests can be used as a release candidate. Once there's a positive vote, the first three would then have to be released manually, or we would trigger a release from the tagging event via our own Jenkins setup.
- Source packages (.tgz)
- Pip wheels
- Docker images
- Docs
Tasks
Due to the lack of time between now (last week of June) and the next release, we'll prioritize the migration and providing nightly builds. We'll deprioritize automated releases and release manually. There is some work in generating the keys to sign the releases, getting them trusted, and packaging the release.
1. Migration (1.5 weeks)
Mostly passive work - the requirements for completion are defined in the Introduction. The active work of moving the nightly jobs (listed above) into a Jenkinsfile for Apache's build server to consume is where most of the effort lies, but that wasn't defined as a requirement for completing the migration, because we already have nightly tests running in our in-house Jenkins.
Effort and FAQ: MXNet migration from DMLC to Apache
2. Source Packages (? day)
Projects typically create source packages (.tar.gz of source, no build files) as build artifacts inside Jenkins as part of their nightly builds (example from Apache JMeter). After the community votes on a release, the release manager packages a distribution, signs it with an OpenPGP-compatible ASCII-armored detached signature, and uploads it to the project's dist release area: https://dist.apache.org/repos/dist/release/.
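The packaging-and-signing flow could look like the sketch below. The version string and the src/ contents are placeholders, and the gpg/svn steps are only echoed because they need the release manager's key and commit access to the dist repository.

```shell
# Sketch of packaging a source release and preparing it for the dist site.
# VERSION and the src/ contents are placeholders, not real project values.
set -eu
VERSION=0.10.0
NAME="apache-mxnet-incubating-${VERSION}-src"
mkdir -p src
echo "placeholder" > src/README      # stand-in for the real source tree
tar czf "${NAME}.tar.gz" src
sha512sum "${NAME}.tar.gz" > "${NAME}.tar.gz.sha512"
# Detached ASCII-armored signature (needs the release manager's key), then a
# commit to the dist.apache.org svn repo -- both echoed only:
echo "gpg --armor --detach-sign ${NAME}.tar.gz"
echo "svn commit to https://dist.apache.org/repos/dist/release/"
```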
Task | Status | Start Date | Completion Date | Estimation | Priority | Notes
Add build to nightly Jenkinsfile (running in Apache's build server and producing nightly source packages there) | | | | 1 day for EACH build type listed above | ? | What kind of builds do we want to run (there's a long list; can we simplify it to a couple of major build flags)? Tests? The effort here is in moving jobs configured manually in the Jenkins UI to a pipeline-as-code file (Jenkinsfile). After builds pass for the night, we can generate the source package that lives in Jenkins. Example: https://builds.apache.org/job/JMeter-trunk/lastSuccessfulBuild/artifact/trunk/dist/ NOTE: I disagree with 1) providing nightly source packages and 2) running these in Apache's build server as priorities. Nightly tests only have to run on our in-house server to give us confidence in picking release candidates. As long as we follow the voting process and get the PMC's blessing, we will be following the Apache release process.
Generate new keys for signing releases | | | | 1+ | High | Need to generate a set of keys for the release manager to sign the releases. The public key has to be signed into the web of trust (BLOCKER), which is mostly done at in-person meetings such as conferences or key-signing parties. It also has to be uploaded to a public key server. See: http://www.apache.org/dev/release-signing.html http://www.apache.org/dev/openpgp.html#generation-final-steps http://www.apache.org/dev/openpgp.html#wot-link-in
Determine README, LICENSE, the directory structure, etc. for the project dist website | | | | 1 | High | Defer to someone else, call a meeting
Trigger automated release on tagging event | | | | .5 | Low |
3. Docker Images (2 days)
https://github.com/dmlc/mxnet/tree/master/docker
Need to add tests, and build/tag images at the current commit.
Task | Status | Start Date | Completion Date | Estimation | Priority | Notes
Add builds/tests to nightly Jenkinsfile | | | | 1.5 | Medium |
Trigger automated release on tagging event | | | | .5 | Low |
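Building and tagging at the current commit, per the note above, might look like the following. The image name "mxnet/python" and the Dockerfile path are illustrative assumptions, and the docker commands are echoed rather than executed.

```shell
# Hedged sketch: tag nightly Docker images with the current commit.
# Image name "mxnet/python" and the "docker/" build path are illustrative.
set -eu
COMMIT=$(git rev-parse --short HEAD 2>/dev/null || echo "abc1234")
echo "docker build -t mxnet/python:nightly-${COMMIT} docker/"
echo "docker run mxnet/python:nightly-${COMMIT} nosetests unittest"
echo "docker push mxnet/python:nightly-${COMMIT}"
```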
4. Pip Wheels (3 weeks)
Task | Status | Start Date | Completion Date | Estimation | Priority | Notes
Make wheels for all variants | | | | | Medium | Assigning medium because the Apache release process has no requirement for pip distributions, and source packages have priority over pip distributions. We'll build and test these manually if we need to.
CPU | Started | | | 2 | |
MKL | | | | 2 | |
CU75 | Build and test successful (Python 2.7). Need to test the upload and clean up the code. | | | 2 | |
CU80 | | | | 2 | |
CU75MKL | | | | 2 | |
CU80MKL | | | | 2 | |
Run test suite for each variant, for each Python | | | | | Medium |
For CPU: "nosetests unittest" | | | | 2 | |
For GPU: "nosetests gpu" | | | | 2 | |
Release | GitHub should send a push notification to Jenkins to run this process on each tagging event. Release the packages via twine (pip install twine). | | | .5 | Low |
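The variant matrix in the table could be driven by a loop like this sketch. The per-variant build details (MKL/CUDA flags) are omitted and the commands are only echoed; the twine step matches the note in the Release row.

```shell
# Sketch of the pip release step: one wheel per variant, uploaded with twine.
# Per-variant build flags are omitted; commands are echoed, not executed.
set -eu
for variant in cpu mkl cu75 cu80 cu75mkl cu80mkl; do
  echo "build wheel for ${variant}"   # e.g. python setup.py bdist_wheel
done
echo "twine upload dist/*.whl"        # run only on a tagging event
```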
5. Windows, R, Scala Packages
Deferred
6. Docs build
Upon every release, build documentation and manually check/deploy
- Each stable release should have its own tagged path on the docs website, along with a separate path for the latest commit in GitHub. This ensures a stable set of docs is always live. Ex:
- Stable release docs: mxnet.io/index.html -> mxnet.io/0.10.0/index.html
- Latest commit docs: mxnet.io/experimental/index.html
Workflow:
Design for versioned website:
- Push each stable release website static files to a separate repo, say dmlc/website-archive.
- Add a versions tab to master and the latest stable release that links to other releases. Also add the version number to the top right corner to indicate the current website version.
- For older release versions, add a "Latest release" tab to switch to the current release.
- Point root url to latest release instead of master.
Once we have a new release, we need to:
- Archive the last release
- Update the versions list on master branch and release tag
Releases
- Tagging a release on GitHub triggers a build job on Jenkins
- If the tag is not a release candidate ( ? ), the build job will create a new folder named after the tag (ex: v0.11)
- The job builds the docs, moves them into the new tagged folder, and points index.html to the latest versioned docs
- Commit and push that folder to the asf-site branch
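A sketch of the tag-handling logic described above: the doc leaves the release-candidate test undecided (the "( ? )"), so this ASSUMES candidate tags contain "rc", and derives the folder name by dropping the patch version.

```shell
# Hedged sketch of the tag-triggered docs job. The RC test is an ASSUMPTION
# (tags containing "rc" are candidates); the doc leaves it undecided.
TAG="v0.11.0"
case "$TAG" in
  *rc*)
    echo "release candidate ${TAG}: no versioned folder"
    ;;
  *)
    FOLDER=$(echo "$TAG" | cut -d. -f1,2)   # v0.11.0 -> v0.11
    echo "build docs into ${FOLDER}/ and point index.html at it"
    ;;
esac
```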
Latest
- Every merge on master triggers the build job on Jenkins
- Build docs for the latest commit and replace the old latest/ with the new docs
- Commit and push that folder to the asf-site branch.
asf-site branch would look something like:
index.html -> v0.11/index.html (whichever folder is the latest stable release)
latest/ (mirrors the latest commits)
    index.html
    install/
    tutorials/ ...
v0.11/
    index.html
    install/
    tutorials/ ...
v0.10/
    index.html
    install/
    tutorials/ ...
This is PredictionIO's asf-site branch for reference: https://git-wip-us.apache.org/repos/asf?p=incubator-predictionio-site.git;a=tree;h=refs/heads/asf-site;hb=refs/heads/asf-site
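The layout above can be produced by a small script like this sketch; the version numbers are the doc's examples, and the root index is just a placeholder for whatever redirect mechanism the site ends up using.

```shell
# Build the asf-site directory layout sketched above in a scratch directory.
# Version folders match the doc's examples; the root index is a placeholder.
set -eu
site=$(mktemp -d)
for v in v0.10 v0.11 latest; do
  mkdir -p "${site}/${v}/install" "${site}/${v}/tutorials"
  touch "${site}/${v}/index.html"
done
# Root index points at the latest stable release (placeholder redirect):
echo "redirect to v0.11/index.html" > "${site}/index.html"
ls "${site}"
```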