This document details the steps required to cut a Spark release. It was last updated on 12/27/15 for the 1.6.0 release.
The release manager role in Spark means you are responsible for a few different things:
- Preparing for release candidates: (a) cutting a release branch (b) informing the community of timing (c) working with component leads to clean up JIRA (d) making code changes in that branch with necessary version updates.
- Running the voting process for a release: (a) creating release candidates using automated tooling (b) calling votes and triaging issues
- Finalizing and posting a release: (a) updating the Spark website (b) writing release notes (c) announcing the release
Preparing Spark for Release
The main step towards preparing a release is to create a release branch. This is done via the standard git branching mechanism and should be announced to the community once the branch is created. It is also good to set up Jenkins jobs for the release branch once it is cut to ensure tests are passing (consult Josh Rosen and shane knapp for help with this).
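A minimal sketch of that step, assuming your Apache remote is named apache and you are branching for a 1.1.x line:

```
# Cut the release branch from the current master and publish it.
git checkout master
git pull apache master
git checkout -b branch-1.1
git push apache branch-1.1
```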
Next, ensure that all Spark versions are correct in the code base on the release branch (see this example commit). You should grep through the codebase to find all instances of the version string (a sketch of such a search follows the list below). Some known places to change are:
- SparkContext. Search for VERSION (only for branch 1.x)
- Maven build. Ensure that the version in all the pom.xml files is <SPARK-VERSION>-SNAPSHOT (e.g. 1.1.1-SNAPSHOT). This will be changed to <SPARK-VERSION> (e.g. 1.1.1) automatically by Maven when cutting the release. Note that there are a few exceptions that should just use <SPARK-VERSION>, namely yarn/alpha/pom.xml and extras/java8-tests/pom.xml. These modules are not published as artifacts.
- Spark REPLs. Look for the Spark ASCII art in SparkILoopInit.scala for the Scala shell and in shell.py for the Python REPL.
- Docs. Search for VERSION in docs/_config.yml
- Spark EC2 scripts. Update the default Spark version and the mapping between Spark and Shark versions.
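As referenced above, a sketch of that search, assuming the previous development version string was 1.1.1-SNAPSHOT:

```
# Find every occurrence of the development version string, skipping git metadata.
grep -rn --exclude-dir=.git "1\.1\.1-SNAPSHOT" .
```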
Finally, update CHANGES.txt with this script in the Spark repository. CHANGES.txt captures all the patches that have made it into this release candidate since the last release.
This produces a CHANGES.txt.new that should be a superset of the existing CHANGES.txt. Replace the old CHANGES.txt with the new one (see this example commit).
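A sketch of that step, assuming the changelist script lives at dev/create-release/generate-changelist.py (the path may differ on your branch):

```
# Generate CHANGES.txt.new, then swap it in; review the diff before committing.
python dev/create-release/generate-changelist.py
mv CHANGES.txt.new CHANGES.txt
git diff CHANGES.txt
```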
Cutting a Release Candidate
If this is not the first RC, then make sure that the JIRA issues that have been solved since the last RC are marked as FIXED in this release version.
- One possible protocol is to mark such issues as FIXED in the next maintenance release. E.g. if you are cutting the RC for 1.0.2, mark such issues as FIXED in 1.0.3.
- When cutting a new RC, find all the issues that are marked as FIXED for the next maintenance release, and change them to the current release.
- Verify from the git log whether they actually made it into the new RC (see the sketch below).
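One way to do that last check, assuming hypothetical RC tags like v1.0.2-rc1:

```
# List the JIRA IDs touched by commits since the previous RC tag.
git log v1.0.2-rc1..HEAD --oneline | grep -oE "SPARK-[0-9]+" | sort -u
```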
The process of cutting a release candidate has been automated via the Berkeley Jenkins instance. There are Jenkins jobs that can tag a release candidate and create various packages based on that candidate. The recommended process is to ask the previous release manager to walk you through the Jenkins jobs.
Call a Vote on the Release Candidate
The release voting takes place on the Apache Spark developers list (the PMC votes). Look at past voting threads to see how this proceeds. The email should follow this format.
- Make a shortened link to the full list of JIRAs using http://s.apache.org/
- If possible, attach a draft of the release notes with the email
- Make sure the voting closing time is given in UTC. Use this script to generate it (see the sketch after this list)
- Make sure the email is in text format and the links are correct
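As a sketch of the closing-time idea (not the script referred to above; GNU date assumed):

```
# Print a closing time 72 hours from now, rendered in UTC.
date -u -d "+72 hours" "+%a, %d %b %Y %H:%M:%S %Z"
```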
Once the vote is done, you should also send out a summary email with the totals, with a subject that looks something like "[RESULT] [VOTE]...".
Finalize the Release
THIS STEP IS IRREVERSIBLE so make sure you selected the correct staging repository. Once you move the artifacts into the release folder, they cannot be removed.
After the vote passes, find the staging repository, click Release, and confirm. To upload the binaries, you first upload them to the dev directory in the Apache Distribution repo, and then move them from the dev directory to the release directory. This "move" is the only way to add artifacts to the actual release directory (a sketch follows).
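A minimal sketch of the move, assuming the standard Apache dist SVN layout and hypothetical 1.1.1 paths:

```
# Server-side move from the dev area to the release area; this is the irreversible step.
svn mv https://dist.apache.org/repos/dist/dev/spark/spark-1.1.1-rc2 \
       https://dist.apache.org/repos/dist/release/spark/spark-1.1.1 \
       -m "Move Spark 1.1.1 binaries from dev to release"
```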
Verify that the resources are present in http://www.apache.org/dist/spark/. It may take a while for them to be visible. This will be mirrored throughout the Apache network. There are a few remaining steps.
Remove Old Releases from Mirror Network
Spark always keeps two releases in the mirror network: the most recent release on the current and previous branches. To delete older versions, simply use svn rm. The downloads.js file in the website js/ directory must also be updated to reflect the changes. For instance, the two releases should be 1.1.1 and 1.0.2, but not 1.1.1 and 1.1.0.
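For instance (hypothetical versions), removing a superseded maintenance release:

```
# Remove the superseded 1.1.0 release from the mirror network.
svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.0 \
       -m "Remove Spark 1.1.0, superseded by 1.1.1"
```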
Update the Spark Apache Repository
Check out the tagged commit for the release candidate that passed and apply the correct version tag.
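A minimal sketch, assuming the Apache remote is named apache and RC2 passed:

```
# Tag the exact commit that was voted on and publish the tag
# (a signed tag via git tag -s may be preferred).
git checkout v1.1.1-rc2
git tag v1.1.1
git push apache v1.1.1
```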
Next, update remaining version numbers in the release branch. If you are doing a patch release, see the similar commit made after the previous release in that branch. For example, for branch 1.0, see this example commit.
In general, the rules are as follows:
- Grep through the repository to find such occurrences
- References to the version just released. Upgrade them to the next release version. If it is not a documentation-related version (e.g. inside spark/docs/ or inside spark/python/epydoc.conf), add -SNAPSHOT to the end (see the sketch after this list).
- References to the next version. Ensure these already have -SNAPSHOT.
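As a sketch of the non-documentation case (GNU sed; hypothetical versions, assuming 1.0.2 was just released):

```
# Bump every remaining development-version reference; review the matches first
# and leave documentation-related versions alone.
grep -rl --exclude-dir=.git "1\.0\.2-SNAPSHOT" . \
  | xargs sed -i "s/1\.0\.2-SNAPSHOT/1.0.3-SNAPSHOT/g"
```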
Update the EC2 Scripts
Upload the binary packages to the S3 bucket s3n://spark-related-packages (ask pwendell to do this). Then, change the init scripts in the mesos/spark-ec2 repository to pull the new binaries (see this example commit).
- For Spark 1.1+, update branch v4+
- For Spark 1.1, update branch v3+
- For Spark 1.0, update branch v3+
- For Spark 0.9, update branch v2+
You can audit the EC2 set-up by launching a cluster and running this audit script. Make sure you create the cluster with the default instance type (m1.xlarge).
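A sketch of such a launch with the spark-ec2 tool (key names are placeholders; flags may vary across versions):

```
# Launch a small test cluster; omitting -t keeps the default instance type.
./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 launch release-audit
```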
Update the Spark Website
The website repository is located at https://svn.apache.org/repos/asf/spark. Ensure the docs were generated with the PRODUCTION=1 environment variable and with Java 7.
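A sketch of the docs build, assuming the Jekyll-based build in the docs/ directory of the Spark repo and Java 7 on the PATH:

```
cd docs
# PRODUCTION=1 enables the production settings for the generated site.
PRODUCTION=1 jekyll build
```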
Next, update the rest of the Spark website. See how the previous releases are documented. In particular, have a look at the changes to the *.md files in this commit (all the HTML file changes are generated by jekyll).
Then, create the release notes. The contributors list can be automatically generated through this script. It accepts the tag that corresponds to the current release and another tag that corresponds to the previous release (not including maintenance releases). For instance, if you are releasing Spark 1.2.0, set the current tag to v1.2.0-rc2 and the previous tag to v1.1.0. Once you have generated the initial contributors list, it is highly likely that there will be warnings about author names not being properly translated. To fix this, run this other script, which fetches potential replacements from GitHub and JIRA (a sketch of both steps follows).
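A hypothetical invocation, assuming the release scripts live under dev/create-release/ (the script names and how the tags are passed may differ on your branch, so check each script's usage first):

```
cd dev/create-release
# Generate the raw contributors list between the two tags.
./generate-contributors.py
# Resolve untranslated author names via GitHub and JIRA lookups.
./translate-contributors.py
```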
Additionally, if you wish to give more specific credit to developers of larger patches, you may use git to identify large patches (one approach is sketched below). Extra care must be taken to make sure commits from previous releases are not counted, since git cannot easily associate commits that were backported into different branches.
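Not the original commands, but one possible approach (hypothetical tags):

```
# Show each commit in the release range together with the size of its diff.
git log v1.1.0..v1.2.0 --oneline --shortstat
```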
Then, update the downloads page, and then the main page with a news item.
Create an Announcement
Once everything is working (EC2 scripts, website docs, website changes), create an announcement on the website and then send an e-mail to the mailing list. Enjoy an adult beverage of your choice, and congratulations on making a Spark release.
This section contains legacy information that was not used for the Spark 1.1.1 release. You may find it useful, but it is certainly not necessary to complete the release.