
This document details the steps required to cut a Spark release. This was last updated on 11/12/14 for the 1.1.1 release.

Prerequisites

  • Apache Account. Try SSH-ing into <USER>@people.apache.org. You must have passwordless SSH access from the release machine to your account. To enable this, upload your public key here: http://id.apache.org. If there is not already a public_html folder in your home directory, be sure to create one.
  • Access to Apache Nexus. You will need this for publishing artifacts. Try logging into http://repository.apache.org with your Apache username and password. If you do not have access, create an INFRA ticket to request it.
  • Git Push Access. You will need push access to https://git-wip-us.apache.org/repos/asf/spark.git. Additionally, make sure your git username and email are set on the machine you plan to run the release on.

    $ git config --global user.name <your name>
    $ git config --global user.email <your email>


  • EC2 Instance (Highly Recommended). The process of cutting a release requires a number of tools to be locally installed (maven, jekyll, SBT etc). It may be most convenient to use an EC2 instance based on the ami-e9eda8d9 (available in US-West). This has all the necessary tools pre-installed. Consider using compute-optimized instances (e.g. c3.4xlarge). Mac users are especially recommended to use an EC2 instance instead of attempting to install all the necessary tools locally. If you want to prepare your own EC2 instance, follow the steps given in the Miscellaneous section (at the end of this document).
  • Up-to-date Tools. Ensure that your release machine is using at least the following versions of the required tools: Maven 3.0.4, Java 6 AND Java 7, Jekyll 1.4.3, SBT 0.13.5, Git 1.8.3. If you are using the provided AMI, the correct versions should already be installed, with the possible exception of SBT. Note that it is particularly important to use the correct version of SBT to avoid inconsistent behavior. It is likely that apt-get will install an outdated version, so it’s recommended to get it directly from http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Linux.html.

Create a GPG Key

You will need a GPG key to sign your artifacts (http://apache.org/dev/release-signing). If you are using the provided AMI, this is already installed. Otherwise, you can get it through sudo apt-get install gnupg on Ubuntu or from http://gpgtools.org on Mac OS X.

# Create new key. Make sure it uses RSA and 4096 bits
# Password is optional. DO NOT SET EXPIRATION DATE!
$ gpg --gen-key

# Confirm that key is successfully created
# If there is more than one key, be sure to set the default
# key through ~/.gnupg/gpg.conf
$ gpg --list-keys
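# e.g. to make a specific key the default, add this line to ~/.gnupg/gpg.conf:
#   default-key <KEY_ID>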

# Generate public key to distribute to Apache infrastructure
# <KEY_ID> is the 8-character hex ID next to "pub 4096R/"
$ gpg --output <KEY_ID>.asc --export -a <KEY_ID>

# Distribute public key to the server
$ gpg --send-key <KEY_ID>

# Upload the key fingerprint to http://id.apache.org
# (the fingerprint is a series of 4-character hex groups)
$ gpg --fingerprint

# Copy generated key to Apache web space
# Eventually, key will show up on Apache people page
# (see https://people.apache.org/keys/committer/andrewor14.asc)
$ scp <KEY_ID>.asc <USER>@people.apache.org:~/

(Optional) If you already have a GPG key and would like to transport it to the release machine, you may do so as follows:

# === On host machine ===
# Identify the KEY_ID of the selected key
$ gpg --list-keys

# Export the secret key and transfer it
$ gpg --output pubkey.gpg --export <KEY_ID>
$ gpg --output - --export-secret-key <KEY_ID> |
cat pubkey.gpg - | gpg --armor --output key.asc --symmetric --cipher-algo AES256
$ scp key.asc <release machine hostname>:~/

# === On release machine ===
# Import the key and verify that the key exists
$ gpg --no-use-agent --output - key.asc | gpg --import
$ gpg --list-keys
$ rm key.asc

Set up Maven Password

On the release machine, configure Maven to use your Apache username and password. Your ~/.m2/settings.xml should contain the following:

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 
         http://maven.apache.org/xsd/settings-1.0.0.xsd">
<servers>
  <server>
    <id>apache.snapshots.https</id>
    <username>YOUR USERNAME</username>
    <password>PASSWORD</password>
  </server>
  <server>
    <id>apache.releases.https</id>
    <username>YOUR USERNAME</username>
    <password>PASSWORD</password>
  </server>
</servers>
</settings>

Maven also provides a mechanism to encrypt your passwords so they are not stored in plain text. You will need to create an additional ~/.m2/settings-security.xml to store your master password (see http://maven.apache.org/guides/mini/guide-encryption.html). Note that in other steps you are still required to specify your password in plain text.
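For reference, a minimal sketch of that encryption workflow looks like the following (the file contents shown in the comments follow the guide linked above):

# Create and store an encrypted master password
$ mvn --encrypt-master-password <master password>
# Put the output into ~/.m2/settings-security.xml, e.g.
#   <settingsSecurity>
#     <master>{encrypted master password}</master>
#   </settingsSecurity>

# Encrypt your Apache password; use the output as the <password>
# value in the ~/.m2/settings.xml shown above
$ mvn --encrypt-password <Apache password>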

Preparing Spark for Release

First, check if there are outstanding blockers for your target version on JIRA. If there are none, make sure the unit tests pass. Note that the Maven tests are highly dependent on the run environment. It’s a good idea to verify that they have been passing in Jenkins before spending hours trying to fix them yourself.

$ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-1.1
$ cd spark
$ sbt/sbt clean assembly test

# Ensure MAVEN_OPTS is set with at least 3G of JVM memory
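# For example (these are the values used on the release AMI; see the Miscellaneous section):
$ export MAVEN_OPTS='-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g'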
$ mvn -DskipTests clean package
$ mvn test

Additionally, check for dead links in the documentation.

$ sudo apt-get install linkchecker
$ cd spark/docs
# Serve the docs locally; run this in a separate terminal or background it with &
$ jekyll serve --watch
$ linkchecker -r 2 http://localhost:4000 --no-status --no-warnings

Next, ensure that all Spark versions are correct in the code base (see this example commit). You should grep through the codebase to find all instances of the version string; a rough sketch of this pass follows the list below. Some known places to change are:

  •  SparkContext. Search for VERSION (only for branch 1.x)
  • Maven build. Ensure that the version in all the pom.xml files is <SPARK-VERSION>-SNAPSHOT (e.g. 1.1.1-SNAPSHOT). This will be changed to <SPARK-VERSION> (e.g. 1.1.1) automatically by Maven when cutting the release. Note that there are a few exceptions that should just use <SPARK-VERSION>, namely yarn/alpha/pom.xml and extras/java8-tests/pom.xml. These modules are not published as artifacts.
  • Spark REPLs. Look for the Spark ASCII art in SparkILoopInit.scala for the Scala shell and in shell.py for the Python REPL.
  • Docs. Search for VERSION in docs/_config.yml
  • Spark EC2 scripts. Update default Spark version and mapping between Spark and Shark versions.
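For example, for the 1.1.1 release this grep pass might look like the following (the version strings and file patterns are illustrative; adjust them for your release):

$ cd spark
# Find places that still reference the previous release version
$ grep -rn "1\.1\.0" . --include="*.scala" --include="*.py" --include="*.xml" --include="*.yml"
# Confirm that the published Maven modules use <SPARK-VERSION>-SNAPSHOT
$ grep -rn "<version>1.1.1-SNAPSHOT</version>" . --include=pom.xml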

Finally, update CHANGES.txt with this script in the Spark repository. CHANGES.txt captures all the patches that have made it into this release candidate since the last release.

$ export SPARK_HOME=<your Spark home>
$ cd spark
# Update release versions
$ vim dev/create-release/generate-changelist.py
$ dev/create-release/generate-changelist.py

This produces a CHANGES.txt.new that should be a superset of the existing CHANGES.txt. Replace the old CHANGES.txt with the new one (see this example commit).

Cutting a Release Candidate

If this is not the first RC, then make sure that the JIRA issues that have been solved since the last RC are marked as FIXED in this release version.

  • A possible protocol for this is to mark such issues as FIXED in the next maintenance release. E.g. if you are cutting the RC for 1.0.2, mark such issues as FIXED in 1.0.3.
  • When cutting the new RC, find all the issues that are marked as FIXED for the next maintenance release, and change them to the current release.
  • Verify from the git log whether they actually made it into the new RC (see the sketch below).
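For example, a quick way to check whether the fix for a particular JIRA actually landed in the release branch (SPARK-XXXX and branch-1.0 are placeholders):

# Commit titles in Spark reference the corresponding JIRA number
$ git log branch-1.0 --oneline | grep "SPARK-XXXX"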

The process of cutting a release candidate has been automated via this script found in the Spark repository. First, run the following preliminary steps:

# This step is important to avoid confusion later
# when the script clones Spark with the generated tag
$ mv spark release-spark

# The distributions are packaged with Java 6 while
# the docs are built with Java 7 for nicer formatting
$ export JAVA_HOME=<Java 6 home>
$ export JAVA_7_HOME=<Java 7 home>

# Verify that the version on each tool is up-to-date
$ sbt --version # 0.13.5+
$ mvn --version # 3.0.4+
$ jekyll --version # 1.4.3+
$ git --version # 1.7+
$ $JAVA_HOME/bin/java -version # 1.6.x
$ $JAVA_7_HOME/bin/java -version # 1.7.x

It is highly recommended that you understand the contents of the script before proceeding. This script uses the Maven release plugin and can be broken down into four steps. In the likely event that one of the steps fails, you may restart from the step that failed instead of running the whole script again.

 

 

  1. Run mvn release:prepare. This updates all pom.xml versions and cuts a new tag (e.g. 1.1.1-rc1). If this step is successful, you will find the remote tag here. You will also find the following commit pushed in your name in the release branch: [maven-release-plugin] prepare release v1.1.1-rc1 (see this example commit).
  2. Run mvn release:perform. This builds Spark from the tag cut in the previous step using the spark/release.properties produced. If this step is successful, you will find the following commit pushed in your name in the release branch, but NOT in the release tag: [maven-release-plugin] prepare for the next development iteration (see this example commit). You will also find that the release.properties file is now removed.
  3. Package binary distributions. This runs the make-distribution.sh script for each distribution in parallel. If this step is successful, you will find the archive, signing key, and checksum information for each distribution in the directory in which the create-release.sh script is run. You should NOT find a sub-directory named after one of the distributions as these should be removed. In case of failure, use the binary-release-*.log files generated to determine the cause. In the re-run, you may skip the previous steps and re-make only the distributions that failed by commenting out part of the script.
  4. Compile documentation. This step generates the documentation with jekyll and copies it to your public_html folder in your Apache account. If this step is successful, you should be able to browse the docs under http://people.apache.org/~<USER> (see this example link).

Finally, run the script after filling in the variables at the top. The information here is highly sensitive, so BE CAREFUL NOT TO ACCIDENTALLY CHECK THESE CHANGES IN! The GPG passphrase is the one you used when generating the key.

$ cd .. # just so we don’t clone Spark in Spark
$ vim release-spark/dev/create-release/create-release.sh
$ release-spark/dev/create-release/create-release.sh

On a c3.4xlarge machine in us-west-2, this process is expected to take 2 - 4 hours. After the script has completed, you must find the open staging repository in Apache Nexus to which the artifacts were uploaded, and close the staging repository. Wait a few minutes for the closing to succeed. Now all staged artifacts are public!

 (Optional) In the event that you need to roll back the entire process and start again, you will need to run the following steps. This is necessary if, for instance, you used a faulty GPG key, new blockers arise, or the vote failed.

$ git tag -d <the new tag> # e.g. v1.1.1-rc1
$ git push origin :<the new tag>
$ git revert <perform release commit hash> # see this commit
$ git revert <prepare release commit hash> # see this commit
$ git push origin <release branch> # e.g. branch-1.1

Audit the Release Candidate

The process of auditing the release has been automated via this script found in the Spark repository. First, find the staging repository in Apache Nexus to which the artifacts were uploaded (see this example repository). Configure the script by filling in the required variables at the top: the version number to audit, the key ID of the signing key, and the URL of the staging repository. The script must be run from the directory that hosts it.

# The script must be run from the audit-release directory
$ cd release-spark/dev/audit-release
$ vim audit-release.py
$ ./audit-release.py

 The release auditor will test example builds against the staged artifacts, verify signatures, and check for common mistakes made when cutting a release. This is expected to finish in less than an hour.

Note that it is entirely possible for the dependency requirements of the applications to be outdated. It is reasonable to continue with the current release candidate if small changes to the applications (such as adding a repository) are sufficient to fix the test failures (see this example commit for changes in build.sbt files). Also, there is a known issue with the "Maven application" test in which the build fails even though the test actually succeeds; this has been the case since 1.1.0.

 

 


Call a vote on the Release Candidate

The release voting takes place on the Apache Spark developers list (the PMC is voting). Look at past vote threads to see how this goes. They should look like the draft below.

  • Make a shortened link to the full list of JIRAs using  http://s.apache.org/
  • If possible, attach a draft of the release notes with the e-mail.
  • Make sure the vote closing time is given in UTC. Use this script to generate it (a sketch using GNU date follows this list).
  • Make sure the email is in text format.
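For example, with GNU date a 72-hour voting window ending in UTC can be computed as follows (the window length is just an illustration):

# Prints the closing time in UTC, 72 hours from now
$ date -u -d "+72 hours"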

Once the vote is done, you should also send out a summary e-mail with the totals (subject “[RESULT] [VOTE]...”).

[VOTE] Release Apache Spark 1.0.2 (rc1)

Please vote on releasing the following candidate as Apache Spark version 1.0.2.

This release fixes a number of bugs in Spark 1.0.1.
Some of the notable ones are
SPARK-2452: Known issue in Spark 1.0.1 caused by the attempted fix for
SPARK-1199. The fix was reverted for 1.0.2.
SPARK-2576: NoClassDefFoundError when executing Spark QL query on
HDFS CSV file.
The full list is at http://s.apache.org/9NJ

The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~tdas/spark-1.0.2-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1024/

The documentation corresponding to this release can be found at:
http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/

Please vote on releasing this package as Apache Spark 1.0.2!

The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.0.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

 

Roll Back Release Candidates

If a release candidate does not pass, it is necessary to roll back the commits which advanced Spark's versioning.

# Checkout the release branch from Apache repo
 
# Delete earlier tag. If you are using RC-based tags (v0.9.1-rc1) then skip this.
$ git tag -d v0.9.1
$ git push origin :v0.9.1

# Revert changes made by the Maven release plugin 
$ git revert HEAD --no-edit    # revert dev version commit
$ git revert HEAD~2 --no-edit  # revert release commit
$ git push apache HEAD:branch-0.9

 

Finalizing the Release

Performing the Final Release in Nexus

Be Careful!

Make sure you choose the correct staging repository. THIS STEP IS IRREVERSIBLE.

  • Find the staging repository and click "Release" and confirm. 

Uploading Final Source and Binary Artifacts

Be Careful!

Once you move the artifacts into the release folder, they cannot be removed. THIS STEP IS IRREVERSIBLE.

To upload the binaries, you have to first upload them to the "dev" directory in the Apache Distribution repo, and then move the binaries from "dev" directory to "release" directory. This "moving" is the only way you can add stuff to the actual release directory.

# Checkout the Spark directory in Apache distribution SVN "dev" repo 
$ svn co https://dist.apache.org/repos/dist/dev/spark/
 
# Make a directory for this RC inside the checked-out "spark" directory
$ mkdir spark-0.9.1-rc3

# Download the voted binaries into that subdirectory
$ scp tdas@people.apache.org:~/public_html/spark-0.9.1-rc3/* spark-0.9.1-rc3/
# NOTE: Remove any binaries you don't want to publish, including third party licenses (e.g. MapR).
# Verify md5 sums
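#   e.g. compare the output of "md5sum <archive>" against the contents of the matching .md5 file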
$ svn add spark-0.9.1-rc3
$ svn commit -m "Adding spark-0.9.1-rc3" 
 
# Move the subdirectory in "dev" to the corresponding directory in "release"
# (a URL-to-URL move commits immediately, so a log message is required)
$ svn mv -m "Releasing Spark 0.9.1" https://dist.apache.org/repos/dist/dev/spark/spark-0.9.1-rc3 https://dist.apache.org/repos/dist/release/spark/spark-0.9.1
# Look at http://www.apache.org/dist/spark/ to make sure it's there. It may take a while for them to be visible.
# This will be mirrored throughout the Apache network.

 

Packaging and Wrap-Up for the Release

  • Update the Spark Apache repository

    • Checkout the tagged commit for the release candidate and apply the correct version tag

      # Apply the correct tag
      $ git checkout v0.9.1-rc3    # checkout the RC that passed 
      $ git tag v0.9.1
      $ git push apache v0.9.1
       
      # Verify on the Apache git repo that the tag has been applied correctly
       
      # Remove the old tag
      $ git push apache :v0.9.1-rc3
    • Update remaining version numbers in the release branch
      • If you are doing a patch release, see the similar commit made after the previous release in that branch. For example, for branch 1.0, see this example commit.
      • In general, the rules are as follows. Grep through the repository to find such occurrences (a rough sketch follows below).
        • References to the just-released version - upgrade them to the next release version. If it is not a documentation-related version (e.g. inside spark/docs/ or spark/python/epydoc.conf), make sure you add -SNAPSHOT.
        • References to the next version - make sure that they end in -SNAPSHOT.
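        A rough sketch of that grep pass, assuming 1.0.1 was just released from branch-1.0 (adjust the version strings):

        # References to the just-released version; upgrade to 1.0.2 (plus -SNAPSHOT outside the docs)
        $ grep -rn "1\.0\.1" .
        # References to the next version; make sure they end in -SNAPSHOT
        $ grep -rn "1\.0\.2" .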
  • Update the spark-ec2 scripts
    • Upload the binary packages to the S3 bucket s3n://spark-related-packages (ask pwendell to do this)
    • Alter the init scripts in mesos/spark-ec2 repository to pull new binaries (see this example commit)
    • You can audit the ec2 set-up by launching a cluster and running this audit script (see the sketch below)
      • Make sure you create the cluster with the default instance type (m1.xlarge)
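      For example, launching a small test cluster for the audit might look like this (the key pair, identity file, and cluster name are placeholders; verify the options against spark-ec2 --help):

      $ ./spark-ec2 --key-pair=<keypair name> --identity-file=<keypair file> \
          --instance-type=m1.xlarge --slaves=2 launch release-audit-test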
  • Update the Spark website
    • The website repo is at: https://svn.apache.org/repos/asf/spark

      $ svn co https://svn.apache.org/repos/asf/spark


    • Copy the new documentation to spark/site/docs and update the "latest" link. Make sure the docs were generated with PRODUCTION=1 and Java 7, if they weren't already.

      # Run from the docs/ directory of the release tag, with JAVA_HOME pointing to Java 7
      $ PRODUCTION=1 jekyll build


    • Update the rest of the Spark website. See how previous releases are documented on the site.
      • Take a look at the changes to *.md files in this commit (all the html file changes are generated by jekyll).
      • Create release notes 
        • The following code creates a list of contributors and identifies large patches. Extra care must be taken to make sure commits from previous releases are not counted since git cannot easily associate commits that were back ported into different branches:
        • # Determine PR numbers closed only in the new release.
          git log v1.1.0-rc4 |grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort > closed_1.1
          git log v1.0.0 |grep "Closes #" | cut -d " " -f 5,6 | grep Closes | sort > closed_1.0
          diff --new-line-format="" --unchanged-line-format="" closed_1.1 closed_1.0  > diff.txt
          
          # Grep expression with all new patches
          expr=$(cat diff.txt | awk '{ print "\\("$1" "$2" \\)"; }' | tr "\n" "|" | sed -e "s/|/\\\|/g" | sed "s/\\\|$//")
          
          # Contributor list:
          git shortlog v1.1.0-rc4 --grep "$expr"
           
          # Large patch list (300+ lines):
          git log v1.1.0-rc4 --grep "$expr" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][1-9][1-9][1-9] insert" | grep SPARK


      • Update downloads page
      • Update the main page with a news item
  • Remove old releases from mirror network. Spark always keeps two releases in the mirror network: the most recent release on the current and previous maintenance branch. To delete older versions simply use "svn rm". The downloads.js file in the website js/ directory must also be updated to reflect the changes.
    • svn rm -m "Removing Spark 0.9.2 release" https://dist.apache.org/repos/dist/release/spark/spark-0.9.2
  • Once everything is working (ec2, website docs, website changes) create an announcement on the website and then send an e-mail to the mailing list
  • Enjoy an adult beverage of your choice, and congrats on making a Spark release!

 

Miscellaneous

Steps to create the AMI useful for making releases

# Install necessary tools
$ sudo apt-get update --fix-missing
$ sudo apt-get install -y git openjdk-7-jdk openjdk-6-jdk maven rubygems python-epydoc gnupg-agent linkchecker libgfortran3
 
# Install Scala of the same version as that used by Spark
$ cd
$ wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz  
$ tar xvzf scala*.tgz
$ ln -s scala-2.10.3 scala

# Install SBT of a version compatible with the SBT of Spark (at least 0.13.1)
$ cd && mkdir sbt
$ cd sbt 
$ wget http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.1/sbt-launch.jar
# Create /home/ubuntu/sbt/sbt with the following code
	SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
	java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"
$ chmod u+x /home/ubuntu/sbt/sbt
 
# Add stuff to ~/.bashrc
$ echo "export SCALA_HOME=/home/ubuntu/scala/" >> ~/.bashrc 
$ echo "export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64" >> ~/.bashrc 
$ echo "export JAVA_7_HOME=/usr/lib/jvm/java-7-openjdk-amd64" >> ~/.bashrc 
$ echo "export SBT_HOME=/home/ubuntu/sbt/" >> ~/.bashrc 
$ echo "export MAVEN_OPTS='-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g'" >> ~/.bashrc
$ echo "export PATH='$SCALA_HOME/bin/:$SBT_HOME:$PATH'" >> ~/.bashrc
$ source ~/.bashrc
 
# Verify versions
$ java -version    # Both Java 1.6 and Java 1.7 should be installed, but JAVA_HOME should point to Java 1.6
$ sbt sbt-version  # Forces the download of SBT dependencies and prints the SBT version; verify that it is >= 0.13.1
$ scala -version   # Verify that the Scala version is the same as the one used by Spark
 