
This document covers the process for managing Spark releases.

Prerequisites for Managing a Release

Pre-prerequisites

  • Must have an Apache account.
  • Make sure you can access your Apache web space. Try ssh-ing into <USER_NAME>@people.apache.org.
  • Set up password-less access by uploading your public key (a sketch follows this list).
  • Create a folder "public_html" under your home directory on people.apache.org.
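
A minimal sketch of that setup, assuming your public key lives at ~/.ssh/id_rsa.pub (adjust to your own key file):

# Upload your public key to enable password-less SSH
$ ssh-copy-id <USER_NAME>@people.apache.org
# Create the web-space folder on the remote host
$ ssh <USER_NAME>@people.apache.org 'mkdir -p ~/public_html'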

Create a GPG Key (https://www.apache.org/dev/release-signing)

# ---- Install GPG ----
# For Ubuntu, install through apt-get
$ sudo apt-get install gnupg
# For Mac OSX, install GPG Suite from http://gpgtools.org

# ---- Generate key ----
$ gpg --gen-key                   # Create new key, make sure it is RSA and 4096 bits (see https://www.apache.org/dev/openpgp.html#generate-key)
$ gpg --output <KEY_ID>.asc --export -a <KEY_ID>  # Generate public key file for distribution to Apache infrastructure

# ---- Distribute key ----
$ gpg --send-key <KEY_ID>         # Distribute public key to a key server, <KEY_ID> is the 8 HEX characters in the output of the previous command "pub  4096R/<KEY_ID> "
$ gpg --fingerprint               # Get key digest
# Open http://id.apache.org, log in with your Apache account, and upload the key digest
$ scp <KEY_ID>.asc <USER_NAME>@people.apache.org:~/   # Copy generated <KEY_ID>.asc to Apache web space
# Create an FOAF file and add it via svn (see http://people.apache.org/foaf/ )
#   - should include key fingerprint
# Eventually key will show up on apache people page (e.g. https://people.apache.org/keys/committer/pwendell.asc )

Get Access to Apache Nexus for Publishing Artifacts

Get "Push" Access to Apache Git Repository

Preparing the Code for a Release

Ensure Spark is Ready for a Release

  • Check JIRA for remaining issues tied to the release
    • Review and merge any blocking features
    • Bump other remaining features to subsequent releases
  • Make sure you have configured git author info:  

    $ git config --global user.name <GIT USERNAME>
    $ git config --global user.email <GIT EMAIL ADDRESS>
  • Ensure Spark versions are correct in the codebase
    • See this example commit
    • You should "grep" through the codebase to find all instances of the version string (a sketch of this sweep follows the list). Some known places to change are:
      • SparkContext.scala version string (only for branch-1.x)
      • SBT build: Change version in file 'project/SparkBuild.scala'
      • Maven build: Change version in ALL the pom.xml files in repo. Note that the version should be SPARK-VERSION_SNAPSHOT and it will be changed to SPARK-VERSION automatically by Maven when cutting the release.
        • Exception: Change 'yarn/alpha/pom.xml' to SPARK-VERSION. Note that this is different from the main 'pom.xml' because the YARN alpha module does not get published as an artifact through Maven when cutting the release and so does not get version bumped from SPARK-VERSION_SNAPSHOT to SPARK-VERSION.
      • Spark REPLs
        • Scala REPL: Check inside 'repl/src/main/scala/org/apache/spark/repl/'
        • Python REPL: Check inside 'python/pyspark'
      • Docs: Change in file 'docs/_config.yml'
      • Spark EC2 scripts: Change mapping between Spark and Shark versions and the default Spark version in cluster
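
A quick way to sweep for stray version strings (a sketch, assuming you are cutting 0.9.1 and the branch currently carries 0.9.1-SNAPSHOT; substitute your own versions):

# Files that still mention the previous release's version
$ grep -rn "0\.9\.0" --include=pom.xml .
# Version references that are missing the expected -SNAPSHOT suffix
$ grep -rn "0\.9\.1" . | grep -v "0\.9\.1-SNAPSHOT" | grep -v target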

Check Out and Run Tests

$ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-0.9
$ cd spark
$ sbt/sbt assembly
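# The Maven tests need more memory than the default JVM settings provide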
$ export MAVEN_OPTS="-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g"
$ mvn test

Check for Dead Links in the Docs

$ cd $SPARK_HOME/docs
$ jekyll serve --watch    # this blocks while serving the site, so run it in its own terminal
# In a second terminal:
$ sudo apt-get install linkchecker
$ linkchecker -r 2 http://localhost:4000 --no-status --no-warnings

Create new CHANGES.txt File

The new CHANGES.txt can be generated using the generate-changelist.py script referenced below.

  • Checkout the Spark release version in a Spark git repository. 
  • Download the script to a location within the repo.
  • Update the previous release tag, and other information in the script.
  • Set SPARK_HOME environment variable and run the script.

    $ export SPARK_HOME="..."
    $ python -u generate-changelist.py

Cutting a Release Candidate

Overview

Cutting a release candidate involves two steps. First, we use the Maven release plug-in to create a release commit (a single commit in which all of the version files have the correct number) and publish the code associated with that release to a staging repository in Maven. Second, we check out that release commit and package binary releases and documentation.

Setting up EC2 Instance (Recommended)

  • The process of cutting a release requires a number of tools to be locally installed (maven, jekyll, etc). Ubuntu users can install those tools via apt-get. However, it may be most convenient to use an EC2 instance based on the AMI ami-8e98edbe (available in US-West, has Scala 2.10.3 and SBT 0.13.1 installed). This has all the necessary tools installed. Mac users are especially encouraged to use an EC2 instance instead of attempting to install all the necessary tools locally. If you want to prepare your own EC2 instance (different version of Scala, SBT, etc.), follow the steps given in the Miscellaneous section at the end of this document.
  • Consider using CPU-optimized instances, which may provide better bang for the buck.
  • Transfer your GPG keys from your home machine to the EC2 instance.

    # == On home machine ==
    $ gpg --list-keys  # Identify the KEY_ID of the key you generated
    $ gpg --output pubkey.gpg --export <KEY_ID>
    $ gpg --output - --export-secret-key <KEY_ID> | cat pubkey.gpg - | gpg --armor --output keys.asc --symmetric --cipher-algo AES256
    # Copy keys.asc to EC2 instance
     
    # == On EC2 machine ==
    # May be necessary if the ownership of the gpg files is not set to the current user
    $ sudo chown -R ubuntu:ubuntu ~/.gnupg/*
    
    # Import the keys
    $ sudo gpg --no-use-agent --output - keys.asc | gpg --import
     
    # Confirm that your key has been imported, then remove the keys file
    $ gpg --list-keys
    $ rm keys.asc
    
  • Install the private key that gives you password-less access to the Apache web space (a sketch follows).
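
    A sketch of one way to do this, assuming the private key on your home machine is ~/.ssh/id_rsa (use your actual key file; <EC2_HOST> is your instance's address):

    # == On home machine ==
    $ scp ~/.ssh/id_rsa ubuntu@<EC2_HOST>:~/.ssh/id_rsa

    # == On EC2 machine ==
    $ chmod 600 ~/.ssh/id_rsa
    $ ssh <USER_NAME>@people.apache.org   # confirm that password-less login works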

  • Set git user name and email (these are going to appear as the committer in the release commits).

    $ git config --global user.name "Tathagata Das"
    $ git config --global user.email tathagata.das1565@gmail.com
  • Checkout the appropriate version of Spark that has the right scripts related to the releases. For instance, to checkout the master branch, run "git clone https://git-wip-us.apache.org/repos/asf/spark.git".

Creating Release Candidates

  • Make sure Maven is configured with your Apache username and password. Your ~/.m2/settings.xml should have the following.

    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                                  http://maven.apache.org/xsd/settings-1.0.0.xsd">
      <servers>
        <server>
          <id>apache.snapshots.https</id>
          <username>APACHE_USERNAME</username>
          <password>PASSWORD</password>
        </server>
        <server>
          <id>apache.releases.https</id>
          <username>APACHE_USERNAME</username>
          <password>PASSWORD</password>
        </server>
      </servers>
    </settings>
  • The process of creating releases has been automated via this create release script
    • Configure the script by specifying the Apache username + password and the Apache GPG key passphrase. Be careful not to accidentally check them in.
    • This script can be run in any directory.
    • Make sure you have JAVA_HOME set; otherwise generation of the pre-built packages with make-distribution.sh will fail, and you will have to run the script manually again (run with the --package-only option to generate only the binary packages / tarballs; a sketch follows this list)
    • Make sure you have password-less access to the Apache web space (people.apache.org) from the machine you are running the script on. Otherwise uploading of the binary tarballs and docs will fail and you will have to upload them manually.
    • Read and understand the script fully before you execute it. It will cut a Maven release, build binary releases and documentation, then copy the binary artifacts to a staging location on people.apache.org.
    • NOTE: You must use git 1.7.X for this or else you'll hit this horrible bug.
  • After the script has completed, find the open staging repository in Apache Nexus to which the artifacts were uploaded. Close the staging repository. Wait for the closing to succeed. Now all the staged artifacts are public!
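
For example, to re-run just the packaging step after a JAVA_HOME failure (a sketch; the script name is an assumption, and --package-only is the option mentioned above):

# Point JAVA_HOME at your JDK before re-running (the path below is only an example)
$ export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
# Regenerate only the binary packages / tarballs, skipping the Maven release step
$ ./create-release.sh --package-only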

Rolling Back Release Candidates

  • If a release candidate does not pass, it is necessary to roll back the commits which advanced Spark's versioning.

    # Checkout the release branch from the Apache repo
    $ git clone https://git-wip-us.apache.org/repos/asf/spark.git -b branch-0.9
    $ cd spark
     
    # Delete earlier tag. If you are using RC-based tags (v0.9.1-rc1) then skip this.
    $ git tag -d v0.9.1
    $ git push origin :v0.9.1
    
    # Revert changes made by the Maven release plugin 
    $ git revert HEAD --no-edit    # revert dev version commit
    $ git revert HEAD~2 --no-edit  # revert release commit
    $ git push origin HEAD:branch-0.9

Auditing a Staged Release Candidate

  • The process of auditing release has been automated via this release audit script.
    • Find the staging repository in Apache Nexus to which the artifacts were uploaded.
    • Configure the script by specifying the version number to audit, the key ID of the signing key, and the URL of the staging repository.
    • This script has to be run from the parent directory of the script.
    • Make sure "sbt" is installed.
  • The release auditor will test example builds against the staged artifacts, verify signatures, and check for common mistakes made when cutting a release. (A sketch of a manual signature spot-check follows.)
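
To spot-check a staged artifact's signature and checksum by hand (a sketch; the file names are examples, assuming the artifact was downloaded along with its .asc and .md5 files):

# Import the signer's public key if you do not already have it
$ gpg --recv-keys <KEY_ID>
# Verify the detached signature against the artifact
$ gpg --verify spark-0.9.1.tgz.asc spark-0.9.1.tgz
# Compute the MD5 and compare it against the published .md5 file
$ md5sum spark-0.9.1.tgz
$ cat spark-0.9.1.tgz.md5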

Calling a Release Vote

Cutting the Official Release

Performing the Final Release in Nexus

Be Careful!

Make sure you choose the correct staging repository. THIS STEP IS IRREVERSIBLE.

  • Find the staging repository and click "Release" and confirm. 

Uploading Final Source and Binary Artifacts

Be Careful!

Once you move the artifacts into the release folder, they cannot be removed. THIS STEP IS IRREVERSIBLE.

To upload the binaries, you first have to upload them to the "dev" directory in the Apache Distribution repo, and then move them from the "dev" directory to the "release" directory. This "move" is the only way to add anything to the actual release directory.

# Checkout the Spark directory in Apache distribution SVN "dev" repo 
$ svn co https://dist.apache.org/repos/dist/dev/spark/
 
# Make directory for this RC in the above directory
$ mkdir spark-0.9.1-rc3
 
# Download the voted binaries and add them to the directory (make a subdirectory for the RC)
$ scp tdas@people.apache.org:~/public_html/spark-0.9.1-rc3/* spark-0.9.1-rc3/
# Verify md5 sums
$ svn add spark-0.9.1-rc3
$ svn commit -m "Adding spark-0.9.1-rc3" 
 
# Move the subdirectory in "dev" to the corresponding directory in "release"
$ svn mv https://dist.apache.org/repos/dist/dev/spark/spark-0.9.1-rc3  https://dist.apache.org/repos/dist/release/spark/spark-0.9.1
# Look at http://www.apache.org/dist/spark/ to make sure it's there. It may take a while for them to be visible.
# This will be mirrored throughout the Apache network.


Packaging and Wrap-Up for the Release

  • Update the Spark Apache repository

    • Checkout the tagged commit for the release candidate and apply the correct version tag

      # Apply the correct tag
      $ git checkout v0.9.1-rc3    # checkout the RC that passed 
      $ git tag v0.9.1
      $ git push apache v0.9.1
       
      # Verify on the Apache git repo that the tag has been applied correctly
       
      # Remove the old tag
      $ git push apache :v0.9.1-rc3
    • Update remaining version numbers in the release branch
      • If you are doing a patch release, see the similar commit made after the previous release in that branch. For example, for branch 1.0, see this example commit.
      • In general, there should not be any reference to the just-released version, and all references to the next version should have -SNAPSHOT at the end. Grep through the repository to find such occurrences (a sketch follows).
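
      A sketch of that sweep, assuming 0.9.1 was just released and 0.9.2-SNAPSHOT is the next version (substitute your own versions):

        # The just-released version should no longer appear in the build files
        $ grep -rn "0\.9\.1" --include=pom.xml .
        # Any reference to the next version must carry the -SNAPSHOT suffix
        $ grep -rn "0\.9\.2" . | grep -v "0\.9\.2-SNAPSHOT" | grep -v target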
  • Update the spark-ec2 scripts
    • Upload the binary packages to the S3 bucket s3n://spark-related-packages (ask pwendell to do this)
    • Alter the init scripts in mesos/spark-ec2 repository to pull new binaries (see this example commit and remember to update v2 branch for branch-0.9 releases)
    • You can audit the ec2 set-up by launching a cluster and running this audit script 
      • Make sure you create cluster with default instance type (m1.xlarge)
  • Update the Spark website
    • The website repo is at: https://svn.apache.org/repos/asf/spark

      $ svn co https://svn.apache.org/repos/asf/spark


    • Copy new documentation to spark/site/docs and update the "latest" link. Make sure that the docs were generated with PRODUCTION=1 set, if they weren't already generated with it.

      $ PRODUCTION=1 jekyll build


    • Update the rest of the Spark website. See how previous releases are documented on the site.
      • Take a look at the changes to *.md files in this commit (all the html file changes are generated by jekyll).
      • Create release notes 
      • Update documentation page
      • Update downloads page
      • Update the main page with a news item
  • Once everything is working (EC2 scripts, website docs, website changes), create an announcement on the website and then send an e-mail to the mailing list
  • Enjoy an adult beverage of your choice, and congrats on making a Spark release!


Miscellaneous

Steps to create the AMI useful for making releases

# Install necessary tools
$ sudo apt-get update --fix-missing
$ sudo apt-get install -y git openjdk-6-jdk maven rubygems python-epydoc gnupg-agent linkchecker libgfortran3
 
# Install Scala of the same version as that used by Spark
$ cd
$ wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz  
$ tar xvzf scala*.tgz
$ ln -s scala-2.10.3 scala

# Install SBT of a version compatible with the SBT of Spark (at least 0.13.1)
$ cd && mkdir sbt
$ cd sbt 
$ wget http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.1/sbt-launch.jar
# Create /home/ubuntu/sbt/sbt with the following code
	#!/usr/bin/env bash
	SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
	java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"
$ chmod u+x /home/ubuntu/sbt/sbt
 
# Add stuff to ~/.bashrc
$ echo "export SCALA_HOME=/home/ubuntu/scala/" >> ~/.bashrc 
$ echo "export SBT_HOME=/home/ubuntu/sbt/" >> ~/.bashrc 
$ echo "export MAVEN_OPTS='-Xmx3g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=1g'" >> ~/.bashrc
$ echo "export PATH='$SCALA_HOME/bin/:$SBT_HOME:$PATH'" >> ~/.bashrc
$ source ~/.bashrc
 
# Verify versions
java -version    # Make sure your Java version is 1.6! Jars built with Java 7 have known problems
sbt sbt-version  # Forces the download of SBT dependencies and prints the SBT version; verify that it is >= 0.13.1
scala -version   # Verify that the Scala version matches the one used by Spark
 