This document is intended to provide a comprehensive checklist of tasks before, during, and after an Arrow release.

Preparing for the release

JIRA tidying

Before creating a source release, the release manager must ensure that any resolved JIRAs have the appropriate Fix Version set so that the changelog is generated properly.

To do this, search for the Arrow project and issues with no fix version. Click the "Tools" dropdown menu in the top right of the page and select "Bulk Change". Indicate that you wish to edit the issues, then set the correct Fix Version and apply the change. Remember to uncheck the box about "send e-mail notifications" to avoid excess spam to issues@arrow.apache.org.
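
The search described above can also be written as a JQL query for JIRA's advanced issue search. This is a minimal sketch, not part of the Arrow release scripts: the helper name no_fix_version_jql is hypothetical, and the status names "Resolved" and "Closed" are assumptions that may need adjusting to the project's workflow.

```shell
# Hypothetical helper: print a JQL query that selects resolved issues
# of a project that have no Fix Version set.
# Assumption: "Resolved" and "Closed" are the relevant status names.
no_fix_version_jql() {
  local project="$1"
  printf 'project = %s AND status in (Resolved, Closed) AND fixVersion is EMPTY\n' "$project"
}

no_fix_version_jql ARROW
```

The printed query can be pasted into the issue search, after which the "Bulk Change" steps apply as described.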

Main source release and vote

Source release and vote

Requirements:

  • You must not have any arrow-cpp or parquet-cpp environment variables defined except CC or CXX if you want to build with something other than GCC by default (e.g. clang).
  • You must be a committer to be able to push to the dist and Maven repositories
  • A GPG key in the Apache Web of Trust (cross-signed by other Apache committers/PMC members) to sign the artifacts. If you have multiple GPG keys, you must set the correct GPG key ID in ~/.gnupg/gpg.conf by adding a default-key ${YOUR_GPG_KEY_ID} line.
  • Maven configured to publish artifacts to Apache repositories (see http://www.apache.org/dev/publishing-maven-artifacts.html)
  • Have the build requirements for cpp and c_glib installed (see their README)
  • Set JIRA_USERNAME and JIRA_PASSWORD environment variables
  • Install jira Python package
  • Install the en_US.UTF-8 locale (you can list the available locales with locale -a)
  • Install Python 3 as python
  • Create your account on Bintray
  • Create dev/release/.env from dev/release/.env.example. See the comments in dev/release/.env.example for how to set each variable.
  • Request INFRA to add you to https://bintray.com/apache/ members. See also: INFRA-18698
  • Setup crossbow as described in its README
  • Have Docker and docker-compose installed
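
Several of the requirements above can be checked mechanically before starting. The following is a minimal sketch under stated assumptions, not part of dev/release/: the function names require_env, require_command and preflight are hypothetical, and the command names gpg, mvn, docker and docker-compose are assumed to be how the tools are installed on your machine.

```shell
# Hypothetical preflight helpers; none of these exist in dev/release/.

# Succeeds if the named environment variable is set and non-empty.
require_env() {
  local value
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "missing environment variable: $1" >&2
    return 1
  fi
}

# Succeeds if the named command is found on PATH.
require_command() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing command: $1" >&2
    return 1
  }
}

# Runs every check, prints the number of failed checks,
# and succeeds only if there were none.
preflight() {
  local failures=0 cmd
  require_env JIRA_USERNAME || failures=$((failures + 1))
  require_env JIRA_PASSWORD || failures=$((failures + 1))
  for cmd in gpg mvn docker docker-compose; do
    require_command "$cmd" || failures=$((failures + 1))
  done
  # "python" must resolve to Python 3
  python -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)' 2>/dev/null \
    || { echo "python is not Python 3" >&2; failures=$((failures + 1)); }
  # Some systems list the en_US.UTF-8 locale as en_US.utf8
  locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$' \
    || { echo "en_US.UTF-8 locale not installed" >&2; failures=$((failures + 1)); }
  echo "preflight failures: ${failures}"
  [ "$failures" -eq 0 ]
}
```

Calling preflight from the top of the arrow checkout reports each missing item on standard error. It does not cover the account-related requirements (committer rights, Bintray membership, the GPG Web of Trust), which still need to be confirmed by hand.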

To build the source release, run the following (replace 0.1.0 with the version to release):

# delete the local tag when you release RC1 or later
git tag -d apache-arrow-0.1.0

# create a release branch for the RC
git checkout -b release-0.1.0-rc0

# setup gpg agent for signing artifacts
source dev/release/setup-gpg-agent.sh

# prepare release v 0.1.0 (run tests, sign artifacts). Next version will be 0.1.1-SNAPSHOT
# on OSX use gnu-sed with homebrew: brew install gnu-sed (and export to $PATH)
dev/release/00-prepare.sh 0.1.0 0.1.1

# push the tag to the apache remote
#
# if this is RC1 or later, we need to add --force option: git push --force apache apache-arrow-0.1.0
git push apache apache-arrow-0.1.0

# checkout the tag under a new branch name and push that branch to your fork's remote
#
# to launch a crossbow build this branch _must_ exist on your remote
git checkout -b apache-arrow-0.1.0-rc0 apache-arrow-0.1.0
git push -u <your fork's remote> apache-arrow-0.1.0-rc0

# tag and stage artifacts to maven repo (repo will have to be finalized separately)
dev/release/01-perform.sh

# create the source release
#
# <rc number> starts at 0 and increments every time the release candidate is burned
#
# so for the first RC this would be: dev/release/02-source.sh 0.1.0 0
dev/release/02-source.sh 0.1.0 <rc number>

# launch crossbow build for packages and wait for that to finish.
# 
# the status can be checked using:
# python dev/tasks/crossbow.py status build-<build_number>
#
# <build_number> is output when you launch the build
# python dev/tasks/crossbow.py submit -g conda -g wheel -g linux -g nuget --arrow-version 0.1.0-rc0
python dev/tasks/crossbow.py submit -g conda -g wheel -g linux -g nuget --arrow-version 0.1.0-rc<rc number>

# download the artifacts
# 
# this will download packages to a directory called packages/
python dev/tasks/crossbow.py download-artifacts build-<build_number>

# create the binary release
#
# <rc number> starts at 0 and increments every time the release candidate is burned
# <build_number> is the same as the one from the previous step
#
# On macOS the only way I could get this to work was running 'echo UPDATESTARTUPTTY | gpg-connect-agent' before running this command;
# otherwise I got errors referencing "ioctl".
#
# so for the first RC this would be: dev/release/03-binary.sh 0.1.0 0 packages/build-<build_number>
dev/release/03-binary.sh 0.1.0 <rc number> <packages directory>

# once the vote has passed, publish the staged maven artifacts (see below)
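
The commands above use several names that all derive from the version and the RC number. As a small sketch, a hypothetical helper (rc_names is not part of the Arrow scripts) can print them together so they stay consistent across steps:

```shell
# Hypothetical helper: print the names derived from a version and an RC
# number, following the naming used in the commands above.
rc_names() {
  local version="$1" rc="$2"
  echo "tag=apache-arrow-${version}"
  echo "release_branch=release-${version}-rc${rc}"
  echo "crossbow_branch=apache-arrow-${version}-rc${rc}"
  echo "arrow_version=${version}-rc${rc}"
}

rc_names 0.1.0 0
```

For the first RC of 0.1.0 this prints the tag apache-arrow-0.1.0, the local release branch release-0.1.0-rc0, the branch pushed to your fork for crossbow (apache-arrow-0.1.0-rc0), and the value passed to --arrow-version (0.1.0-rc0).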



Start the vote thread on dev@arrow.apache.org and supply instructions for verifying the integrity of the release. Approval requires a net of 3 +1 votes from PMC members. A release cannot be vetoed.
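
One part of the integrity instructions is usually a checksum comparison. The following is a generic sketch and not the project's own verification procedure: the helper name verify_sha256 is hypothetical, and it assumes that either sha256sum (GNU coreutils) or shasum is installed.

```shell
# Hypothetical helper: succeed if FILE has the expected SHA-256 digest.
# Assumption: either sha256sum (GNU coreutils) or shasum is installed.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | cut -d' ' -f1)"
  else
    actual="$(shasum -a 256 "$file" | cut -d' ' -f1)"
  fi
  [ "$actual" = "$expected" ]
}

# Example, assuming the tarball and its .sha256 file sit in the current
# directory and the digest is the first space-separated field of that file:
#   verify_sha256 apache-arrow-0.1.0.tar.gz \
#     "$(cut -d' ' -f1 apache-arrow-0.1.0.tar.gz.sha256)" && echo "checksum OK"
```

A checksum only detects corruption; voters should also check the GPG signature of each artifact against the release manager's key.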

Useful commands:

To set the mvn version in the poms

mvn versions:set -DnewVersion=0.1-SNAPSHOT

Reset your workspace

git reset --hard

Setup gpg-agent

eval $(gpg-agent --daemon --allow-preset-passphrase)
gpg --use-agent -s LICENSE.txt

Delete tag locally

git tag -d apache-arrow-0.1.0

How to stage maven artifacts:

Artifacts get staged during the perform phase of the scripts above.

If you need to stage the artifacts again, follow the instructions below:

# checkout the release tag
git checkout apache-arrow-0.1.0
# setup the gpg agent for signing artifacts
source dev/release/setup-gpg-agent.sh
# build the JNI bindings similarly to how 01-perform.sh does
mkdir -p cpp/java-build
pushd cpp/java-build
cmake \
  -DARROW_GANDIVA=ON \
  -DARROW_GANDIVA_JAVA=ON \
  -DARROW_JNI=ON \
  -DARROW_ORC=ON \
  -DCMAKE_BUILD_TYPE=release \
  -G Ninja \
  ..
ninja
popd
# go in the java subfolder
pushd java
# stage the artifacts using both the apache-release and arrow-jni profiles
mvn -Papache-release,arrow-jni -Darrow.cpp.build.dir=$(realpath ../cpp/java-build) deploy
popd

Test binary package upload with your own Bintray account:

# Specify BINTRAY_REPOSITORY environment variable to specify Bintray repository for test.
# For example, if you want to use https://bintray.com/kou/arrow/, you can specify BINTRAY_REPOSITORY=kou/arrow.
#
# Here is a sample command line:
# BINTRAY_REPOSITORY=kou/arrow dev/release/03-binary.sh 0.1.0 0 packages/build-<build_number>
BINTRAY_REPOSITORY=<Bintray repository for test> dev/release/03-binary.sh 0.1.0 <rc number> <packages directory>

Delete binary packages for the version:

# Here is a sample command line to delete Debian packages for 0.11.0-rc1:
# curl --verbose --fail --basic --user kou:secret --header 'Content-Type: application/json' --request DELETE https://bintray.com/api/v1/packages/apache/arrow/debian-rc/versions/0.11.0-rc1
curl --verbose --fail --basic --user <Bintray user>:<Bintray API key> --header 'Content-Type: application/json' --request DELETE https://bintray.com/api/v1/packages/apache/arrow/<Bintray package>/versions/<version>-rc<rc number>
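
The URL in the delete command follows a fixed pattern, so it can be built from its three variable parts. This is a minimal sketch; the helper name bintray_rc_version_url is hypothetical and only reproduces the URL layout shown above.

```shell
# Hypothetical helper: print the Bintray API URL of one RC version of a
# package in the apache/arrow repository, using the layout shown above.
bintray_rc_version_url() {
  local package="$1" version="$2" rc="$3"
  echo "https://bintray.com/api/v1/packages/apache/arrow/${package}/versions/${version}-rc${rc}"
}

bintray_rc_version_url debian-rc 0.11.0 1
```

The printed URL matches the one in the sample command for the Debian packages of 0.11.0-rc1 and can be passed to curl in place of the hand-written URL.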


Post-release tasks

After the release vote, we must undertake many tasks to update source artifacts, binary builds, and the Arrow website.

Be sure to go through the following checklist:

1.  [ ] rebase master (!!don't rebase after doing a patch release!!)
2.  [ ] upload source
3.  [ ] upload binaries
4.  [ ] update website
5.  [ ] upload ruby gems
6.  [ ] upload js packages
7.  [ ] upload C# packages
8.  [ ] upload rust crates
9.  [ ] update conda recipes
10. [ ] upload wheels to pypi
11. [ ] update homebrew packages
12. [ ] update maven artifacts
13. [ ] update msys2
14. [ ] update R packages
15. [ ] update docs


Rebasing the master branch on local release branch

WARNING: do not rebase master on maintenance branches containing cherry-picked / backported commits e.g. patch releases

The "local release branch" is the "release-0.1.0-rc0" branch in the above "Source release and vote" section.

The local release branch has some unpushed commits, such as the bump to the next snapshot version, so you need to add these commits to the master branch. This requires a force push. The following command does all of this:

# Note that you must have an "apache" remote that refers to git@github.com:apache/arrow.git
dev/release/post-00-rebase.sh release-0.1.0-rc0
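
Because this step force-pushes, it can be worth confirming where the "apache" remote points before running it. The following is a sketch only: the helper name is_apache_arrow_url is hypothetical, and accepting the HTTPS form of the URL in addition to the SSH form is an assumption.

```shell
# Hypothetical helper: succeed if URL refers to the apache/arrow repository.
# Assumption: the HTTPS form is accepted in addition to the SSH form
# named in the note above.
is_apache_arrow_url() {
  case "$1" in
    git@github.com:apache/arrow.git|git@github.com:apache/arrow) return 0 ;;
    https://github.com/apache/arrow.git|https://github.com/apache/arrow) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, run inside the arrow clone:
#   is_apache_arrow_url "$(git remote get-url apache)" \
#     || echo "the apache remote does not point at apache/arrow" >&2
```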

Marking the released version as "RELEASED" on JIRA

  1. Open https://issues.apache.org/jira/plugins/servlet/project-config/ARROW/administer-versions
  2. Click "..." for the release version in "Actions" column
  3. Select "Release"
  4. Set "Release date"
  5. Click "Release" button

Starting the new version on JIRA

  1. Open https://issues.apache.org/jira/plugins/servlet/project-config/ARROW/administer-versions
  2. Click "..." for the next version in "Actions" column
  3. Select "Edit"
  4. Set "Start date"
  5. Click "Save" button

Updating the Arrow website

The website is a Jekyll project hosted in the https://github.com/apache/apache-site repository. As part of updating the website, we must perform various subtasks.

  1. Fork the apache-site repository and clone it next to the arrow repository.
  2. Generate the release note:

    # dev/release/post-03-website 0.13.0 0.14.0
    dev/release/post-03-website <previous-version> <version>
  3. Create a pull-request and a Jira with the links the script shows at the end.


Finally, if appropriate, write a short blog post summarizing the new release highlights. Here is an example.

See site/README.md for how to publish. (TODO: Should we move the document to this Wiki?)

Uploading source release artifacts to SVN

A PMC member must commit the source release artifacts to SVN:

# dev/release/post-01-upload.sh 0.0.1 0.1.0 0
dev/release/post-01-upload.sh <previous version> <version> <rc>

Uploading binary release artifacts to Bintray

A PMC member must upload the binary release artifacts to Bintray:

# dev/release/post-02-binary.sh 0.1.0 0
dev/release/post-02-binary.sh <version> <rc number>

You can test with your own Bintray repository by specifying the BINTRAY_REPOSITORY environment variable:

# BINTRAY_REPOSITORY=kou/arrow dev/release/post-02-binary.sh 0.1.0 0
BINTRAY_REPOSITORY=<Bintray repository> dev/release/post-02-binary.sh <version> <rc number>

Announcing release

Add relevant release data for Arrow to https://reporter.apache.org.

Write a release announcement (see example) and send to announce@apache.org and dev@arrow.apache.org. The announcement to announce@apache.org must be sent from your apache.org e-mail address to be accepted.

Updating website with new API documentation

The API documentation for C++, C GLib, Python, Java, and JavaScript can be generated via a Docker-based setup. To generate the API documentation, run the following command:

bash dev/gen_apidocs.sh


This script assumes that a dist directory can be created at the same level by the current user. Please note that most of the software must be built in order to create the documentation, so this step may take some time to run, especially the first time around as the Docker container will also have to be built.

To upload the updated documentation to the website, navigate to site/asf-site and commit all changes:

pushd site/asf-site
git add .
git commit -m "Updated API documentation for version X.Y.Z"


After successfully creating the API documentation, the website can be run locally to browse the API documentation from the top-level Documentation menu. To run the website, issue the command:

bash dev/run_site.sh


The local URL for the website running inside the docker container will be shown as Server address: in the output of the command. To stop the server press Ctrl-C in that window.

Updating C++ and Python packages

We have been making Arrow available to C++ and Python users on the 3 major platforms (Linux, macOS, and Windows) via two package managers: pip and conda.

Updating Python Artifacts

pip Packages

pip binary packages (called "wheels") are built using the crossbow tool that we used above during the release candidate creation process and then uploaded to PyPI (Python Package Index) under the pyarrow package.

We use the twine tool to upload wheels to PyPI:

# upload wheels to a testing index if you have test.pypi.org account
twine upload --repository-url https://test.pypi.org/legacy/ packages/build-<build-number>/wheel-*/*.whl

# if all went well then upload to the live index
twine upload packages/build-<build-number>/wheel-*/*.whl

# go to the python directory of your arrow clone
cd python

# build source distribution
# make sure you do so from a tagged release
git checkout apache-arrow-$VERSION
python setup.py sdist

# upload the source distribution to a testing index if you have test.pypi.org account
twine upload --repository-url https://test.pypi.org/legacy/ dist/pyarrow-$VERSION.tar.gz

# if all went well then upload to the live index
twine upload dist/pyarrow-$VERSION.tar.gz


Please make sure you use twine >= 1.11.0. This supports the markdown long description in setup.py which also requires setuptools >= 38.6.0.
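
The minimum versions above can be checked with a small version comparison. This is a sketch under stated assumptions: the helper name version_ge is hypothetical, and it relies on sort supporting the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper: succeed if version A ($1) is greater than or
# equal to version B ($2).
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example, assuming twine and setuptools were installed with pip:
#   twine_version="$(python -m pip show twine | awk '/^Version:/ {print $2}')"
#   version_ge "$twine_version" 1.11.0 || echo "twine is too old" >&2
#   setuptools_version="$(python -m pip show setuptools | awk '/^Version:/ {print $2}')"
#   version_ge "$setuptools_version" 38.6.0 || echo "setuptools is too old" >&2
```

A plain string comparison would get this wrong, since "1.9.1" sorts after "1.11.0" alphabetically; the version sort compares the numeric components instead.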

You must have the correct permissions on PyPI to upload wheels; ask Wes McKinney or Uwe Korn if you need help with this.

Updating conda packages

We have been building conda packages using conda-forge. The four "feedstocks" that must be updated in order are:

  1. arrow-cpp-feedstock
  2. parquet-cpp-feedstock (it is a meta package which installs pyarrow, no need to update until parquet's version is bumped)
  3. pyarrow-feedstock
  4. r-arrow-feedstock

To update a feedstock, open a pull request updating recipe/meta.yaml as appropriate. Once you are confident that the build is good and the metadata is updated properly, merge the pull request. You must wait until the results of each of the feedstocks land in anaconda.org before moving on to the next package.

Unfortunately, you cannot open pull requests to all of these repositories at the same time because they are interdependent.

Updating Homebrew packages

We have been building brew packages:

In order to update the formulas, follow the Homebrew guide:

version=1.0.0
url="https://www.apache.org/dyn/closer.lua?path=arrow/arrow-${version}/apache-arrow-${version}.tar.gz"
sha256="$(curl https://dist.apache.org/repos/dist/release/arrow/arrow-${version}/apache-arrow-${version}.tar.gz.sha256 | cut -d' ' -f1)"
GITHUB_USER="yourgithubusername"

# We need a branch with two commits: one each for apache-arrow and apache-arrow-glib
# And any edits to each formula outside of the version bump (like cmake flag changes) need to be included in each commit
# I.e. one commit per formula

: ${HOMEBREW_CORE_DIR:="/usr/local/Homebrew/Library/Taps/homebrew/homebrew-core"}
pushd $HOMEBREW_CORE_DIR

# This assumes you have a fork of homebrew/homebrew-core
# The remote must be added inside the homebrew-core checkout, because the push below runs there
git remote add $GITHUB_USER git@github.com:${GITHUB_USER}/homebrew-core # Only need this the first time

brew bump-formula-pr --strict -n -v --write apache-arrow --url="${url}" --sha256="${sha256}"
# Apply formula changes from arrow/dev/tasks/homebrew-formulae/apache-arrow.rb
# TODO: script that

brew uninstall apache-arrow
brew install apache-arrow
brew link --overwrite apache-arrow # If you've installed arrow from a local build
brew test apache-arrow
brew audit --strict apache-arrow
# If all good,
git checkout -b apache-arrow-${version}
git add .
git commit -m "apache-arrow ${version}" # The exact commit message matters

brew bump-formula-pr --strict -n -v --write apache-arrow-glib --url="${url}" --sha256="${sha256}"
brew uninstall apache-arrow-glib
brew install apache-arrow-glib
brew test apache-arrow-glib
brew audit --strict apache-arrow-glib
git add .
git commit -m "apache-arrow-glib ${version}"
git push -u $GITHUB_USER apache-arrow-${version}
popd

# Now go make PR
open https://github.com/${GITHUB_USER}/homebrew-core/pull/new/apache-arrow-${version}

Updating Java Maven artifacts in Maven central

How to publish the staged artifacts:

  1. Log on to the Apache repository: https://repository.apache.org/#stagingRepositories
  2. Select the arrow staging repository you just created: orgapachearrow-100x
  3. Click the "close" button
  4. Once validation has passed, click the "release" button

You must set up Maven to be able to publish to Apache's repositories. Read more at https://www.apache.org/dev/publishing-maven-artifacts.html.

Updating Ruby packages

You need an account on https://rubygems.org/ to release Ruby packages.

Once you have an account on https://rubygems.org/ , you need to become an owner of the red-arrow, red-arrow-cuda, red-plasma, red-gandiva and red-parquet gems. Existing owners can add a new account to the owners of these gems with the following command lines:

gem owner red-arrow -a NEW_ACCOUNT
gem owner red-arrow-cuda -a NEW_ACCOUNT
gem owner red-plasma -a NEW_ACCOUNT
gem owner red-gandiva -a NEW_ACCOUNT
gem owner red-parquet -a NEW_ACCOUNT

Once you are an owner of these gems, you can update the Ruby packages:

# dev/release/post-04-ruby.sh 0.13.0
dev/release/post-04-ruby.sh <version>

Updating JavaScript packages

In order to publish the binary build to npm you will need to get access to the project by asking one of the current collaborators listed at https://www.npmjs.com/package/apache-arrow

When you have access you can publish releases to npm by running the npm-release.sh script inside the JS source release:

# Login to npmjs.com (You need to do this only for the first time)
npm login

# dev/release/post-05-js.sh 0.13.0
dev/release/post-05-js.sh <version>

Updating .NET NuGet packages

You need an account on https://www.nuget.org/ and must be an owner of the Apache.Arrow package. Existing owners can invite you at https://www.nuget.org/packages/Apache.Arrow/Manage .

You need to create an API key at https://www.nuget.org/account/apikeys to upload from the command line.

Install the latest .NET Core SDK from https://dotnet.microsoft.com/download.

# NUGET_API_KEY=YOUR_NUGET_API_KEY dev/release/post-06-csharp.sh 0.13.0
NUGET_API_KEY=<your NuGet API key> dev/release/post-06-csharp.sh <version>

Updating Rust packages

You need permissions to the arrow, parquet and datafusion crates on crates.io to publish the Rust artifacts.

# Login to crates.io (You need to do this only for the first time)
cargo login

# If you don't have the latest Rust, you can install it automatically by INSTALL_RUST=yes:
#  INSTALL_RUST=yes dev/release/post-07-rust.sh 0.13.0
# If you have the latest Rust, you just run the following:
#  dev/release/post-07-rust.sh 0.13.0
dev/release/post-07-rust.sh <version>

Updating R packages

To publish the R package on CRAN, there are a few steps we need to do first in order to ensure that binaries for Windows and macOS are available to CRAN. 

All of these repositories are maintained by Jeroen Ooms <jeroenooms@gmail.com>. He may notice that the Apache Arrow release has been approved and make these patches on his own; otherwise, he's the one who will need to accept the pull requests.

  • Update the "autobrew" formula, a fork of Homebrew that is used in R package configure scripts to pull system dependencies on CRAN, where Homebrew is not available. Make a pull request to modify https://github.com/autobrew/homebrew-core/blob/master/Formula/apache-arrow.rb to update the version, SHA, and any changes to dependencies and build steps. Those dependency/build updates will have been already recorded in the copy we have of that formula in dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb. This formula could be the same as the official Homebrew formula, but historically it has been a slimmer build of the C++ library (no Gandiva, for example). Note that making this pull request is not sufficient. The "bottles" will need to be built as well, and this is currently done on a bespoke VM that Jeroen has. But, making this pull request will give Jeroen the information he needs to do that part. If there are new dependencies, you'll also need to update the script at https://github.com/jeroen/autobrew/blob/gh-pages/apache-arrow to include any additions made to r/tools/autobrew. (These local copies of the autobrew scripts are tested in the nightly R package builds.)
  • Update the analogous build scripts for Windows, found in two locations: (1) rtools-packages, https://github.com/r-windows/rtools-packages/blob/master/mingw-w64-arrow/PKGBUILD; and (2) rtools-backports, https://github.com/r-windows/rtools-backports/blob/master/mingw-w64-arrow/PKGBUILD. (These use the current and future RTools toolchains for compiling libraries on Windows for use in R. See the rtools-backports readme and things it links to for some context.) Update those PKGBUILD scripts with the latest version and SHA, and adapt any build step changes in the new release from the ci/PKGBUILD file in apache/arrow, on which these scripts are based.
  • Update "rwinlib", the repository where binary packages compiled especially for R on Windows are customarily hosted, after those two rtools PRs are merged: https://github.com/rwinlib/arrow. (When the R package is installed on Windows, the rwinlib library gets downloaded in this R script, which is called in configure.win). The rtools-backports repository generates build artifacts that go into the rwinlib package. Our best understanding of how that happens is captured in ci/windows-pkg-arrow-for-r.sh, which is called in the apache/arrow Appveyor job. You may be able to download the build\arrow-x.y.z.zip artifact (see here for an example of the Appveyor build artifacts), which that script creates, from our R Appveyor job from the release tag and use that for this submission. Most likely, Jeroen will handle this himself after the rtools-packages/backports PRs are merged.

Once these binary prerequisites have been satisfied, we can submit to CRAN. Given the vagaries of the process, it is best if the R developers on the project verify the CRAN-worthiness of the package before submitting. Our CI systems give us some coverage for the things that CRAN checks, but there are a couple of final tests we should do to confirm that the release binaries will work and that everything runs on the same infrastructure that CRAN has, which is difficult/impossible to emulate fully on Travis or with Docker.

Build the R package locally (R CMD build . from within the r/ directory) and do these checks:

  • Use R-hub for the "usual" CRAN incoming checks. R-hub works hard to match the build environments that CRAN uses and provides a service for checking packages. From within an R session, rhub::check_for_cran() will trigger several builds, or you can specify them individually in the web app.
  • It's also good to do a macOS check on R-hub (not part of the check_for_cran() suite) to verify the "autobrew" formula. R-hub has a good approximation of the dated macOS setup that CRAN uses, which is seemingly not possible to reproduce on Travis.
  • For Windows, submit the built package to the Winbuilder service and check the package for R-devel, the development version of R. R-hub has Windows infrastructure but winbuilder is the exact setup that CRAN uses and catches things that R-hub doesn't.  

If those are clean, let's submit. CRAN has a web form for uploading packages. The release process requires email confirmation from the R package maintainer, currently Neal Richardson.

Updating MSYS2 package

You need to send a pull request to https://github.com/msys2/MINGW-packages to upgrade the mingw-w64-arrow package. At a minimum, you need to update pkgver and sha256sums. If the build options have changed, you may also need to update the CMake options and Meson options.

See https://github.com/msys2/MINGW-packages/pull/6175 for example.
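
The minimal pkgver and sha256sums update can be scripted. The following is a sketch under stated assumptions, not a tool from the Arrow or MSYS2 repositories: the helper name bump_pkgbuild is hypothetical, and it assumes that pkgver sits on its own "pkgver=..." line and that the source tarball's checksum is the first single-quoted entry on the "sha256sums=(" line. Check the resulting diff by hand before opening the pull request.

```shell
# Hypothetical helper: rewrite pkgver and the first sha256sums entry
# of a PKGBUILD file in place.
# Assumptions: pkgver is written as "pkgver=..." on its own line, and the
# tarball checksum is the first single-quoted entry on the
# "sha256sums=(" line. Other entries (e.g. for patches) are left untouched.
bump_pkgbuild() {
  local file="$1" version="$2" sha256="$3" tmp
  tmp="$(mktemp)"
  sed \
    -e "s/^pkgver=.*/pkgver=${version}/" \
    -e "s/^sha256sums=('[0-9a-f]*'/sha256sums=('${sha256}'/" \
    "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example, run inside a MINGW-packages checkout:
#   bump_pkgbuild mingw-w64-arrow/PKGBUILD <version> <sha256 of the source tarball>
```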

Removing source artifacts for RC

The source artifacts for the RCs are no longer needed once all release tasks for the version are finished. We can remove them with the following command line:

# dev/release/post-08-remove-rc.sh 0.13.0
dev/release/post-08-remove-rc.sh <version>