The previous release management guide has been moved to the developers guide in the Apache Arrow repository:
https://github.com/apache/arrow/blob/master/docs/source/developers/release.rst

It can be viewed online here:
https://arrow.apache.org/docs/dev/developers/release.html


Updating C++ and Python packages

We have been making Arrow available to C++ and Python users on the 3 major platforms (Linux, macOS, and Windows) via two package managers: pip and conda.

Updating Python Artifacts

pip Packages

pip binary packages (called "wheels") and the source package (called "sdist") are built using the crossbow tool during the release candidate creation process and then uploaded to PyPI (the Python Package Index) under the pyarrow package.

We use the twine tool to upload wheels to PyPI:

# dev/release/post-09-python.sh 3.0.0 0

# first try to upload to https://test.pypi.org/legacy/
TEST_PYPI=1 dev/release/post-09-python.sh <version> <rc number>

# if everything worked properly upload to the main pypi repository
TEST_PYPI=0 dev/release/post-09-python.sh <version> <rc number>


Make sure you use twine >= 1.11.0, which supports the Markdown long description in setup.py; this also requires setuptools >= 38.6.0.
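A minimal sketch for checking the twine requirement before uploading, assuming bash and a `sort` that supports `-V` (version sort). The helpers `version_ge` and `check_twine` are hypothetical, and the parsing assumes the first line of `twine --version` has the form `twine version X.Y.Z (...)`.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2.
# Relies on `sort -V` ordering the two versions numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: verify the installed twine is >= 1.11.0.
check_twine() {
  local installed
  # Assumes output like "twine version 4.0.2 (...)"; take the third field
  installed=$(twine --version 2>/dev/null | head -n 1 | awk '{print $3}')
  if [ -z "$installed" ]; then
    echo "twine not found; install it with: pip install 'twine>=1.11.0'" >&2
    return 1
  fi
  if ! version_ge "$installed" 1.11.0; then
    echo "twine $installed is too old; need >= 1.11.0" >&2
    return 1
  fi
  echo "twine $installed OK"
}
```

For example: `check_twine && TEST_PYPI=1 dev/release/post-09-python.sh <version> <rc number>`.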

You must have the correct permissions on PyPI to upload wheels; ask Wes McKinney or Uwe Korn if you need help with this.

Updating conda packages

We have been building conda packages using conda-forge. The "feedstocks" that must be updated in order are:

  1. arrow-cpp-feedstock
  2. parquet-cpp-feedstock (a metapackage that installs pyarrow; it doesn't need to be updated unless parquet's version is bumped)
  3. pyarrow-feedstock
  4. r-arrow-feedstock

To update a feedstock, open a pull request updating recipe/meta.yaml as appropriate. Once you are confident that the build is good and the metadata is updated properly, merge the pull request. You must wait until the results of each of the feedstocks land in anaconda.org before moving on to the next package.

Unfortunately, you cannot open pull requests to all of these repositories at the same time because they are interdependent.

Updating Homebrew packages

We have been building brew packages:

In order to update the formulae, follow the Homebrew guide. We have a script for this task.

You need to do the following before running the post-13-homebrew.sh script.

  1. Install Homebrew on your machine: https://docs.brew.sh/Installation
    1. Homebrew on Linux (Linuxbrew) also works
  2. Fork https://github.com/Homebrew/homebrew-core

Running the post-13-homebrew.sh script will prepare a branch to update the apache-arrow and apache-arrow-glib formulae. You can create a pull request for updating these formulae from the prepared branch.

# dev/release/post-13-homebrew.sh 7.0.0 kou
dev/release/post-13-homebrew.sh <version> <your-github-account>

The script pushes the apache-arrow-<version> branch to your fork. You can create a pull request at https://github.com/<your-github-account>/Homebrew-core/pull/new/apache-arrow-<version> (see the message from this script for details).
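If you want the pull request URL without scrolling back through the script output, a small hypothetical helper can format it from the pattern above. It only builds the string; it does not check that the branch was actually pushed.

```shell
# Hypothetical helper: print the "new pull request" URL for the branch
# pushed by dev/release/post-13-homebrew.sh, following the URL pattern above.
homebrew_pr_url() {
  local version=$1 account=$2
  if [ -z "$version" ] || [ -z "$account" ]; then
    echo "usage: homebrew_pr_url <version> <your-github-account>" >&2
    return 1
  fi
  echo "https://github.com/${account}/Homebrew-core/pull/new/apache-arrow-${version}"
}
```

For example, `homebrew_pr_url 7.0.0 kou` prints the URL for the 7.0.0 branch on the `kou` fork.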

Troubleshooting

If you encounter a "too many open files" error like this:

Error: Too many open files @ rb_sysopen - /home/linuxbrew/.linuxbrew/Cellar/aws-sdk-cpp/1.9.310/lib/libaws-cpp-sdk-rds-data.so

You might want to increase the ulimit:

# Show the current limit on open files
ulimit -n
# Update the number of open files to 8192
ulimit -n 8192
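In bash, a plain `ulimit -n 8192` sets both the soft and the hard limit, and a non-root user cannot raise a hard limit again once it has been lowered. A minimal alternative sketch (the function name is hypothetical) raises only the soft limit for the current shell, capped at the hard limit:

```shell
# Hypothetical helper: raise the soft open-file limit of the current shell
# to $1 (default 8192), never above the hard limit and never lowering it.
raise_nofile_limit() {
  local target=${1:-8192} hard soft
  hard=$(ulimit -Hn)
  soft=$(ulimit -Sn)
  # A non-root user cannot go above the hard limit
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    target=$hard
  fi
  # Leave the limit alone if it is already high enough
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
    echo "open file limit already $soft"
    return 0
  fi
  ulimit -Sn "$target" && echo "open file limit raised to $(ulimit -Sn)"
}
```

Call it in the same shell you run `brew` from (not in a subshell), since limits only apply to the shell that sets them and its children.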


Updating Java Maven artifacts in Maven central

How to publish the staged artifacts:

  1. Log on to the Apache repository: https://repository.apache.org/#stagingRepositories
  2. Select the arrow staging repository you created for the RC: orgapachearrow-100x
  3. Click the "release" button

Updating Ruby packages

NOTE: This must be done after uploading the binary release artifacts, updating the Homebrew packages and updating the MSYS2 packages. If the Ruby packages are updated before them, gem install fails because the required system Apache Arrow C++/GLib packages can't be found.

You need an account on https://rubygems.org/ to release Ruby packages.

Once you have an account on https://rubygems.org/, you need to be added as an owner of the red-arrow, red-arrow-cuda, red-arrow-dataset, red-arrow-flight, red-gandiva, red-parquet and red-plasma gems. Existing owners can add a new account as an owner with the following commands:

gem owner red-arrow -a NEW_ACCOUNT
gem owner red-arrow-cuda -a NEW_ACCOUNT
gem owner red-arrow-dataset -a NEW_ACCOUNT
gem owner red-arrow-flight -a NEW_ACCOUNT
gem owner red-gandiva -a NEW_ACCOUNT
gem owner red-parquet -a NEW_ACCOUNT
gem owner red-plasma -a NEW_ACCOUNT
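The seven commands above can also be run as a loop. This is a sketch assuming bash (it uses an array); `ARROW_GEMS` and `add_arrow_gem_owner` are hypothetical names, and the gem list is copied from the commands above, so keep it in sync if gems are added or removed.

```shell
# Gems listed above; update this list if the set of Arrow gems changes.
ARROW_GEMS=(red-arrow red-arrow-cuda red-arrow-dataset red-arrow-flight
            red-gandiva red-parquet red-plasma)

# Hypothetical helper: add one rubygems.org account as owner of every gem.
add_arrow_gem_owner() {
  local account=$1 gem_name status=0
  if [ -z "$account" ]; then
    echo "usage: add_arrow_gem_owner <rubygems-account>" >&2
    return 1
  fi
  for gem_name in "${ARROW_GEMS[@]}"; do
    # Keep going if one gem fails, but report the failure at the end
    gem owner "$gem_name" -a "$account" || status=1
  done
  return "$status"
}
```

An existing owner would run `add_arrow_gem_owner NEW_ACCOUNT`.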

Once you are an owner of these gems, you can update the Ruby packages:

# dev/release/post-04-ruby.sh 0.13.0
dev/release/post-04-ruby.sh <version>

Updating JavaScript packages

To publish the binary build to npm, you need access to the project; ask one of the current collaborators listed at https://www.npmjs.com/package/apache-arrow

Once you have access, you can publish releases to npm by running the post-05-js.sh script:

# Log in to npmjs.com (only needed the first time)
npm login --registry=https://registry.yarnpkg.com/

# dev/release/post-05-js.sh 0.13.0
dev/release/post-05-js.sh <version>

Updating .NET NuGet packages

You need an account on https://www.nuget.org/ and you need to be an owner of the Apache.Arrow package. Existing owners can invite you as an owner at https://www.nuget.org/packages/Apache.Arrow/Manage .

You need to create an API key at https://www.nuget.org/account/apikeys to upload from the command line.

Install the latest .NET Core SDK from https://dotnet.microsoft.com/download.

# NUGET_API_KEY=YOUR_NUGET_API_KEY dev/release/post-05-csharp.sh 0.13.0
NUGET_API_KEY=<your NuGet API key> dev/release/post-05-csharp.sh <version>
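A small pre-flight check can catch the two prerequisites above (an API key and the .NET SDK) before the upload starts. This is a sketch; `check_nuget_prereqs` is a hypothetical helper, not part of the release scripts.

```shell
# Hypothetical helper: fail early if the NuGet upload prerequisites are missing.
check_nuget_prereqs() {
  # The API key must be exported or passed inline
  if [ -z "${NUGET_API_KEY:-}" ]; then
    echo "NUGET_API_KEY is not set; create a key at https://www.nuget.org/account/apikeys" >&2
    return 1
  fi
  # The dotnet CLI comes with the .NET SDK
  if ! command -v dotnet >/dev/null 2>&1; then
    echo "dotnet not found; install the .NET SDK first" >&2
    return 1
  fi
}
```

For example: `export NUGET_API_KEY=<your NuGet API key>; check_nuget_prereqs && dev/release/post-05-csharp.sh <version>`.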

Updating R packages

To publish the R package on CRAN, there are a few steps we need to do first in order to ensure that binaries for Windows and macOS are available to CRAN. Jeroen Ooms <jeroenooms@gmail.com> maintains several projects that build C++ dependencies for R packages for macOS and Windows. We test copies of these same build scripts in our CI, and at release time, we need to send any changes we have and update the versions/hashes upstream.

When the release candidate is made, make draft pull requests to each repository using the rc, updating the version and SHA, as well as any cmake build changes from the corresponding files in apache/arrow. Jeroen may merge these PRs before the release vote passes, build the binary artifacts, and publish them in the right places so that we can do pre-submission checks (see below). After the release candidate vote passes, update these PRs to point to the official (non-rc) URL and mark them as ready for review. Jeroen will merge, build the binary artifacts, and publish them in the right places. 

The files/repos to update:

Once these binary prerequisites have been satisfied, we can submit to CRAN. Given the vagaries of the process, it is best if the R developers on the project verify the CRAN-worthiness of the package before submitting. Our CI systems give us some coverage for the things that CRAN checks, but there are a couple of final tests we should do to confirm that the release binaries will work and that everything runs on the same infrastructure that CRAN has, which is difficult/impossible to emulate fully on Travis or with Docker.

Build and check the R package locally (run make release from within the r/ directory), then run these extra checks on the tarball it creates:

  • Use R-hub for some CRAN-like incoming checks, as well as a couple we can't currently trigger in our CI. R-hub works hard to match the build environments that CRAN uses and provides a service for checking packages. It has a good approximation of the dated macOS setup that CRAN uses, which seemingly cannot be reproduced on Travis or GHA. To trigger these in one line from R, run rhub::check("arrow_x.y.z.tar.gz", platform=c("debian-gcc-patched", "fedora-clang-devel", "macos-highsierra-release-cran"))
  • For Windows, submit the built package to the Winbuilder service and check the package for R-devel, the development version of R. R-hub has Windows infrastructure but winbuilder is the exact setup that CRAN uses and catches things that R-hub doesn't.
  • There is also an M1 Mac builder, like win-builder: https://mac.r-project.org/macbuilder/submit.html

If those are clean, let's submit. CRAN has a web form for uploading packages. The release process requires email confirmation from the R package maintainer, currently Neal Richardson.

CI update after major release

We have a CI job that writes Parquet and Feather files with the development branch and then reads them with previous releases of arrow, testing that we maintain backwards compatibility with these formats (for the features that are tested). When there is a release, add the just-released version of arrow to this CI job (called test-r-version-compatibility). The template for this job is https://github.com/apache/arrow/blob/master/dev/tasks/r/github.linux.arrow.version.back.compat.yml ; add a new line to the matrix under the read-files job with the just-released version as the "old_arrow_version", along with the current release R version number. We use the release version of R for this test so that we can take advantage of the binaries built and hosted by RStudio Package Manager, which makes installing old arrow releases very easy and saves us from re-creating build environments to match them.

Update versions after patch release

When a major release happens, we add a commit to the master branch that bumps the dev versions of all of the libraries, but we don't do this after a patch release. For all other languages, the version string is already set to NEXT.0.0-SNAPSHOT or similar, so there's nothing to change. But the convention in R is to do x.y.z.9000, so if we don't increment x.y.z to match what was released, we'll get check failures because our dev version is too low. So after a patch release, you'll need to add a commit to master bumping all of the versions and adding an entry to NEWS.md. Here's an example
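A hypothetical helper for the R part of this bump, assuming the version lives in the "Version:" field of r/DESCRIPTION. It only rewrites that field to x.y.z.9000; it does not add the NEWS.md entry or change any other file that the real version-bump commit touches.

```shell
# Hypothetical helper: after a patch release x.y.z, set the R dev version
# in DESCRIPTION (default path r/DESCRIPTION) to x.y.z.9000.
bump_r_dev_version() {
  local released=$1 desc=${2:-r/DESCRIPTION}
  # Accept only digits and dots, e.g. 8.0.1
  case "$released" in
    "" | *[!0-9.]*) echo "expected a version like 8.0.1, got '$released'" >&2; return 1 ;;
  esac
  # Rewrite only the Version: line; write to a temp file so this works
  # with both GNU and BSD sed (no reliance on sed -i)
  sed "s/^Version: .*/Version: ${released}.9000/" "$desc" > "$desc.tmp" &&
    mv "$desc.tmp" "$desc"
}
```

For example, after releasing 8.0.1, `bump_r_dev_version 8.0.1` run from the repository root would set the dev version to 8.0.1.9000.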

Updating the vcpkg port

You need to submit a pull request to http://github.com/microsoft/vcpkg to update the arrow port which distributes the Arrow C++ library. This port consists of a JSON manifest file with metadata and dependency information, a vcpkg-flavored CMake script, and patch file(s) to apply some necessary fixes. There are no binary assets; the CMake script downloads the source release. Note that the vcpkg tool itself and the vcpkg port recipes are all stored in the same GitHub repository.

See https://github.com/microsoft/vcpkg/pull/19229 for an example.

Detailed instructions

  • First check that someone else has not already opened a PR to update the port recipe. If not, proceed. If so, contribute to that PR instead of opening a new one; if there is something badly wrong with that PR, proceed, but when you open the new PR, add a comment that references the other PR and clearly explains why the new PR should supersede it.

  • Fork the repository at https://github.com/microsoft/vcpkg

  • Clone the fork to your computer

  • Create a new branch for this release

  • From the root of your clone, run ./bootstrap-vcpkg.sh to build the vcpkg tool on your computer

  • Check to see if any of the patch files are obviously no longer needed. For example, a patch might have been used to backport changes that were made in the apache/arrow repository after the previous release. If any patches are obviously no longer needed, remove the patch files and remove the lines in portfile.cmake that refer to them.

  • Update REF and SHA512 in the vcpkg_from_github call in ports/arrow/portfile.cmake

  • Update version in ports/arrow/vcpkg.json and reset port-version (if present) to 0

  • Run ./vcpkg format-manifest ports/arrow/vcpkg.json to format the manifest

  • Commit (but do not yet push) the above changes

  • Run ./vcpkg x-add-version arrow to update version files

  • Commit the version files changes

  • Push the commits to your fork

  • Open a draft PR with the title [arrow] Update to X.Y.Z and with a brief comment like Updates the arrow port to version X.Y.Z

  • Check if there are any open GitHub issues requesting that the arrow port be updated. If so, reference it/them in the PR comment like Closes #12345

  • Go through Microsoft's CLA process if you have not already

  • Wait for the CI to run.

  • If any of the CI checks fail, fix the problems.

    • The most likely problem will be that the patches need to be updated. This can be difficult and time-consuming. You can try checking out the previous release tag, applying the old patches with git apply old.patch, creating a commit with the changes from the old patches, checking out the new release tag, cherry-picking the commit you created with git cherry-pick --no-commit, and then creating new patches with git diff --cached --ignore-submodules=all > new.patch, but this might result in merge conflicts. If all else fails, you can go through the diff hunks and hunk headers line by line, updating them manually.

    • Another common problem is failures in the CI checks for other vcpkg ports that depend directly or indirectly on arrow. When you open a PR to update a vcpkg port, the CI checks test all the other vcpkg ports that depend on it. This sometimes causes the CI checks to fail for reasons that have nothing to do with arrow. If these problems happen, check if there are open issues for them, open new issues if there are not, and add a comment in the PR explaining the failure. Add an entry to scripts\ci.baseline.txt to indicate that the failure of the other vcpkg port is expected; this will suppress the CI failures.

    • When finished fixing the problems, commit your fixes.

    • Before pushing the fix commits, run ./vcpkg x-add-version arrow --overwrite-version to update the hash in the version file, and commit this change. If you don't, the GitHub Actions bot will post a comment in the PR reminding you to.

    • Push the commits to your fork

  • If necessary, debug problems by running ./vcpkg install arrow locally. You might need to run this on a specific architecture or specify a triplet to reproduce a failure. For example, on a Windows computer, run ./vcpkg install arrow:x64-windows-static to generate static x64 libraries. Building locally gives you access to log files that you cannot see in the CI checks.

  • Run tests locally to check for problems that the vcpkg CI would not catch. For example, the CI checks do not test the non-default features of the arrow port. At the time of this writing, the non-default features are flight, dataset, jemalloc, mimalloc, orc, and s3. Try to test these features locally, to the best of your ability. For example, on a Windows computer, run ./vcpkg install arrow[flight,dataset,mimalloc,orc,s3]:x64-windows --recurse to install Arrow as a dynamic x64 library with all the non-default features enabled except for jemalloc which Arrow cannot use on Windows.
  • When the CI is all green and your local tests are all passing, mark the PR as ready to review. The vcpkg maintainers will review it, ask questions, and approve and merge it if there are no unresolved problems.

  • Write a comment in the PR tagging members of the Arrow developer community who use the vcpkg arrow port. Currently these include:

If you intend to make any changes to the arrow port recipe beyond a simple version upgrade, review the vcpkg maintainer guide at https://github.com/microsoft/vcpkg/blob/master/docs/maintainers/maintainer-guide.md.
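The patch-refresh approach described in the CI-fix steps can be scripted roughly as below. Note that a plain `git cherry-pick` commits the change, so a later `git diff` of the working tree would be empty; this sketch uses `--no-commit` and diffs the index instead. `refresh_vcpkg_patch` is a hypothetical helper: run it in a clean clone of apache/arrow that has both release tags, and keep the patch files outside the clone (use absolute paths).

```shell
# Hypothetical helper: carry a vcpkg patch from one Arrow release tag to another.
# Usage: refresh_vcpkg_patch <old-tag> <new-tag> <old-patch> <new-patch>
refresh_vcpkg_patch() {
  local old_tag=$1 new_tag=$2 old_patch=$3 new_patch=$4 patch_commit
  # Apply the old patch on top of the release it was written for
  git checkout --quiet --detach "$old_tag" || return 1
  git apply "$old_patch" || return 1
  git add -A && git commit --quiet -m "temporary: old vcpkg patch" || return 1
  patch_commit=$(git rev-parse HEAD)
  # Replay that change onto the new release without committing it;
  # on conflicts this stops here so you can resolve them by hand
  git checkout --quiet --detach "$new_tag" || return 1
  git cherry-pick --no-commit "$patch_commit" || return 1
  # The staged changes are the refreshed patch
  git diff --cached --ignore-submodules=all > "$new_patch" || return 1
  # Leave the working tree clean again
  git reset --quiet --hard
}
```

For example: `refresh_vcpkg_patch apache-arrow-<old version> apache-arrow-<new version> /tmp/old.patch /tmp/new.patch`. If the cherry-pick reports conflicts, resolve them, stage the result, and run the `git diff --cached` step yourself.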

Updating MSYS2 package

You need to fork https://github.com/msys2/MINGW-packages and clone your fork on your machine before running the post-12-msys2.sh script.

Running the post-12-msys2.sh script will prepare a branch to update the arrow MINGW package. You can create a pull request for updating the arrow MINGW package from the prepared branch.

# dev/release/post-12-msys2.sh 7.0.0 ~/MINGW-packages
dev/release/post-12-msys2.sh <version> <working-copy-of-your-fork-of-msys2/MINGW-packages>

The script pushes the arrow-<version> branch to your fork. You can create a pull request at https://github.com/<your-github-account>/MINGW-packages/pull/new/arrow-<version> (see the message from this script for details).
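Before running the script, you may want to sanity-check the working-copy argument. This is a sketch; `check_mingw_clone` is a hypothetical helper and it assumes your fork is the `origin` remote of that working copy.

```shell
# Hypothetical helper: check that $1 is a git working copy whose "origin"
# remote looks like a fork of msys2/MINGW-packages.
check_mingw_clone() {
  local dir=$1 url
  if ! git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "$dir is not a git working copy" >&2
    return 1
  fi
  url=$(git -C "$dir" remote get-url origin 2>/dev/null) || {
    echo "$dir has no 'origin' remote" >&2
    return 1
  }
  case "$url" in
    */MINGW-packages | */MINGW-packages.git) echo "origin: $url" ;;
    *) echo "origin ($url) does not look like a MINGW-packages fork" >&2; return 1 ;;
  esac
}
```

For example: `check_mingw_clone ~/MINGW-packages && dev/release/post-12-msys2.sh <version> ~/MINGW-packages`.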

Bumping versions

This task is only for major releases.

Running the post-11-bump-versions.sh script will bump versions on master to <next-version>-SNAPSHOT and add the apache-arrow-<next-version>.dev tag.

# dev/release/post-11-bump-versions.sh 6.0.0 7.0.0
dev/release/post-11-bump-versions.sh <version> <next-version>

The script assumes the remote name is apache.
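Two small hypothetical helpers can reduce mistakes here: one computes <next-version> for a major release (following the 6.0.0 to 7.0.0 example above), and one confirms that the `apache` remote the script expects exists in your clone. The remote URL in the hint is an assumption; use whichever form of the upstream apache/arrow URL you normally push with.

```shell
# Hypothetical helper: next major version, e.g. 6.0.0 -> 7.0.0.
next_major_version() {
  local major=${1%%.*}
  case "$major" in
    "" | *[!0-9]*) echo "expected a version like 6.0.0, got '$1'" >&2; return 1 ;;
  esac
  echo "$((major + 1)).0.0"
}

# Hypothetical helper: fail if the current clone has no "apache" remote.
check_apache_remote() {
  if ! git remote get-url apache >/dev/null 2>&1; then
    echo "no 'apache' remote; add one, for example:" >&2
    echo "  git remote add apache git@github.com:apache/arrow.git" >&2
    return 1
  fi
}
```

Run `check_apache_remote` in your arrow clone first, then pass `$(next_major_version <version>)` as <next-version>.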

Updating tags for Go modules

Running the post-10-go.sh script will add the tags needed to release the Go modules. Because the Go modules are not at the repository root, they need tags of the form go/arrow/v<version> and go/parquet/v<version>.

# dev/release/post-10-go.sh 6.0.0
dev/release/post-10-go.sh <version>

This will add the needed tags, pointing them at the apache-arrow-<version> tag. The script assumes the remote name is apache.
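To double-check the result locally, a sketch like the following (the function name is hypothetical) confirms that both Go module tags exist and resolve to the same commit as apache-arrow-<version>. It only inspects local tags, so fetch tags from the apache remote first if you ran the script elsewhere.

```shell
# Hypothetical helper: verify go/arrow/v<version> and go/parquet/v<version>
# point at the same commit as apache-arrow-<version>.
verify_go_tags() {
  local version=$1 release_commit tag tag_commit status=0
  release_commit=$(git rev-parse --verify --quiet "apache-arrow-${version}^{commit}") || {
    echo "tag apache-arrow-${version} not found" >&2
    return 1
  }
  for tag in "go/arrow/v${version}" "go/parquet/v${version}"; do
    # ^{commit} peels annotated tags down to the commit they point at
    tag_commit=$(git rev-parse --verify --quiet "${tag}^{commit}")
    if [ "$tag_commit" = "$release_commit" ]; then
      echo "ok: $tag"
    else
      echo "missing or wrong: $tag" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `git fetch apache --tags && verify_go_tags 6.0.0`.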

Removing old artifacts

RC source artifacts are no longer needed once all release tasks for the version are finished, and old releases should be archived. You can remove the RC source artifacts and archive old releases with the following command:

dev/release/post-07-remove-old-artifacts.sh