The previous release management guide has moved to the developers guide in the Apache Arrow repository:
https://github.com/apache/arrow/blob/master/docs/source/developers/release.rst
It can be viewed online here:
https://arrow.apache.org/docs/dev/developers/release.html
Updating C++ and Python packages
We have been making Arrow available to C++ and Python users on the 3 major platforms (Linux, macOS, and Windows) via two package managers: pip and conda.
Updating Python Artifacts
pip packages
pip binary packages (called "wheels") and the source package (called "sdist") are built using the crossbow tool that we used above during the release candidate creation process, and are then uploaded to PyPI (Python Package Index) under the pyarrow package.
We use the twine tool to upload wheels to PyPI:
# dev/release/post-09-python.sh 3.0.0 0
# first try to upload to https://test.pypi.org/legacy/
TEST_PYPI=1 dev/release/post-09-python.sh <version> <rc number>
# if everything worked properly upload to the main pypi repository
TEST_PYPI=0 dev/release/post-09-python.sh <version> <rc number>
Please make sure you use twine >= 1.11.0. This supports the markdown long description in setup.py, which also requires setuptools >= 38.6.0.
You must have the correct permissions on PyPI to upload wheels; ask Wes McKinney or Uwe Korn if you need help with this.
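The twine version requirement above can be checked before attempting an upload. The following is a minimal sketch, not part of the release scripts: the helper name `version_ge` is hypothetical, the comparison relies on `sort -V` (available in GNU coreutils and recent BSD/macOS), and the parsing assumes that `twine --version` prints the version as the third word of its first line (for example `twine version 4.0.2 ...`).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is at least version $2.
version_ge() {
  # sort -V orders version strings; if the smaller of the two is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only inspect twine when it is actually installed.
if command -v twine >/dev/null 2>&1; then
  # Assumption: the first output line looks like "twine version X.Y.Z (...)".
  installed=$(twine --version | awk 'NR == 1 { print $3 }')
  if version_ge "$installed" "1.11.0"; then
    echo "twine $installed is new enough"
  else
    echo "twine $installed is too old; upgrade to >= 1.11.0" >&2
  fi
fi
```

If the output format of your twine differs, only the `awk` line should need adjusting.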
Updating conda packages
We have been building conda packages using conda-forge. The three "feedstocks" that must be updated in order are:
arrow-cpp-feedstock
parquet-cpp-feedstock (it is a meta package which installs pyarrow, no need to update until parquet's version is bumped)
pyarrow-feedstock
r-arrow-feedstock
To update a feedstock, open a pull request updating recipe/meta.yaml as appropriate. Once you are confident that the build is good and the metadata is updated properly, merge the pull request. You must wait until the results of each of the feedstocks land in anaconda.org before moving on to the next package.
Unfortunately, you cannot open pull requests to all three repositories at the same time because they are interdependent.
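Because each feedstock has to land on anaconda.org before the next one is updated, it can help to poll for the new package instead of checking by hand. The sketch below is one possible way to do that. `wait_until` is a hypothetical helper, and the `conda search` invocation in the trailing comment is only an illustration of a check you might plug in, under the assumption that it exits with a non-zero status while the package is not yet available.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds.
# Usage: wait_until <max-attempts> <sleep-seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Illustration only (not executed here): poll every 10 minutes, up to 36 times,
# until the new arrow-cpp build is visible on the conda-forge channel.
#   wait_until 36 600 conda search -c conda-forge "arrow-cpp=<version>"
```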
Updating Homebrew packages
We have been building brew packages:
In order to update the formulae, follow the Homebrew guide. We have a script for this task.
You need to do the following before running the post-13-homebrew.sh script:
- Install Homebrew on your machine: https://docs.brew.sh/Installation
- Homebrew on Linux (Linuxbrew) also works
- Fork https://github.com/Homebrew/homebrew-core
Running the post-13-homebrew.sh script will prepare a branch to update the apache-arrow and apache-arrow-glib formulae. You can create a pull request for updating these formulae from the prepared branch.
# dev/release/post-13-homebrew.sh 7.0.0 kou
dev/release/post-13-homebrew.sh <version> <your-github-account>
The script pushes the apache-arrow-<version> branch to your fork. You can create a pull request at https://github.com/<your-github-account>/Homebrew-core/pull/new/apache-arrow-<version>. See the message from this script for details.
Troubleshoot
If you face a too many open files error:
Error: Too many open files @ rb_sysopen - /home/linuxbrew/.linuxbrew/Cellar/aws-sdk-cpp/1.9.310/lib/libaws-cpp-sdk-rds-data.so
You might want to increase the ulimit:
# Show the current limit on the number of open files
ulimit -n
# Raise the limit on the number of open files to 8192
ulimit -n 8192
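If you run the Homebrew script repeatedly, the limit can be raised only when it is actually too low. This is a small sketch with a hypothetical helper name; it changes the soft limit of the current shell only and fails if the hard limit is below the requested value.

```shell
#!/bin/sh
# Hypothetical helper: raise the soft open-files limit to at least $1.
raise_nofile_limit() {
  target=$1
  current=$(ulimit -n)
  # Nothing to do when the limit is already unlimited or high enough.
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$target" ]; then
    return 0
  fi
  # This fails if the hard limit is lower than the target.
  ulimit -n "$target"
}

# Example: make sure at least 8192 files can be open before running brew.
raise_nofile_limit 8192 || echo "could not raise the open-files limit" >&2
```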
Updating Java Maven artifacts in Maven central
How to publish the staged artifacts:
Log on to the Apache repository: https://repository.apache.org/#stagingRepositories
Select the arrow staging repository you created for RC: orgapachearrow-100x
Click the "release" button
Updating Ruby packages
NOTE: This must be done after uploading binary release artifacts, updating Homebrew packages and updating MSYS2 packages. If we update Ruby packages before them, gem install fails because the required system Apache Arrow C++/GLib packages aren't found.
You need an account on https://rubygems.org/ to release Ruby packages.
If you have an account on https://rubygems.org/ , you need to become an owner of the red-arrow, red-arrow-cuda, red-arrow-dataset, red-arrow-flight, red-gandiva, red-parquet and red-plasma gems. Existing owners can add a new account as an owner with the following command lines:
gem owner red-arrow -a NEW_ACCOUNT
gem owner red-arrow-cuda -a NEW_ACCOUNT
gem owner red-arrow-dataset -a NEW_ACCOUNT
gem owner red-arrow-flight -a NEW_ACCOUNT
gem owner red-gandiva -a NEW_ACCOUNT
gem owner red-parquet -a NEW_ACCOUNT
gem owner red-plasma -a NEW_ACCOUNT
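The same commands can be generated in a loop, which avoids missing a gem when typing them by hand. This is only a sketch: `add_owner_commands` is a hypothetical helper that prints the commands rather than running them, so that the output can be reviewed first.

```shell
#!/bin/sh
# Gems listed in this section.
GEMS="red-arrow red-arrow-cuda red-arrow-dataset red-arrow-flight red-gandiva red-parquet red-plasma"

# Hypothetical helper: print (rather than run) one `gem owner` command per gem.
add_owner_commands() {
  account=$1
  for gem in $GEMS; do
    echo "gem owner $gem -a $account"
  done
}

# Review the output, then pipe it to `sh` to actually run the commands:
#   add_owner_commands NEW_ACCOUNT | sh
add_owner_commands NEW_ACCOUNT
```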
Once you are an owner of these gems, you can update the Ruby packages:
# dev/release/post-04-ruby.sh 0.13.0
dev/release/post-04-ruby.sh <version>
Updating JavaScript packages
In order to publish the binary build to npm you will need to get access to the project by asking one of the current collaborators listed at https://www.npmjs.com/package/apache-arrow
When you have access you can publish releases to npm by running the npm-release.sh script inside the JS source release:
# Login to npmjs.com (You need to do this only for the first time)
npm login --registry=https://registry.yarnpkg.com/
# dev/release/post-05-js.sh 0.13.0
dev/release/post-05-js.sh <version>
Updating .NET NuGet packages
You need an account on https://www.nuget.org/. You also need to be an owner of the Apache.Arrow package. Existing owners can invite you at https://www.nuget.org/packages/Apache.Arrow/Manage .
You need to create an API key at https://www.nuget.org/account/apikeys to upload from command line.
Install the latest .NET Core SDK from https://dotnet.microsoft.com/download.
# NUGET_API_KEY=YOUR_NUGET_API_KEY dev/release/post-05-csharp.sh 0.13.0
NUGET_API_KEY=<your NuGet API key> dev/release/post-05-csharp.sh <version>
Updating R packages
To publish the R package on CRAN, there are a few steps we need to do first in order to ensure that binaries for Windows and macOS are available to CRAN. Jeroen Ooms <jeroenooms@gmail.com> maintains several projects that build C++ dependencies for R packages for macOS and Windows. We test copies of these same build scripts in our CI, and at release time, we need to send any changes we have and update the versions/hashes upstream.
When the release candidate is made, make draft pull requests to each repository using the rc, updating the version and SHA, as well as any cmake build changes from the corresponding files in apache/arrow. Jeroen may merge these PRs before the release vote passes, build the binary artifacts, and publish them in the right places so that we can do pre-submission checks (see below). After the release candidate vote passes, update these PRs to point to the official (non-rc) URL and mark them as ready for review. Jeroen will merge, build the binary artifacts, and publish them in the right places.
The files/repos to update:
- The "autobrew" formula, a fork of Homebrew that is used in R package configure scripts to pull system dependencies on CRAN, where Homebrew is not available.
- Make a pull request to modify https://github.com/autobrew/homebrew-core/blob/master/Formula/apache-arrow.rb to update the version, SHA, and any changes to dependencies and build steps. Those dependency/build updates will have been already recorded in the copy we have of that formula in dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb. This formula could be the same as the official Homebrew formula, but historically it has been a slimmer build of the C++ library (no Gandiva, for example). There are also some subtle differences because we need a static build.
- https://github.com/autobrew/homebrew-cran/blob/master/Formula/apache-arrow-static.rb, which is the same general deal: a Homebrew formula. autobrew/homebrew-core (the first one) is what is used to build the bottle that gets downloaded on CRAN because they still use a macOS version that Homebrew no longer supports. This version builds the bottles for newer versions of macOS, which I think you'd only encounter if you did install.packages("arrow", type = "source") on a (newer) mac. Our copy of this formula is at dev/tasks/homebrew-formulae/autobrew/apache-arrow-static.rb.
- If there are new dependencies, you'll also need to update the script at https://github.com/autobrew/scripts/blob/master/apache-arrow to include any additions made to r/tools/autobrew. (These local copies of the autobrew scripts are tested in the nightly R package builds.)
- If you need to rebuild the bottles without changing the Arrow version (for example, you forgot to change a cmake flag and only realize it after Jeroen merges), be sure to add/bump the revision in the formula (like this) in order to get the bottles to be rebuilt.
- Update the analogous build script for Windows. Adapt any build step changes in the new release from the ci/PKGBUILD file in apache/arrow.
- rtools-packages, https://github.com/r-windows/rtools-packages/blob/master/mingw-w64-arrow/PKGBUILD. This is for R >= 4.0 only. As of 10.0.0, we use C++17, which is not supported on the rtools35 toolchain.
- PKGBUILD has a line that says "uncomment this to test with rc". For the draft PR, use the RC.
- After the release vote passes, either edit your PR (if it is still open) or open a new PR to switch back to the official release URL.
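The version and SHA updates in the formulae and the PKGBUILD above need a checksum of the source tarball. A minimal sketch for computing it follows, assuming the files expect a SHA-256 digest (check the existing `sha256` or `sha256sums` entry in each file before relying on this). The helper name `sha256_of` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 digest of a file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    # macOS ships shasum instead of sha256sum.
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Example (not executed here): digest of a downloaded source tarball.
#   sha256_of apache-arrow-<version>.tar.gz
```

Compute the digest from the tarball at the same URL that the formula or PKGBUILD references, so the RC and the official release are not mixed up.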
Once these binary prerequisites have been satisfied, we can submit to CRAN. Given the vagaries of the process, it is best if the R developers on the project verify the CRAN-worthiness of the package before submitting. Our CI systems give us some coverage for the things that CRAN checks, but there are a couple of final tests we should do to confirm that the release binaries will work and that everything runs on the same infrastructure that CRAN has, which is difficult/impossible to emulate fully on Travis or with Docker.
Build and check the R package locally (make release from within the r/ directory) and do these extra checks with the tarball that it creates:
- Use R-hub for some CRAN-like incoming checks, as well as a couple we can't currently trigger in our CI. R-hub works hard to match the build environments that CRAN uses and provides a service for checking packages. They have a good approximation of the dated macOS setup that CRAN uses, which is seemingly not possible to reproduce on Travis or GHA. To trigger these in one line from R:
rhub::check("arrow_x.y.z.tar.gz", platform=c("debian-gcc-patched", "fedora-clang-devel", "macos-highsierra-release-cran"))
- For Windows, submit the built package to the win-builder service and check the package for R-devel, the development version of R. R-hub has Windows infrastructure, but win-builder is the exact setup that CRAN uses and catches things that R-hub doesn't.
- There is also an M1 Mac builder, like win-builder: https://mac.r-project.org/macbuilder/submit.html
If those are clean, let's submit. CRAN has a web form for uploading packages. The release process requires email confirmation from the R package maintainer, currently Neal Richardson.
CI update after major release
We have a CI job that writes parquet and feather files with the development branch and then uses previous releases of arrow to read them in, testing that we maintain backwards compatibility with these formats (for the features that are tested). When there is a release, add the just-released version of arrow to the CI job (called test-r-version-compatibility). The template for this job is https://github.com/apache/arrow/blob/master/dev/tasks/r/github.linux.arrow.version.back.compat.yml and you will need to add a new line to the matrix under the read-files job with the version just released as the "old_arrow_version" along with the current release R version number. We use the release R version to test this backwards compatibility in order to take advantage of binaries built and hosted by RStudio Package Manager, which makes installing old arrow releases very easy, so we don't have to re-create build environments to match them.
Update versions after patch release
When a major release happens, we add a commit to the master branch that bumps the dev versions of all of the libraries, but we don't do this after a patch release. For all other languages, the version string is already set to NEXT.0.0-SNAPSHOT or similar, so there's nothing to change. But the convention in R is to do x.y.z.9000, so if we don't increment x.y.z to match what was released, we'll get check failures because our dev version is too low. So after a patch release, you'll need to add a commit to master bumping all of the versions and adding an entry to NEWS.md. Here's an example.
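The version reasoning above can be illustrated with a small sketch. Both helper names are hypothetical, and the comparison uses `sort -V` as an approximation of how R orders package versions, which is sufficient for the x.y.z and x.y.z.9000 forms discussed here.

```shell
#!/bin/sh
# Hypothetical helper: R development version for a release, e.g. 7.0.1 -> 7.0.1.9000.
r_dev_version() {
  echo "$1.9000"
}

# Hypothetical helper: succeeds when dev version $1 sorts below released version $2.
dev_version_too_low() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: master still says 7.0.0.9000 after 7.0.1 was released.
if dev_version_too_low "7.0.0.9000" "7.0.1"; then
  echo "bump the R version to $(r_dev_version 7.0.1)"
fi
```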
Updating the vcpkg port
You need to submit a pull request to http://github.com/microsoft/vcpkg to update the arrow port which distributes the Arrow C++ library. This port consists of a JSON manifest file with metadata and dependency information, a vcpkg-flavored CMake script, and patch file(s) to apply some necessary fixes. There are no binary assets; the CMake script downloads the source release. Note that the vcpkg tool itself and the vcpkg port recipes are all stored in the same GitHub repository.
See https://github.com/microsoft/vcpkg/pull/19229 for an example.
Detailed instructions
First check that someone else has not already opened a PR to update the port recipe. If no, then proceed. If yes, then contribute to that PR instead of opening a new one, or if there is something badly wrong with that PR, then proceed but when you open the new PR, add a comment that references the other PR and clearly explains why the new PR should supersede it.
Fork the repository at https://github.com/microsoft/vcpkg
Clone the fork to your computer
Create a new branch for this release
Run ./vcpkg/bootstrap-vcpkg.sh to install vcpkg on your computer
Check to see if any of the patch files are obviously no longer needed. For example, a patch might have been used to backport changes that were made in the apache/arrow repository after the previous release. If any patches are obviously no longer needed, remove the patch files and remove the lines in portfile.cmake that refer to them.
Update REF and SHA512 in the vcpkg_from_github call in ports/arrow/portfile.cmake
Update version in ports/arrow/vcpkg.json and reset port-version (if present) to 0
Run ./vcpkg format-manifest ports/arrow/vcpkg.json to format the manifest
Commit (but do not yet push) the above changes
Run ./vcpkg x-add-version arrow to update version files
Commit the version files changes
Push the commits to your fork
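The command-line part of the steps so far can be summarized as a dry run. This sketch only prints the commands; the branch name, commit messages, and remote name are hypothetical, and it assumes that the version files written by x-add-version live under the `versions` directory of the vcpkg repository. The manual edits to portfile.cmake and vcpkg.json still have to happen before the first commit.

```shell
#!/bin/sh
VERSION="X.Y.Z"                  # placeholder release version
BRANCH="arrow-update-$VERSION"   # hypothetical branch name
REMOTE="origin"                  # assumption: your fork is the `origin` remote

# Hypothetical helper: print the command instead of running it.
run() {
  echo "+ $*"
}

update_port_steps() {
  run git checkout -b "$BRANCH"
  # (edit ports/arrow/portfile.cmake and ports/arrow/vcpkg.json here)
  run ./vcpkg format-manifest ports/arrow/vcpkg.json
  run git add ports/arrow
  run git commit -m "[arrow] Update to $VERSION"
  run ./vcpkg x-add-version arrow
  run git add versions
  run git commit -m "[arrow] Update version files"
  run git push "$REMOTE" "$BRANCH"
}

update_port_steps
```

Committing the port changes before running x-add-version matters because that command records a hash of the committed port directory.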
Open a draft PR with the title [arrow] Update to X.Y.Z and with a brief comment like Updates the arrow port to version X.Y.Z
Check if there are any open GitHub issues requesting that the arrow port be updated. If so, reference it/them in the PR comment like Closes #12345
Go through Microsoft's CLA process if you have not already
Wait for the CI to run.
If any of the CI checks fail, fix the problems.
- The most likely problem will be that the patches need to be updated. This can be difficult and time-consuming. You can try checking out the previous release tag, applying the old patches with git apply old.patch, creating a commit with the changes from the old patches, checking out the new release tag, cherry-picking the commit you created, then creating new patches with git diff --ignore-submodules=all > new.patch, but this might result in merge conflicts. If all else fails, you can go through the diff hunks and hunk headers line by line, manually updating them.
- Another common problem is failures in the CI checks for other vcpkg ports that depend directly or indirectly on arrow. When you open a PR to update a vcpkg port, the CI checks test all the other vcpkg ports that depend on it. This sometimes causes the CI checks to fail for reasons that have nothing to do with arrow. If these problems happen, check if there are open issues for them, open new issues if there are not, and add a comment in the PR explaining the failure. Add an entry to scripts/ci.baseline.txt to indicate that the failure of the other vcpkg port is expected; this will suppress the CI failures.
- When finished fixing the problems, commit your fixes.
- Before pushing fix commits, run ./vcpkg x-add-version arrow --overwrite-version to update the hash in the version file and commit this change. If you don't do this, the GitHub Actions bot will comment in the PR reminding you to.
- Push the commits to your fork
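The patch-refresh procedure described above (apply the old patch on the previous release, commit, cherry-pick onto the new release, export a new patch) can be tried out on a throwaway repository first. The sketch below does exactly that. The repository, the tags `v1` and `v2`, the branch names, and the file contents are all made up and stand in for apache/arrow and its release tags. Note that the final diff is taken against the new release tag: after the cherry-pick the changes are already committed, so a plain `git diff` of the working tree would be empty.

```shell
#!/bin/sh
set -eu

# Throwaway repository that stands in for apache/arrow (all names are made up).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example"
git config user.email "example@example.com"

# "Previous release": ten numbered lines, tagged v1.
seq 1 10 > file.txt
git add file.txt
git commit -q -m "previous release"
git tag v1

# "New release": upstream adds a line at the top, tagged v2.
{ echo 0; seq 1 10; } > file.txt
git commit -q -am "new release"
git tag v2

# The port's old patch, written against v1: it rewrites the line "9".
git checkout -q v1
sed 's/^9$/nine/' file.txt > file.tmp && mv file.tmp file.txt
git diff > "$work/old.patch"
git checkout -q -- file.txt

# 1. Apply the old patch on the previous release and commit it.
git checkout -q -b old-patches v1
git apply "$work/old.patch"
git commit -q -am "apply old patches"

# 2. Cherry-pick that commit onto the new release.
git checkout -q -b new-patches v2
git cherry-pick old-patches >/dev/null

# 3. Export the result as the new patch, relative to the new release tag.
git diff --ignore-submodules=all v2 HEAD > "$work/new.patch"
cat "$work/new.patch"
```

In this toy case the cherry-pick is clean; in the real repository step 2 is where merge conflicts would have to be resolved by hand.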
If necessary, debug problems by running ./vcpkg install arrow locally. You might need to run this on a specific architecture or specify a triplet to reproduce a failure. For example, on a Windows computer, run ./vcpkg install arrow:x64-windows-static to generate static x64 libraries. Building locally gives you access to log files that it is not possible to see from the CI checks.
Run tests locally to check for problems that the vcpkg CI would not catch. For example, the CI checks do not test the non-default features of the arrow port. At the time of this writing, the non-default features are flight, dataset, jemalloc, mimalloc, orc, and s3. Try to test these features locally, to the best of your ability. For example, on a Windows computer, run ./vcpkg install arrow[flight,dataset,mimalloc,orc,s3]:x64-windows --recurse to install Arrow as a dynamic x64 library with all the non-default features enabled except for jemalloc, which Arrow cannot use on Windows.
When the CI is all green and your local tests are all passing, mark the PR as ready to review. The vcpkg maintainers will review it, ask questions, and approve and merge it if there are no unresolved problems.
- Write a comment in the PR tagging members of the Arrow developer community who use the vcpkg arrow port. Currently these include:
- Tanguy Fautré (GPSnoopy) who helps to maintain ParquetSharp
- Jonathan Giannuzzi (jgiannuzzi) who helps to maintain ParquetSharp
- If you would like to be added to this list of people who are tagged in arrow vcpkg port update PRs, please email dev@arrow.apache.org to request it. Provide your name and your GitHub handle.
If you intend to make any changes to the arrow port recipe beyond a simple version upgrade, review the vcpkg maintainer guide at https://github.com/microsoft/vcpkg/blob/master/docs/maintainers/maintainer-guide.md.
Updating MSYS2 package
You need to fork https://github.com/msys2/MINGW-packages and clone your fork on your machine before running the post-12-msys2.sh script.
Running the post-12-msys2.sh script will prepare a branch to update the arrow MINGW package. You can create a pull request for updating the arrow MINGW package from the prepared branch.
# dev/release/post-12-msys2.sh 7.0.0 ~/MINGW-packages
dev/release/post-12-msys2.sh <version> <working-copy-of-your-fork-of-msys2/MINGW-packages>
The script pushes the arrow-<version> branch to your fork. You can create a pull request at https://github.com/<your-github-account>/MINGW-packages/pull/new/arrow-<version>. See the message from this script for details.
Bumping versions
This task is only for major releases.
Running the post-11-bump-versions.sh script will bump versions on master to <next-version>-SNAPSHOT and add the apache-arrow-<next-version>.dev tag.
# dev/release/post-11-bump-versions.sh 6.0.0 7.0.0
dev/release/post-11-bump-versions.sh <version> <next-version>
The script assumes the remote name is apache.
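As a sanity check on the two arguments, the names that result from a major release can be derived as in the sketch below. The helper name is hypothetical, and the script itself remains the authority on what gets written and tagged.

```shell
#!/bin/sh
# Hypothetical helper: next major version, e.g. 6.0.0 -> 7.0.0.
next_major_version() {
  major=${1%%.*}
  echo "$((major + 1)).0.0"
}

version="6.0.0"
next_version=$(next_major_version "$version")
echo "snapshot version: ${next_version}-SNAPSHOT"
echo "dev tag: apache-arrow-${next_version}.dev"
```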
Updating tags for Go modules
Running the post-10-go.sh script will add the needed tags for releasing the Go modules. Since the Go modules are not at the root, there need to be tags of the format go/arrow/v<version> and go/parquet/v<version>.
# dev/release/post-10-go.sh 6.0.0
dev/release/post-10-go.sh <version>
This will add the needed tags, pointing them at the apache-arrow-<version> tag. The script assumes the remote name is apache.
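For orientation, the tagging described above corresponds roughly to the git commands printed by the sketch below. This is a guess at the effect of the script based on the description, not a copy of it; `go_tag_commands` is a hypothetical helper and nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: print the tag commands for the Go modules of a release.
go_tag_commands() {
  version=$1
  for module in arrow parquet; do
    # Each module tag points at the release tag apache-arrow-<version>.
    echo "git tag go/$module/v$version apache-arrow-$version"
  done
  # The remote is assumed to be named apache, as the script expects.
  echo "git push apache go/arrow/v$version go/parquet/v$version"
}

go_tag_commands 6.0.0
```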
Removing old artifacts
Source artifacts for the RC are no longer needed once all release tasks for the version are finished, and old releases should be archived. We can remove the RC source artifacts and archive old releases with the following command line:
dev/release/post-07-remove-old-artifacts.sh