How to update dependency versions?
- We expect that the majority of upgrades will be done via merging of automated Dependabot PRs.
- When reviewing a dependency update PR, note that increasing an upper bound may not be sufficient to exercise the new version in integration test suites. Committers can temporarily add a commit to the PR that raises the lower bound for the dependency, run the integration tests, and, once they pass, remove the lower bound constraint. Sample PR that followed this process: https://github.com/apache/beam/pull/25786/commits. For an example of a PR review that followed this process and identified a breaking change, see https://github.com/apache/beam/pull/17615.
- Prefer to make risky dependency updates early in the release cycle, for example after a release cut, to have more time to test and identify issues before the next release.
- When a newer version of a dependency is installed at job submission and an older version of the same dependency is installed in Beam container images, a misconfiguration can occur when the Beam job graph captures some of the dependency's functions in serialized DoFn code: https://github.com/apache/beam/issues/33639. To reflect the update in Beam containers, you may need to update the dependency versions used in Beam container images. A workflow updates these versions once per release cycle, shortly after a release is cut. If you merge a dependency upgrade PR after that workflow has run, you may need to run a gradle command or manually update the versions in the requirements files. Such changes should also be made early in the release cycle.
When to update the versions?
- Depending on old versions of our dependencies is an inconvenience to users and can be a ticking time bomb (https://s.apache.org/beam-python-dependencies-pm).
- Be proactive: update early and escalate any issues downstream as early as possible. Most Dependabot PRs should be merged within a week. Complex upgrades, such as supporting a new major version of a commonly used library (for example, protobuf), may need to be completed across the ecosystem of packages.
- For upgrades that require a significant amount of work, Beam maintainers should plan to complete the upgrade within a year after the next (major) version has been first released. The sooner, the better.
How to add a new dependency?
- Set the lower bound to some version you tested. Often, it's the latest available version for the package.
- For libraries that claim to follow semantic versioning, cap the upper bound at the next major version. For example: "some_package>=1.4.0, some_package<2"
- Depending on an exact version or a very narrow range is warranted only in exceptional cases, for example: pickling libraries ()
- Using a less-or-equal sign in an upper bound is wrong (e.g. "some_package<=2.0.0"): it caps the upper bound at one specific version (2.0.0), so a later patch release (for example 2.0.1) would not be picked up by the constraint.
- For stable dependencies (only bugfix releases, stable api surface), open version bounds are acceptable. Example: 'pytz>=2018.3'.
- For other dependencies, use upper bounds at the next minor version or decide case by case. Example: 'numpy>=1.14.3,<1.25.0', https://github.com/apache/beam/blob/818c2b44e998529f3e5727a5d30b75922e0d113d/sdks/python/setup.py#L248-L250
- When a rationale behind a requirement spec is not obvious, explain in a comment.
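The bounds guidance above can be sketched in a few lines of Python. This is only an illustration: `cap_at_next_major` and `as_tuple` are hypothetical helpers, not part of Beam or pip, and the tuple comparison is a simplification that handles plain numeric versions only (real tools follow the full PEP 440 ordering).

```python
def cap_at_next_major(package: str, tested_version: str) -> str:
    """Build a spec with the tested version as the lower bound and the
    next major version as an exclusive upper bound."""
    major = int(tested_version.split(".")[0])
    return f"{package}>={tested_version},<{major + 1}"


def as_tuple(version: str) -> tuple:
    # Simplified: works for plain numeric versions such as "2.0.1" only.
    return tuple(int(part) for part in version.split("."))


print(cap_at_next_major("some_package", "1.4.0"))

# Why "<=2.0.0" is a poor upper bound: patch release 2.0.1 is excluded,
# while an exclusive bound such as "<2.1.0" still admits it.
print(as_tuple("2.0.1") <= as_tuple("2.0.0"))
print(as_tuple("2.0.1") < as_tuple("2.1.0"))
```

An exclusive bound stated at a version boundary leaves room for bugfix releases below that boundary, which is the property the less-or-equal form loses.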
Should transitive dependencies be included?
- Don't manage more than necessary. Do not add constraints on transitive dependencies that are not direct dependencies.
- If Beam directly uses a transitive dependency, Beam should also directly depend on it (include it in constraints).
- If a transitive dependency causes issues, add it to our requirements with an appropriate upper bound and comment when such a requirement can be removed.
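As a rough illustration of the last point, a requirement list might carry a temporary cap on a transitive dependency together with a comment stating when it can go away. The package names, versions, and the `REQUIRED_PACKAGES` variable below are all hypothetical and are not taken from Beam's setup.py.

```python
# Hypothetical requirement list; names and versions are illustrative only.
REQUIRED_PACKAGES = [
    # Direct dependency: imported by our own code, so declared explicitly.
    "some_package>=1.4.0,<2",
    # Transitive dependency pulled in by some_package. Capped only because
    # its 3.x line breaks some_package; remove this entry once some_package
    # supports 3.x.
    "some_transitive_package<3",
]
```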
What to do when installing Beam causes backtracking?
- Installing Apache Beam should not require backtracking during dependency resolution: after pip evaluates the set of constraints, each package should be downloaded once.
- This is generally possible when using the latest allowed version of each dependency leads to a compatible configuration.
- When backtracking happens, it can be prevented by adding a constraint that caps the allowed version of a dependency to the last compatible version.
How to find which dependencies are outdated?
- Update Beam's base image requirements (recommended).
- Install Beam dependencies into a clean environment: `pip install -r sdks/python/container/py310/base_image_requirements.txt`
- Check for outdated dependencies: `pip list --outdated`.
- Ideally, each outdated dependency from our direct dependency list should either have a Dependabot PR in flight or an issue tracking the upgrade.
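To narrow the `pip list --outdated` output down to direct dependencies, one option is to request JSON (`pip list --outdated --format=json`) and filter it. The sketch below assumes that output has been captured as a string; `outdated_direct_deps`, the sample data, and the set of direct dependency names are hypothetical and only show the shape of the approach.

```python
import json
import re


def normalize(name: str) -> str:
    # PEP 503 name normalization, so "Typing_Extensions" matches
    # "typing-extensions".
    return re.sub(r"[-_.]+", "-", name).lower()


def outdated_direct_deps(pip_json: str, direct_deps: set) -> list:
    """Return (name, installed, latest) for outdated direct dependencies."""
    wanted = {normalize(dep) for dep in direct_deps}
    return [
        (pkg["name"], pkg["version"], pkg["latest_version"])
        for pkg in json.loads(pip_json)
        if normalize(pkg["name"]) in wanted
    ]


# Made-up sample shaped like `pip list --outdated --format=json` output.
SAMPLE = """[
  {"name": "numpy", "version": "1.24.4", "latest_version": "1.26.4",
   "latest_filetype": "wheel"},
  {"name": "some_transitive_pkg", "version": "0.9", "latest_version": "1.0",
   "latest_filetype": "wheel"}
]"""

for name, installed, latest in outdated_direct_deps(SAMPLE, {"NumPy", "pytz"}):
    print(f"{name}: {installed} -> {latest}")
```

Each entry that survives the filter is a candidate for a Dependabot PR or a tracking issue.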