This page is work in progress!
Flink Project Management consists of the following tasks:
- Managing the feature development for the release
- Maintenance of CI builds and infrastructure (for master and the two most recently released Flink versions)
- Jira maintenance
These tasks are not necessarily done exclusively by the release manager; the community as a whole should take care of them. The responsibility of the release manager is to make sure that a certain quality and stability of the relevant code base is achieved during the release.
Additionally, any release-related documentation should be kept up-to-date (e.g. Release Management and Feature Plan or Creating a Flink Release).
Organization of the Release Cycle
Preparing a Release Cycle
- Setting up a release page (see 1.17 Release as a template)
- Announcing the plan for the release cycle (e.g. feature freeze date)
Regular Sync
It's a good habit to meet on a regular basis to sync on the developments of the current release cycle (so far, bi-weekly before the feature freeze and weekly from the feature freeze until the release actually happened). A summary should be kept in the release's wiki article (see Release Management and Feature Plan) and sent to the dev mailing list to keep the community up-to-date.
Inviting the release managers of the previous release to one of the early release syncs to discuss learnings from previous efforts has also worked well in the past.
Feature Freeze
The feature freeze is a set date until which features can be added to master. After the feature freeze, no additional features may be merged into master; only bugfixes and documentation changes are allowed. The goal is to stabilize master before cutting the release branch. The feature freeze date is communicated at the beginning of a release cycle. It's not uncommon for the date to change during the release cycle if there are valid reasons to do so. Such a decision needs to be discussed on the dev mailing list (see 1.17 feature freeze extension discussion).
The time between announcing the feature freeze and cutting the release branch should be as short as possible since it's blocking work that should go into future releases.
Release Testing
Release testing happens after the release branch is cut and CI is stable enough. The goal is to manually test the features that ended up in this build. Additionally, the documentation for these features should be available. Any blocking issues that come up during release testing need to be addressed before going forward with the release.
Release Metrics
- Count contributors: the following git command lists the contributors for a given commit range of the current branch:
git shortlog --summary startCommitId..endCommitId | awk -F ' ' '{$1=""; print $0 }'|sort -n|awk 'BEGIN{ORS=", "}{print $0}'
- Count resolved issues: the following Jira filter can be used to count the issues resolved in this version:
project = flink AND status in (closed, resolved, Fixed, Completed, Done) AND fixVersion in (1.17.0)
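The two metrics above can also be gathered from a script. The following is a minimal Python sketch and not part of any existing release tooling: `parse_shortlog` and `resolved_issues_jql` are hypothetical helper names, and the sample names are made up. It assumes the output of `git shortlog --summary` has already been captured as text, with one line per author consisting of a commit count, whitespace, and the author name.

```python
import re


def parse_shortlog(output: str) -> list[str]:
    """Return the unique contributor names found in `git shortlog --summary` output."""
    names = set()
    for line in output.splitlines():
        # Each line is expected to look like "    42\tJane Doe":
        # a commit count, whitespace, then the author name.
        match = re.match(r"\s*\d+\s+(.+)", line)
        if match:
            names.add(match.group(1).strip())
    # Sort case-insensitively so the list reads well in release notes.
    return sorted(names, key=str.casefold)


def resolved_issues_jql(fix_version: str) -> str:
    """Build the Jira filter shown on this page for the given fixVersion."""
    return (
        "project = flink AND status in (closed, resolved, Fixed, Completed, Done) "
        f"AND fixVersion in ({fix_version})"
    )


if __name__ == "__main__":
    # Made-up sample input; real input would come from the git command above.
    sample = "    12\tJane Doe\n     3\tAlex Example\n"
    contributors = parse_shortlog(sample)
    print(len(contributors), "contributors:", ", ".join(contributors))
    print(resolved_issues_jql("1.17.0"))
```

Parsing the captured output keeps the sketch independent of how git is invoked, so it can be tried on a saved file before being wired into anything else.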
Maintenance of CI builds and infrastructure
The release manager should ensure the stability of master and the two most recently published Flink versions in terms of CI. Builds can be monitored on AzureCI Flink build overview (see Testing Infrastructure for further details on the build process).
Monitoring CI failures in master and the release branches
Failed builds are reported to Apache Flink's #builds Slack channel. Build failures should be investigated and documented in this Slack channel (i.e. linking the corresponding ticket in the Slack thread and marking the thread with a "check" emoji when the investigation is done for this build). The documentation in the Slack channel allows us to work concurrently on CI failures (i.e. a missing check mark for a build failure means that the build has not been fully investigated yet and should be picked up).
There is an issue with clicking Azure Pipelines links that are reported in the Slack channel. You need to install a redirect routine in your browser to make this work. The instructions can be found in the #builds channel's canvas.
Other tasks:
- Monitoring the remote branches. Sometimes, remote branches are created accidentally in the Apache Flink repo. Branches should generally be created in forks. We might want to reach out to contributors to delete these accidentally created remote branches. The following branches shouldn't be touched:
- master & release-*: Flink versioning branches
- blink: Branch holding the legacy blink code. This one is kept for historical purposes.
- experiment_gha_docs & exp_github_actions: These branches are kept as part of the GitHub Actions migration efforts (see Chesnay Schepler's comment in the related ML post).
- dependabot/*: These branches are temporarily created by dependabot for version bumps (related ML announcement).
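The branch check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `unexpected_branches` is a hypothetical helper, the sample branch names are made up, and the input is assumed to be a list of branch names with any remote prefix such as `origin/` already stripped (for example, taken from `git branch -r`). The protected patterns are the ones listed on this page.

```python
from fnmatch import fnmatchcase

# Branch names and patterns that this page lists as "shouldn't be touched".
PROTECTED_PATTERNS = (
    "master",
    "release-*",
    "blink",
    "experiment_gha_docs",
    "exp_github_actions",
    "dependabot/*",
)


def unexpected_branches(branches):
    """Return the branch names that match none of the protected patterns."""
    return [
        name
        for name in branches
        # fnmatchcase's "*" also matches "/", so "dependabot/*" covers nested names.
        if not any(fnmatchcase(name, pattern) for pattern in PROTECTED_PATTERNS)
    ]


if __name__ == "__main__":
    # Made-up sample; real names would come from the remote repository.
    sample = ["master", "release-1.18", "blink", "dependabot/maven/some-bump", "my-feature"]
    print(unexpected_branches(sample))
```

The result is only a list of candidates to review; whether a branch was really created by accident still needs to be confirmed with its author.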
Relevant Repositories, Workflows and other artifacts
- apache/flink
- Main repository for Apache Flink
- Azure Pipelines (YAML configuration: apache/flink:tools/azure-pipelines/build-apache-repo.yml)
- Nightly builds
- PR builds
- Azure Pipelines can be used for forks as well (but require a special setup; see wiki). The YAML configuration for this CI is apache/flink:azure-pipelines. This approach is deprecated because the free Azure Pipelines offer for open-source projects has become harder to get. Use the GitHub Actions workflow instead.
- GitHub Actions (YAML configuration under apache/flink:.github)
- Currently in beta (FLIP-396)
- Support added for 1.18+ (i.e. Azure Pipelines can only be deprecated once 1.17 is no longer supported)
- FLINK-33901 collects the subtasks to finalize the GHA migration.
- Nightly builds (see FLINK-33901 for missing pieces like nightly artifact upload and Slack integration)
- Pushes to master or the release branches (the same workflow also runs on forks for pushes)
- Nightly docs builds (could be merged into nightly builds; FLINK-34045)
- apache/flink-shaded
- Used for shaded dependencies of Apache Flink
- Separate release cycle, i.e.
- Do version bumps in apache/flink-shaded
- Release apache/flink-shaded
- Upgrade flink-shaded dependencies in apache/flink
- Especially for the netty dependencies, it is advised to do version upgrades early in a Flink release cycle to allow for more CI runs to verify the change
- apache/flink-docker
- Flink Docker images that are pushed to the Apache Docker Hub registry for each release
- Nightly versions for master and the release-* branches are pushed to GitHub Container Registry and used in Flink's nightly builds
- apache/flink-web
- Repository for flink.apache.org
- asf-site is the repo's main branch (deployment happens automatically through Apache Infra)
- Website needs to be manually built with every change (should be done through GitHub Actions similarly to what is done for the Flink docs in apache/flink:.github/workflows/docs.yml)
- apache/flink-connector-shared-utils
- Common code for all Flink connectors
- Organized in branches (see apache/flink-connector-shared-utils:README.md for further details on the structure)
- apache/flink-connector*, e.g. apache/flink-connector-kafka, apache/flink-connector-jdbc, ...
- flink-ci GitHub organization:
- Ververica-owned organization with CI-related code
- See Continuous Integration#Repositories for further details on the different repos
- flink-ci/flink-ci-docker (The process below is not ideal and should change to use some Apache-owned repo; FLINK-34695)
- Docker container that is used in different CI workflows (Azure Pipelines and GitHub Actions)
- Repository is not really used right now (due to ownership issues, slow response)
- Instead, changes were pushed to zentol/flink-ci-docker
- The Docker containers were pushed to "private" Docker registries (e.g. rmetzger/flink-ci, chesnay/flink-ci, mapohl/flink-ci) in the past.
Performance Regression Tests
Performance regression tests are used to monitor that there are no changes that reduce the performance of Flink. There is more documentation on this topic in Codespeed / Benchmarks. Regressions are reported in Apache Flink's #flink-dev-benchmarks Slack channel.
Jira maintenance
- Build failures should be reported in the corresponding Jira issue (or a Jira issue should be created if none exists yet). Contributors should be pinged to fix instabilities as soon as possible to ensure a stable infrastructure over the course of the release cycle. More details on Jira issues can be found on the Flink Jira Process wiki page.
- Newly created Jira issues should follow the Flink Jira Process guide (e.g. set fixVersion, affectedVersion and component, and add the label test-stability)
- Important information to improve Jira issue search:
- Name of the test that failed
- Link to the test failure (ideally with the relevant log line; both Azure Pipelines and GitHub Actions support log line-specific links)
- Log snippet that identified the test failure (e.g. assertion error or stacktrace)
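To make the three searchable items above concrete, here is a small Python sketch that assembles them into a summary and description text. It is an assumption-laden illustration, not an established Flink convention: `FailureReport`, its field names, and the sample values are hypothetical, and the log snippet is wrapped in Jira's `{code}` text-formatting macro.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    """Hypothetical container for the details that make a failure easy to find."""

    test_name: str    # name of the test that failed
    build_url: str    # link to the failure, ideally to the relevant log line
    log_snippet: str  # assertion error or stack trace that identifies the failure

    def summary(self) -> str:
        # Putting the test name in the summary helps text search find the issue.
        return f"{self.test_name} failed"

    def description(self) -> str:
        # {code} is Jira's text-formatting macro for preformatted blocks.
        return "\n".join(
            [f"Build: {self.build_url}", "{code}", self.log_snippet.strip(), "{code}"]
        )


if __name__ == "__main__":
    # Made-up values for illustration only.
    report = FailureReport(
        test_name="SomeITCase.testSomething",
        build_url="https://example.invalid/build/1",
        log_snippet="java.lang.AssertionError: expected 1\n",
    )
    print(report.summary())
    print(report.description())
```

Fields such as fixVersion, affectedVersion, component and the test-stability label still have to be set on the issue itself.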
Hints around AzureCI/Jira usage
- Console log output can be linked per log line:
- GitHub Actions: Click the line number on the left side of the console view to generate the line-specific URL.
- Azure Pipelines: The link button will appear at the end (i.e. the right side) of the log line when hovering over the log line.
- There are several URLs with a placeholder (i.e. %s) that might be handy when accessing Jira through your browser using Firefox's bookmark keywords or Chrome's search engine feature:
- Jira issue lookup by ID (e.g. "<keyword> 123" would lead to FLINK-123):
https://issues.apache.org/jira/browse/FLINK-%s
- Search for open or closed Jira issues with a substring (this is handy to find test stability issues):
https://issues.apache.org/jira/browse/FLINK-0?filter=-1&jql=project%20%3D%20FLINK%20AND%20text%20~%20%22%s%22%20ORDER%20BY%20id%20DESC
- Same as above, but only for open issues:
https://issues.apache.org/jira/browse/FLINK-0?filter=-1&jql=project%20%3D%20FLINK%20AND%20text%20~%20%22%s%22%20AND%20status%20NOT%20IN%20(Closed%2C%20Resolved)
- Look for the most recently updated test-stability Jira issues (the number entered is how many days back from today to search):
https://issues.apache.org/jira/browse/FLINK-0?jql=project%20%3D%20FLINK%20AND%20issuetype%20%3D%20Bug%20AND%20labels%20%3D%20test-stability%20%20AND%20status%20NOT%20IN%20(Closed%2C%20Resolved)%20%20AND%20updatedDate%20%3E%20startOfDay(-%sd)%20ORDER%20BY%20updatedDate%20DESC
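How the %s placeholder in these URLs gets filled can be illustrated with a short Python sketch. It only approximates what a browser does with a keyword bookmark: `expand_keyword_url` is a hypothetical helper, and percent-encoding the search term with `urllib.parse.quote` is an assumption about the browser's behaviour, which may differ in detail. The two templates are copied from the list above.

```python
from urllib.parse import quote

# Templates copied from this page; %s is the placeholder the browser fills in.
ISSUE_BY_ID = "https://issues.apache.org/jira/browse/FLINK-%s"
OPEN_ISSUES_BY_TEXT = (
    "https://issues.apache.org/jira/browse/FLINK-0?filter=-1"
    "&jql=project%20%3D%20FLINK%20AND%20text%20~%20%22%s%22"
    "%20AND%20status%20NOT%20IN%20(Closed%2C%20Resolved)"
)


def expand_keyword_url(template: str, term: str) -> str:
    """Substitute the percent-encoded term for the %s placeholder."""
    # Plain string replacement is used instead of Python's % formatting,
    # because the templates already contain percent-encoded characters like %20.
    return template.replace("%s", quote(term, safe=""))


if __name__ == "__main__":
    print(expand_keyword_url(ISSUE_BY_ID, "123"))
    # Made-up search text for illustration.
    print(expand_keyword_url(OPEN_ISSUES_BY_TEXT, "SomeITCase testSomething"))
```

Printing the expanded URL like this is a quick way to check a template before saving it as a bookmark keyword or custom search engine.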