The Flink community wants to use Azure Pipelines for testing Flink pull requests and changes.
This page first lists everything we know about the current setup, then describes our plans moving forward.
Once we use Azure Pipelines as our primary testing system, this page will be revised.
Current Setup (December 2019)
Overview of Flink Tests
Testing workloads:
- "mvn clean verify" – all unit and integration tests, including Java-based end to end tests (3.5 hours)
- variations
- hadoop=2.8.3 scala=2.11
- hadoop=2.4.1
- scala=2.12
- hadoop=2.8.3 scala=2.11 jdk=11
- flink-end-to-end-tests/run-nightly-tests.sh – all bash-based end to end tests (1 hour)
- variations
- hadoop=2.8
- scala=2.12
- hadoop=none
- jdk=11
- flink-end-to-end-tests/run-pre-commit-tests.sh – some lightweight bash-based end to end tests (10m)
- flink-python/dev/lint-python.sh – python tests
- tools/travis/docs.sh – documentation links check
Current test-execution approach on Travis CI
- on each pull request or push to master/release-*
- (variation: hadoop=2.8.3 scala=2.11)
- 1. compile-stage
- 2. test-stage
- core
- python
- libraries
- blink_planner
- connectors
- kafka/gelly
- tests
- legacy_scheduler_core
- legacy_scheduler_tests
- misc
- daily cron: master
- "mvn clean verify" in all 4 variations, split into the same jobs as regular pull request builds
- "run-nightly-tests.sh" in all 4 variations, split into 7 different jobs
- weekly cron: release-*
→ same as daily cron.
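To get a feeling for the size of the current Travis setup, the job counts listed above can be tallied with a few lines of shell. This is only a rough sketch: the variable and function names are made up for illustration, and it assumes that the 7 nightly jobs are run once for each of the 4 variations, which the list above does not state explicitly.

```shell
#!/bin/sh
# Rough tally of Travis jobs per trigger, based on the lists above.
# All names below are illustrative, not taken from Flink's build scripts.

# Second-stage test jobs of a regular pull request build.
TEST_STAGE_JOBS="core python libraries blink_planner connectors kafka/gelly tests legacy_scheduler_core legacy_scheduler_tests misc"
VARIATIONS=4
# Assumption: the 7 nightly jobs are run once per variation.
NIGHTLY_JOBS_PER_VARIATION=7

count_words() {
  # Intentionally unquoted so the list is split on whitespace.
  set -- $1
  echo "$#"
}

pr_build_jobs() {
  # One compile job plus all test-stage jobs.
  echo $(( 1 + $(count_words "$TEST_STAGE_JOBS") ))
}

daily_cron_jobs() {
  # "mvn clean verify" jobs plus nightly end to end jobs, for every variation.
  echo $(( VARIATIONS * $(pr_build_jobs) + VARIATIONS * NIGHTLY_JOBS_PER_VARIATION ))
}

echo "jobs per pull request build: $(pr_build_jobs)"
echo "jobs per daily cron run:     $(daily_cron_jobs)"
```

Under these assumptions a pull request build consists of 11 jobs and a daily cron run of 72 jobs.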
Azure Pipelines
Flink Testing Infrastructure on Azure Pipelines
Notes on our Azure Pipelines setup:
- Flink's Azure Pipelines account is located here: https://dev.azure.com/rmetzger/Flink/_build
- For the Flink repository, the testing does not happen directly on the machine, but in a Docker container, providing the correct Maven version (3.2.5) and the required dependencies for the openssl tests. This allows us to maintain a consistent build environment across different build machines.
- The docker containers are hosted here: https://hub.docker.com/r/rmetzger/flink-ci
- Source code of the containers: https://github.com/rmetzger/flink-ci (builds are automated with GitHub Actions)
- Azure Pipeline builds are controlled via a configuration file called azure-pipelines.yml, located in the Flink source code.
- Secrets are defined in the AZP web UI.
- Cron jobs are defined in the azure-pipelines.yml file.
Available Custom Build Machines
#machines | Name | Specs | Notes | Provider |
---|---|---|---|---|
4 | CommunityCITest0** | 32 cores, 64gb mem, x86-64, 100mbit conn | Each machine runs two agents, for better resource utilization. More machines available if needed. | Alibaba Cloud |
2 | flink-arm-agent-**** | 16 cores, 32gb mem, aarch64 | Running an outdated build agent, because aarch64 support is not tested yet. | https://openlabtesting.org/ |
Please reach out to the dev@ mailing list if you want to offer the Flink community more build capacity.
When to use which machines:
| | Flink Repository | Flink Pull requests | 3rd party repositories (Contributors) |
---|---|---|---|
Build Environment | Custom Machines | Custom Machines | Azure Pipelines free tier for open source |
Integration with GitHub / Flink's repositories
Apache does not allow the use of the GitHub integration of Azure Pipelines, because it needs write access to the source code (INFRA-17030). To work around this limitation, we will do the following:
- all branches of "apache/flink" are mirrored to the "flink-ci/flink" repository using a custom script, running every few minutes. Azure Pipelines "listens" on pushes to "flink-ci/flink".
- ci-bot mirrors all pull requests against "apache/flink" into a branch, where AZP is picking up the builds.
- ci-bot is picking up the build status from GitHub for each pull request branch, reporting it back to the user via "flinkbot's" build comment.
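The branch mirroring in the first step can be sketched with plain git. This is not the custom script mentioned above, only a minimal illustration of the idea: the function name mirror_branches, the working directory and the mirror URL in the usage comment are assumptions.

```shell
#!/bin/sh
# Sketch: copy all branches of a source repository into a CI mirror.
# Hypothetical helper, not the script used for flink-ci/flink.

# Usage: mirror_branches SOURCE_URL MIRROR_URL WORK_DIR
mirror_branches() {
  source_url="$1"
  mirror_url="$2"
  work_dir="$3"

  # Create a bare working clone on the first run only.
  if [ ! -d "$work_dir" ]; then
    git clone --quiet --bare "$source_url" "$work_dir" || return 1
  fi

  # Make every local branch match the source; drop branches deleted there.
  git -C "$work_dir" fetch --quiet --prune "$source_url" \
    '+refs/heads/*:refs/heads/*' || return 1

  # Force-push all branches to the mirror and delete the ones
  # that no longer exist in the source.
  git -C "$work_dir" push --quiet --prune "$mirror_url" \
    '+refs/heads/*:refs/heads/*'
}

# Example (URLs and path are assumptions), e.g. from a cron entry
# that fires every few minutes:
# mirror_branches https://github.com/apache/flink.git \
#   git@github.com:flink-ci/flink.git /var/tmp/flink-mirror-work
```

Using a bare clone with an explicit refs/heads/* refspec keeps the sketch limited to branches, which is all that the first step requires; pull requests are handled separately by ci-bot.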
AZP Setup: Version 1
The goal of version 1 is to migrate the existing setup from Travis to Azure Pipelines, to give the community more build capacity and stability.
Current work in progress code: https://github.com/rmetzger/flink/tree/azure_playground
Current work in progress builds: https://dev.azure.com/rmetzger/Flink/_build?definitionId=1&_a=summary
AZP Setup: Version 2
We should revisit the V1 build setup, given the better availability of build resources, and fewer constraints for free builds on AZP.
Idea 1: Trigger nightly e2e, jdk11, scala212, ... tests through "flinkbot", reporting the status back in the PR
Benefit: For pull requests with the potential of breaking e2e tests etc., these tests can be executed on demand.
Idea 2: Consider running all tests in one job (instead of splitting compile + 10 2nd stage jobs).
This is possible again, because free builds on AZP give us a timeout of 6 hours per job. Tests would currently finish after 3.5 hours.
We could also consider running different configurations (jdk11, scala212, ...) in one job each, as well as the e2e tests.
Other big data processing systems also seem to be willing to accept longer build times. A Kafka build takes 2 hours, a Spark build 4 hours 40 minutes.
The benefit here is that it would drastically reduce the complexity of our build system.
Idea 3: Parallelize our build automatically
The current approach of manually splitting the builds leads to a fairly complicated and hard-to-maintain set of scripts. If we want to have fast builds, and low complexity, we need to look into ways of automatically splitting our builds.
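One possible way to split builds automatically is to distribute modules over jobs by their measured test duration. The sketch below uses a simple greedy heuristic (longest module first, always into the currently least loaded job). It is only an illustration of the idea, not a proposal for the actual implementation: the function name split_by_duration and the durations in the usage comment are made up.

```shell
#!/bin/sh
# Sketch: greedy "longest first" split of modules into N jobs.
# Reads lines of the form "<module> <minutes>" from stdin.
# Hypothetical helper; durations would have to come from real measurements.

# Usage: split_by_duration NUMBER_OF_JOBS < durations.txt
split_by_duration() {
  # Longest modules first; ties are broken by module name.
  sort -k2,2nr -k1,1 | awk -v n="$1" '
    BEGIN {
      for (i = 0; i < n; i++) { load[i] = 0; members[i] = "" }
    }
    {
      # Pick the job with the smallest total duration so far.
      best = 0
      for (i = 1; i < n; i++) if (load[i] < load[best]) best = i
      load[best] += $2
      members[best] = (members[best] == "" ? $1 : members[best] " " $1)
    }
    END {
      for (i = 0; i < n; i++)
        printf "job%d %dmin: %s\n", i, load[i], members[i]
    }'
}

# Example with invented durations:
# printf "%s\n" "core 40" "tests 35" "connectors 30" \
#   "libraries 20" "python 10" "misc 5" | split_by_duration 2
# prints:
# job0 70min: core libraries python
# job1 70min: tests connectors misc
```

The heuristic does not guarantee an optimal split, but it needs no manual maintenance once per-module durations are collected.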
AZP Setup: General Considerations
Things we should consider moving forward:
- Setting up some metrics for Flink / Flink's build system
- machine utilization
- test durations
- Setting up a central log collection service, where we dump all files, indexed by a search engine. This is good for finding logs for very rare errors, or statistics on certain error frequencies.
- Check out related notes: WIP: Notes on Build Time (in particular about moving away from Maven to Gradle)
- Running the end to end tests on Windows
(deprecated) Tutorial: Setting up Azure Pipelines for a fork of the Flink repository
Deprecation Note
You can utilize Flink's GitHub Actions workflows on your fork to run CI. No need to use the flink-mirror repo anymore.
This tutorial assumes that you want to authenticate with Microsoft through your GitHub account. You can also use your existing Microsoft account to open an Azure Pipelines account.
- Go to https://azure.microsoft.com/en-us/services/devops/pipelines/, click "Start Pipelines Free with GitHub", authenticate with your GitHub Account
- Create Microsoft Account / Connect with existing one. Go through the legal stuff
- For the new organization, go to Settings / Policies and switch on the "Allow public projects" option
- In Azure Pipelines create a new organization and add a public project:
- On the left hand side, click "Pipelines", then "New pipeline":
- Select "GitHub" as the code location & authorize it
- Select your fork of the "flink" repository & authorize it
- The setup tool should detect an existing azure-pipelines.yml file (assuming that file exists on your master branch - if not, update your master branch), press "Run" in the top right corner to run it for the first time
Running end to end tests
Our AZP setup supports running the end to end tests on the CI system, through a build time variable (MODE=e2e).
To set this variable, you need to trigger a build yourself.
- Click the "Run pipeline" button on the top right
- Select your branch / commit, and click on "Variables":
- Then click "Add Variable" at the bottom, fill the name with "MODE", and the value with "e2e". Click "Create" to set the variable, then go back to the "Run pipeline" screen and trigger the build.
- You should now see a build where only the "e2e_ci_build" is running
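The effect of the MODE variable can be pictured as a simple switch. The sketch below is not how azure-pipelines.yml implements it; it only illustrates the behavior described above. The job name e2e_ci_build is the one shown in the build, while select_jobs and ci_build are hypothetical names.

```shell
#!/bin/sh
# Sketch: choose which job runs, depending on the MODE build variable.
# select_jobs and ci_build are illustrative names only.

# Usage: select_jobs "$MODE"
select_jobs() {
  case "${1:-}" in
    # MODE=e2e: only the end to end job runs.
    e2e) echo "e2e_ci_build" ;;
    # MODE unset or anything else: the regular build runs.
    *)   echo "ci_build" ;;
  esac
}

# Example:
# MODE=e2e; select_jobs "$MODE"
```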
Azure Pipeline Usage Restrictions
Azure Pipelines has restricted usage of free plan accounts to a case-by-case basis, making the setup of Flink's CI on a personal fork difficult. While waiting for Azure to approve your personal, free account, there's a workaround one can use without creating noise on the main apache/flink GitHub repository. Please use this approach responsibly (don't submit too many builds), and only while waiting for Azure to approve your account. Note that they won't reply to your email. Your personal CI will just start working.
To do so, you'll need to fork the flink-ci/flink-mirror repository to your personal GitHub account.
Then, you'll need to add the mirror as a git remote in your local machine's Flink project directory.
cd /path/to/your/flink
git remote add flink-mirror https://github.com/<your-user-name>/flink-mirror.git
To run your changes on Flink's Azure CI:
- push your branch to your flink-mirror fork
- open a draft PR in the flink-ci/flink-mirror GitHub repository
git push -u flink-mirror your-branch
When you're satisfied with your changes, you can:
- close the draft PR in the flink-ci/flink-mirror repository
- push your branch to your apache/flink fork
- open a new PR against the apache/flink GitHub repository
git push -u origin your-branch