Problem: Flink's build is too slow

We want to reduce the local and CI build times of Flink. This page looks at the options.

§1 Optimize current setup

We currently use Maven + Travis CI + custom scripts. These proposals keep this setup but refine it.

Enable JVM reuse for IT cases in more modules (Solution 1)

Pros:

  • Speedups in the Blink planner (7 minutes saved)
  • Considered easy to implement

Cons:

  • Not all tests clean up properly after themselves

Custom differential build scripts (Solution 2)

Pros:

  • Only build & test affected modules

Cons:

  • Needs a defensive/pessimistic design to catch all potential issues
  • Development and maintenance of "homegrown" scripts working around Maven limitations
  • Reinventing the wheel to compensate for the limitations of a bad build tool (Maven)
  • Complex, non-standard build system
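
To make the "affected modules" idea and the pessimistic design more concrete, here is a minimal sketch. It assumes that every module lives in a top-level directory of the same name and that the module dependency graph is available as a dictionary. The module names and the `affected_modules` helper are hypothetical and do not reflect Flink's real layout or any existing script.

```python
from collections import defaultdict, deque

# Hypothetical module graph: module -> modules it depends on.
# The names are illustrative only, not Flink's real module layout.
DEPENDS_ON = {
    "core": [],
    "runtime": ["core"],
    "streaming": ["runtime"],
    "connector-kafka": ["streaming"],
    "docs": [],
}


def affected_modules(changed_files, depends_on):
    """Return the set of modules to build and test for the changed file paths."""
    all_modules = set(depends_on)

    # Invert the graph: module -> modules that depend on it.
    dependents = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)

    changed = set()
    for path in changed_files:
        top_level = path.split("/", 1)[0]
        if top_level not in all_modules:
            # Pessimistic fallback: a file outside any known module
            # (root pom.xml, CI scripts, ...) triggers a full build.
            return all_modules
        changed.add(top_level)

    # Breadth-first walk over dependents to collect everything downstream.
    result = set(changed)
    queue = deque(changed)
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result
```

With this graph, a change under `runtime/` selects `runtime`, `streaming` and `connector-kafka`, while a change to the root `pom.xml` falls back to building everything. A real script would also have to derive the graph from the Maven POMs, which is where most of the maintenance cost listed above comes from.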

Only run smoke tests when PR is opened, run heavy tests on demand (Solution 3)

Pros:

  • Execute fewer tests, heavy tests on demand

Cons:

  • Custom implementation with "ci-bot" likely
  • Committers need to know which test runs to request / run

Move more tests into cron builds (Solution 4)

Pros:

  • Almost no custom implementation needed (cheap version of 'Solution 3')

Cons:

  • Poor developer experience: People expect to get fast feedback on their changes
  • Failures in cron builds potentially go unnoticed for quite some time (months)
  • Potential of lower long-term build quality

Work towards parallelizing the build better

Pros:

  • Moving to a build infrastructure with more CPU cores will allow us to run more build / test workloads concurrently

Cons:

  • Maven checkstyle plugin
  • Kafka tests (30 minutes of sequential execution)
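
The following sketch illustrates why a long sequential step such as the Kafka tests limits what additional cores can buy. It greedily assigns tasks to the least-loaded worker, longest task first. Only the 30 minutes for the Kafka tests come from this page; the other durations and the `schedule` helper are made-up assumptions for illustration.

```python
import heapq


def schedule(durations, workers):
    """Greedy longest-first assignment; returns the resulting wall-clock time."""
    loads = [0] * workers
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        # Always give the next-longest task to the least-loaded worker.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


# Minutes per task. 30 is the sequential Kafka test run mentioned above;
# the three 10-minute tasks are hypothetical placeholders.
TASKS = [30, 10, 10, 10]
```

Here `schedule(TASKS, 1)` gives 60 minutes and `schedule(TASKS, 2)` gives 30 minutes, but `schedule(TASKS, 8)` is still 30 minutes: the wall-clock time can never drop below the longest task that cannot be split.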

Use Gradle Enterprise Global Build Cache

Gradle Enterprise provides a Maven plugin for global build caches.

Pros:

  • Incremental build benefits on a per-module basis (presumably, not yet verified)
  • Low effort, because it works within the existing environment
  • Saves money on the Travis plan (faster builds)
  • Improves local and CI builds

Cons:

  • Relies on a proprietary product
  • Unclear whether it works for anonymous Flink contributors
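
As a rough illustration of how a build cache can give incremental benefits per module, the sketch below derives a cache key from a module's inputs and the keys of its upstream modules. This is a generic content-addressing scheme and an assumption on our part, not a description of how Gradle Enterprise computes its keys. The `cache_key` helper and the file names are hypothetical.

```python
import hashlib


def cache_key(source_files, dependency_keys):
    """Hash file names, file contents and upstream keys into one hex digest.

    source_files maps a file name to its content as bytes;
    dependency_keys is a list of cache keys of upstream modules.
    """
    digest = hashlib.sha256()
    for name in sorted(source_files):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(source_files[name])
        digest.update(b"\0")
    for key in sorted(dependency_keys):
        digest.update(key.encode())
    return digest.hexdigest()


# Hypothetical inputs: an upstream module in two versions and a downstream module.
core_v1 = cache_key({"A.java": b"class A {}"}, [])
core_v2 = cache_key({"A.java": b"class A { int x; }"}, [])
runtime_sources = {"B.java": b"class B {}"}
```

A module whose key is already in the cache can be fetched instead of rebuilt. Because the downstream key includes the upstream keys, `cache_key(runtime_sources, [core_v1])` differs from `cache_key(runtime_sources, [core_v2])`, so a change in an upstream module invalidates everything built on top of it.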

§2 Switch Build System

We currently use Maven + custom scripts.

Use Gradle (Solution 5)

Pros:

  • Supports incremental builds and tests
  • Supports a remote build cache to do an incremental build without having earlier increments (through "Gradle Enterprise")
  • All build tasks can be solved in code, instead of Maven + scripts

Cons:

  • MAJOR effort to change the entire build system
  • All Flink developers need to learn a new build system

Notes:

  • Apache Kafka is using Gradle
  • Apache Beam migrated from Maven to Gradle by keeping both build systems side by side during the transition
  • Gradle supports Kotlin (as an alternative to Groovy) for the build scripts, but Kotlin support is new and has potential limitations
  • Arvid Heise is willing to support a PoC
    • ~1 week for the PoC (some modules only, not all problems solved)
    • The PoC must cover CI as well
  • Problems to solve
    • Shading & layered shading
    • Inclusion of NOTICE files into the final build (producing valid Apache releases in general)
    • Support for mixed Scala / Java projects
    • Javadocs for mixed Scala / Java projects
    • Java 9+ support
    • API compatibility checks
    • Checkstyle
    • Ensuring dependency convergence
  • Unclear whether we can use the Gradle Enterprise build cache for free as an open source project, and how it works securely over the public internet

Use Bazel

Pros:

  • Supports incremental builds

Cons:

  • MAJOR effort to change the entire build system
  • Not widely adopted in Javaland

Notes:

  • A quick search for shading with Bazel didn't reveal promising results

§3 Switch Build Infrastructure

We currently use Travis CI.

Benefits of moving away from Travis:

  • The future of Travis is uncertain due to company ownership changes
  • Travis build caches are unreliable / used in a hacky way
  • Travis only provides a build environment with 2 CPUs and 7.5 GB of memory, where a build currently needs 3.5 hours. Other vendors provide bigger build instances, where the build can finish in ~1.3 hours
    • Travis provides bigger build environments in paid plans.

Move to another hosted CI service (Solution 6)

Pros:

  • Low maintenance overhead of a hosted service
  • Similar experience to the current setup

Cons:

  • Hosted CI services often have resource-limited build environments

Free for open source options:

  • Azure Pipelines (recommended by community)
  • GitHub CI
    • Closed Beta
    • Seems to be based on Azure Pipelines
  • Circle CI

Paid options:

  • Google Cloud Build
    • 32-core builders, at a high price (almost 4x the price of the underlying compute instances)

Move to a self-hosted CI service

Pros:

  • Lower costs compared to a hosted CI service

Cons:

  • How do we support building private branches from outside contributors?
  • Somebody needs to maintain the infrastructure to provide a similar experience

Options for software:

Options for machines:

  • Cloud providers
    • Google: $1500/month for 2x 32-core machines
  • Dedicated Servers

§4 Split Repository (TODO)

See separate page (WIP)
