


Status

State: Accepted
Discussion Thread: Multi-layered official image for Airflow
JIRA: AIRFLOW-3718
Created: 2019-01-16


Motivation

The current official Airflow image is rebuilt from scratch every time a new commit is made to the repo. It is a "mono-layered" image and uses neither Docker's multi-layer architecture nor multi-stage builds.

A mono-layered image means that a build after even a small change takes as long as a full build, instead of utilising caching and rebuilding only what is needed.

With a multi-layered approach and caching enabled in Docker Hub, we can optimise the image so that only the layers that changed are downloaded. This lets users of the images download only incremental changes, and opens up a number of ways in which such an incremental build/download process can be utilised:

  • Multi-layered images can be used as a base for AIP-7 Simplified development workflow, where locally downloaded images are used during development and are quickly updated incrementally with newly added dependencies.
  • Multi-layered images that are part of the "airflow" project can be used to run Travis CI integration tests (simplifying the idea described in Optimizing Docker Image Workflow). With incremental builds, the DockerHub registry can be used as a source of base images (pulled before the build), so that the final image used for test execution is built locally in an incremental way.
  • While initially the images are not meant to be used in production, multi-staging, variable arguments and multiple layers can be used to produce a production-ready Airflow image that can pre-bake DAGs into the image, bringing Airflow closer to being Kubernetes-native. This has been discussed as a potential future improvement in AIP-12 Persist DAG into DB.
  • Ideally both Airflow and CI images should be maintained in a single place, a single "source of truth", to ease maintenance and development. Currently they are maintained in separate repositories and potentially have different dependencies and build processes. This also makes it difficult to add your own dependencies during development, as there is no regular, development-friendly process for updating the CI image with new dependencies.
  • Currently the CI builds reinstall PIP dependencies every time they are run. Since Airflow does not use pinned dependencies, transitive dependencies sometimes change and cause the CI builds to fail for no apparent reason. The idea is to make sure that the same set of dependencies is used for CI builds until setup.py changes and the dependencies are reinstalled.
  • At the same time we should be able to verify that a "clean" build still works: there should be a way to find out whether a clean install still produces good results.

What problem does it solve?

It solves the following problems with current Dockerfile and image:

  • the time needed to rebuild the image when anything changes is long: no matter what kind of change it is, the current mono-layered image always takes about 10 minutes to rebuild on a typical development machine
  • this long rebuild time is a blocker for a number of use cases (including building the image during CI, building the image for Kubernetes in the future, building the image for a local development environment, and building the image locally to reproduce the CI test environment)
  • the purpose of the current Dockerfile and Docker image is unclear. It is neither used for CI nor ready for production use. It is also of little use for regular pulls, because each such image is effectively rebuilt from scratch and downloaded in full
  • the current Dockerfile/image does not seem to be used by anyone: it was failing in DockerHub for 2 weeks (March 1 - March 14) due to a bug and nobody noticed. The proposed image is built incrementally in CI, so failures will be noticed.
  • stability of CI builds: currently, changes in transitive dependencies sometimes cause CI tests to fail. Unless setup.py changes, the tests in CI should run with a stable set of dependencies (frozen at the moment of the last build).
  • only when setup.py changes should the dependencies be reinstalled and transitive dependencies updated (and this should then happen in a branch rather than in master, so that problems are detected before other branches start to fail)

Why is this change needed?

  • Enabler for  AIP-7 Simplified development workflow
  • Improved and simplified CI process 
  • CI process should be able to be easily reproduced locally (including failing tests)
  • We can use the image during running system tests for external systems as described in AIP-4 Support for System Tests for external systems
  • TOX adds unnecessary complexity and should be removed
  • CI process is not always reproducible with transitive dependencies causing installation problems

Suggested implementation

General architecture


In the PR https://github.com/apache/airflow/pull/4543 the current mono-layered Docker image has been rewritten as a multi-layered one.

The PR uses the "hooks/build" hook, which the DockerHub build process calls, to control caching and the build process.

Thanks to that we can build different variants of the image: the Airflow image (slim) and the CI image (fat), which has more dependencies.

Life of an image


Assumptions

  • There are two images to be built:
    • "Airflow" image - slim image with only the necessary Airflow dependencies
    • "CI" image - fat image with the additional dependencies necessary for CI tests
  • there are separate images for each python version (currently 2.7, 3.5, 3.6)
  • each image uses python-x.y-slim as a base
  • all stages are defined in a single multi-stage Dockerfile
  • Standard Docker build: it's possible to build the main airflow image by issuing the "docker build ." command. It's not optimised for DockerHub cache reuse but it will build locally.
  • Scripted Docker build: we use the hooks/build script to build the image utilising the DockerHub cache - pulling the images from the registry and using them as cache. Such builds are mainly useful for local development.
  • binary/apt dependencies are built as separate stages, so that we can use whole cached images with main/CI dependencies as a cache source
  • the builds are versioned - airflow 2.0.0.dev0 images are different from airflow 2.0.1.dev0 images
  • we should be able to run the build without the Docker cache - reinstalling everything from scratch
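
The scripted build described above (pull an image from the registry, then build using it as cache) might be sketched as follows. This is only an illustration and not the contents of the hooks/build script: the function name and the image name `example/airflow:latest-3.6-ci` are hypothetical, and the sketch only assembles the commands rather than running them. `docker pull`, `docker build --cache-from` and `--tag` are standard Docker CLI commands and options.

```python
from typing import List


def cached_build_commands(image: str, context: str = ".") -> List[List[str]]:
    """Return the commands for a registry-cached build of `image`.

    The image is pulled first so that its layers are available locally,
    then passed to `docker build` as a cache source.
    """
    return [
        ["docker", "pull", image],
        ["docker", "build", "--cache-from", image, "--tag", image, context],
    ]


# Hypothetical image name, used for illustration only.
for command in cached_build_commands("example/airflow:latest-3.6-ci"):
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run(command, check=True)` to execute it.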

Terms

  • Multi-staging Dockerfile - a Dockerfile that utilises multi-stage builds, which allow intermediate images to be used to build the final image. Useful for code reuse between different variants of the same image
  • Base python image - this is the base image that the Airflow image starts from. We are using the Python-x.y-slim image as the base image.
  • Airflow apt dependencies - binary dependencies installed with the 'apt-get' package manager (Debian based systems). Minimal dependencies that are needed to run Airflow
  • CI Airflow apt dependencies - binary dependencies installed with the 'apt-get' package manager (Debian based systems). Dependencies that are needed by Airflow to run the CI test suite (communicating with external images such as Mysql/Postgres/Hadoop/....)
  • Docker compose configuration - configuration of the interacting images needed to run CI tests by Airflow - describing dependencies and versions of the images needed. Docker-compose orchestrates starting the whole environment using all the images.
  • PIP dependencies - Python dependencies required by Airflow to start. They are described in the setup.py configuration file (and related files such as version.py). There are different variants of dependencies for different purposes that you can specify by requesting "extras" (such as "ci_all", "devel_all" etc.)
  • Wheel cache - optional, pre-compiled wheels for PIP packages. Such pre-compiled wheel packages can be used to trade the need to download, compile and install packages for the possibility of installing them from locally stored wheels. It is supposed to speed up installation, especially when the whole PIP install layer gets invalidated and we need to reinstall all PIP packages from scratch. The wheel cache should not be rebuilt during the CI Docker build, to save time; rather, the cache built in the previous build should be used. For now we have abandoned the idea of using a wheel cache, as it seems to increase complexity and bring only marginal performance improvements.
  • NPM dependencies - Javascript/NPM dependencies required by Airflow's webserver. They are installed locally and inside container image (using 'npm ci/npm install' methods). They are described in package.json and package-lock.json (in airflow/www/ directory). The package.json describes general requirements that are used by 'npm install' and package-lock.json describes "locked" dependencies that are used by 'npm ci'. Installed modules are stored in "node_modules" dir in www directory.
  • "Pre-processed" web resources - Javascript sources of the Webserver are processed by 'npm run prod' and prepared for "production" deployment - including minifying of the sources etc. The results are stored in "airflow/www/static/dist" directory.
  • www sources - placed in airflow/www - those are sources of the webserver that require pre-processing as described above.
  • Airflow sources - airflow sources are mainly python files, therefore they do not need to be compiled. Usually in local environment they are mounted directly to where they are supposed to be in the image, so usually you do not need to rebuild the image to get the latest sources.
  • Airflow Image - slim image with only apt dependencies and "all" PIP extras
  • CI Airflow Image - fat image with apt + CI apt dependencies and "devel_ci" PIP extras
  • Airflow Breeze - proposed, simplified development workflow environment that makes it easy to run Airflow locally for testing, simulates CI environment for testing and allows to manage lifecycle of the images of Airflow. It is proposed as part of the AIP-7 Simplified development workflow

Changes that trigger rebuilds

The changes below are described starting from the most frequent ones - that is, starting from the end of the Dockerfile and going up to the beginning.

  • Airflow apt dependencies are "upgraded" as last part of the build (after sources are added) - thus upgrade to latest versions available is triggered every time sources change (utilising cache from previous installations).
  • Airflow source changes do not invalidate previously installed packages from apt/pip/npm. They only trigger upgrades to apt packages as explained above.
  • changes to www sources trigger pre-processing of the web page for production (npm run prod) and everything above.
  • changes to NPM dependencies (package.json or package-lock.json) trigger reinstallation of all npm packages (npm ci) and everything above.
  • changes to any of the PIP dependencies (setup.py-related files) trigger reinstallation of all pip packages (pip install -e .[<EXTRAS>]) and everything above. Optionally the wheel cache might be used (see below)
  • changing the wheel cache (from previous build) causes rebuild of everything above
  • for CI build, changing CI Airflow apt dependencies triggers reinstallation of those dependencies and everything above
  • changing Airflow apt dependencies triggers reinstallation of those dependencies and everything above
  • there is a possibility to trigger whole build process by changing one line in Dockerfile (FORCE_REINSTALL_ALL_DEPENDENCIES)
  • changing Dockerfile itself triggers rebuild of the image (at the line it was changed).
  • new python stable image (python-x.y-slim) triggers rebuild of the whole image. The latest stable image is always pulled when building the image in CI environment.

Observations during implementation of POC

During implementation of POC for the multi-stage/multi-layered build, several different iterations have been performed. During the trials also POC for related AIP-7 Simplified development workflow was implemented because it had an impact on timing of builds especially for the local development workflow, where images are built as needed locally. The following observations have been made (see below for detailed discussion of variants considered).

  • Both "mono-layered" and "multi-layered" build times can be significantly decreased by implementing the Cassandra fix. The Cassandra driver installed during PIP install takes around 5 minutes to install on its own, because it compiles an optimised Cython version of the driver. For non-production purposes this can be sped up significantly (to a few seconds) by setting the CASS_DRIVER_BUILD_CONCURRENCY and CASS_DRIVER_NO_CYTHON variables. Such an image will not be good for production but it should be more than enough for CI testing.
  • The multi-layered architecture is helpful in bringing down the extra time added to CI builds. It decreases the time needed to build the image to 3 minutes in most cases, whereas the mono-layered image always needs more than 5 minutes (TBC - will probably go down further with wheel cache removal)
  • Local Docker builds, and Docker's checking of cache validity, take significant time. A much more efficient technique is the one tested in Airflow Breeze, where a Docker build is triggered only after checking whether important files have changed (using md5sum). It's effective and efficient (less than 2 seconds to perform the check and enter the environment when the files have not changed)
  • The wheel cache seems to cause more problems than it solves. It works well when new requirements are added (decreasing the time and bandwidth needed to install packages), but building the cache takes time and adds significant complexity to the whole build. We are trading the time needed to download the cache against the time needed to use wheel cache layers and the disk space needed to store the wheels. Overall the savings do not seem to justify the increased complexity, and the proposal is not to use a wheel cache in the official image.
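
The md5sum-based check mentioned in the list above can be sketched as follows. This is a minimal illustration and not the Airflow Breeze implementation: the function names, the choice of watched files and the JSON file used to remember checksums are all assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable, List


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def changed_files(watched: Iterable[Path], state_file: Path) -> List[str]:
    """Return the watched files whose MD5 differs from the stored value.

    `watched` and `state_file` are hypothetical: which files count as
    "important" and where their checksums are kept is not specified here.
    """
    stored = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(p): md5_of(p) for p in watched if p.exists()}
    changed = [name for name, digest in current.items() if stored.get(name) != digest]
    # Remember the current checksums for the next check.
    state_file.write_text(json.dumps(current, indent=2))
    return changed
```

A caller would trigger a Docker build only when `changed_files(...)` returns a non-empty list, and skip straight to entering the environment otherwise.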

Implementation proposal

Two variants of the implementation are proposed and described in detail below:

  • No wheel cache variant
  • Wheel cache variant

Those are characteristics of the image:

  • Multi-staging Docker image
  • Can be built with 'docker build .'
  • Arguments controlling the image generated can be passed with --build-arg parameters
  • Depending on whether Airflow or Airflow CI image is used, either airflow-apt-deps, or airflow-ci-apt-deps intermediate image is used to build the final one
  • Until we have BuildKit support in Docker we cannot build airflow-ci-apt-deps layer conditionally. Therefore there is a workaround to build the image quickly for Airflow image - with "conditional" run 
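
As a rough sketch of how the --build-arg parameter selects the image variant: the helper below is hypothetical, and it assumes that a plain 'docker build .' produces the main Airflow image while the CI image is selected with the APT_DEPS_IMAGE build argument, as in the 'docker build' command shown in the build timings section. Other build arguments may exist and are not covered.

```python
from typing import List


def docker_build_command(ci: bool, context: str = ".") -> List[str]:
    """Return a `docker build` command line for the Airflow or CI image."""
    command = ["docker", "build"]
    if ci:
        # Select the fat apt-dependencies stage as the base of the final image.
        command += ["--build-arg", "APT_DEPS_IMAGE=airflow-ci-apt-deps"]
    command.append(context)
    return command


print(" ".join(docker_build_command(ci=False)))
print(" ".join(docker_build_command(ci=True)))
```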

Image size comparison


| Image | Size | Extras |
|---|---|---|
| Airflow mono-layer image | 1.2 GB | all |
| Airflow multi-layer | 1.2 GB | all |
| CI Airflow multi-layer | 2.5 GB | devel_ci |

Proposed implementation sequence/schedule

The proposed sequence/schedule of the implementation is as follows (as proposed in  https://github.com/apache/airflow/pull/4543#issuecomment-474049231):

  • Step 1: AIRFLOW-4115 JIRA, PR - Docker file for Main airflow image is multi-staging and has multiple layers

After merging it, the Multi-stage Dockerfile replaces the original mono-layered Dockerfile. It only implements main Airflow image and continues to build single variant (python 3.6). No functional changes. 

  • Step 2: AIRFLOW-4116 JIRA, PR - Support for Main/CI images in single Dockerfile

After merging it, the multi-stage Dockerfile implements both the main and the CI images, which are built in DockerHub via a custom build script. Tests continue to be executed using the old incubator-ci image.

  • Step 3: AIRFLOW-4117 JIRA, PR - Travis CI uses multi-stage Docker image to run tests

After merging it, Travis CI uses the multi-stage image to run tests. 

In the follow-up step AIP-7 Simplified development workflow can be merged to improve development and allow to reproduce Travis CI testing locally.

Proposed setup of the DockerHub and Travis CI

DockerHub - DockerHub builds for the WIP

  • We should choose which versions of Python are supported. For now the assumption is that Python versions 2.7, 3.5 and 3.6 should be supported (TODO: verify that)
  • DockerHub should be configured to run one build for each python version. Each build produces two images:
    • latest-X.Y-VERSION - Airflow image
    • latest-X.Y-ci-VERSION - Airflow CI image
  • The DockerHub builds should only be built from the 'master' branch. They will produce an incrementally updated image with every commit to master.
  • The builds in DockerHub can take a long time. This means that the master images will not be available for some time after a change is merged to master; however, this is not a problem, as Docker will automatically invalidate and rebuild layers as needed, even when starting from older images.
  • The build scripts are implemented in such a way that the python version is determined from the Docker Tag name.
  • The image tags generated follow the scheme latest-X.Y-VERSION and latest-X.Y-ci-VERSION, as well as latest-X.Y and latest-X.Y-ci, where X.Y is the python version and VERSION is the Airflow version (currently 2.0.0.dev0).
  • a custom "hooks/build" script is implemented to control the details of the build, the caching, and the production of more than one image (Airflow + Airflow CI) from a single Docker Tag auto-build configuration.
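
The tag scheme above, and the idea of deriving the python version from the tag name, can be illustrated with a small sketch. The regular expression and the function names are assumptions made for illustration; the actual build scripts may parse tags differently.

```python
import re
from typing import List, NamedTuple, Optional

# latest-X.Y, optionally followed by "-ci", optionally followed by "-VERSION".
TAG_PATTERN = re.compile(
    r"^latest-(?P<python>\d+\.\d+)(?P<ci>-ci)?(?:-(?P<version>.+))?$"
)


class ImageTag(NamedTuple):
    python_version: str
    is_ci: bool
    airflow_version: Optional[str]


def parse_tag(tag: str) -> ImageTag:
    """Split an image tag into python version, CI flag and Airflow version."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognised image tag: {tag!r}")
    return ImageTag(
        python_version=match.group("python"),
        is_ci=match.group("ci") is not None,
        airflow_version=match.group("version"),
    )


def tags_for(python_version: str, airflow_version: str) -> List[str]:
    """List the four tags produced for one python version."""
    base = f"latest-{python_version}"
    return [
        f"{base}-{airflow_version}",
        f"{base}-ci-{airflow_version}",
        base,
        f"{base}-ci",
    ]
```

For example, `parse_tag("latest-3.6-ci-2.0.0.dev0")` yields python version `3.6`, the CI flag set, and Airflow version `2.0.0.dev0`.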

Example DockerHub builds from the current WIP can be found at https://cloud.docker.com/repository/docker/potiuk/airflow/

Travis CI: Travis CI build here for the WIP

  • The setup of Travis CI remains as it is today for CI/incremental builds. Every PR starts a build for its own branch.
  • The CI builds (controlled with the CI environment variable) pull images from DockerHub: they pull the latest (master) images for caching and perform a local build of the images based on the current sources
  • Those locally built images are used to run tests (same as the current builds - using the Docker Compose setup).
  • Matrix configuration for builds (TODO: confirm it). With this configuration it is much easier to select the matrix elements that we want to see in the build. The proposed configuration below has a matrix of 12 builds: 3 python versions x 5 different combinations of variables, minus 3 excluded combinations of python and env that we are not interested in.

    Current matrix configuration
    env:
      matrix:
      - BACKEND=mysql ENV=docker
      - BACKEND=postgres ENV=docker
      - BACKEND=sqlite ENV=docker
      - BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.9.0
      - BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.13.0
    python:
    - '3.6'
    - '3.5'
    - '2.7'
    matrix:
      exclude:
      - python: '2.7'
        env: BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.13.0
      - python: '3.5'
        env: BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.9.0
      - python: '3.6'
        env: BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.9.0
    
    
  • Thanks to TOX  removal and simplification of the variables it is now clearer to see what kind of build each job performs (the list of jobs below is generated from the configuration above)

  • An additional cron-controlled Travis CI build will be triggered daily to perform a clean image build + tests, without any Docker cache. This can be used to verify that changes in transitive dependencies do not break the current build.
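
The arithmetic behind the 12-job matrix (3 python versions x 5 env combinations, minus 3 exclusions) can be checked with a short sketch. It only approximates how Travis CI expands a matrix, and the function and variable names are illustrative; the env strings are copied from the configuration above.

```python
from itertools import product
from typing import List, Set, Tuple

K8S_OLD = "BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.9.0"
K8S_NEW = "BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.13.0"

PYTHON_VERSIONS = ["3.6", "3.5", "2.7"]
ENVS = [
    "BACKEND=mysql ENV=docker",
    "BACKEND=postgres ENV=docker",
    "BACKEND=sqlite ENV=docker",
    K8S_OLD,
    K8S_NEW,
]
EXCLUDED = {("2.7", K8S_NEW), ("3.5", K8S_OLD), ("3.6", K8S_OLD)}


def expand_matrix(
    pythons: List[str], envs: List[str], excluded: Set[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return every (python, env) pair that is not explicitly excluded."""
    return [pair for pair in product(pythons, envs) if pair not in excluded]


jobs = expand_matrix(PYTHON_VERSIONS, ENVS, EXCLUDED)
for python, env in jobs:
    print(f"python {python}: {env}")
print(len(jobs))  # 12
```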


Stages of the image

Those are the stages of the image that are defined in Dockerfile

  • X.Y - python version (2.7, 3.5 or 3.6 currently)
  • VERSION - airflow version (v2.0.0.dev0)

| No. | Stage | Description | Labels in DockerHub | Airflow build dependencies | Airflow CI build dependencies |
|---|---|---|---|---|---|
| 1 | python-base-image | Base python image | python-X.Y-slim | base | base |
| 2 | airflow-apt-deps | Vital Airflow apt dependencies | - | 1 | 1 |
| 3 | airflow-ci-apt-deps | Additional CI image dependencies | - | not used | 2 |
| 4 | main | Main airflow sources build. Used for both Airflow and CI build | Airflow builds: latest-X.Y (only latest version), latest-X.Y-VERSION. CI builds: latest-X.Y-ci (only newest version), latest-X.Y-ci-VERSION | 2 | 3 |


Dependencies between stages

The images we create effectively have these dependencies. When the Dockerfile changes, Docker's multi-stage mechanism takes care of rebuilding only those stages that need to be rebuilt: a change in a stage triggers rebuilds only in the stages that depend on it.
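
The stage dependencies can be modelled as a small parent map. The sketch below is an interpretation of the stages table (the Airflow build's main stage builds on airflow-apt-deps, the CI build's main stage builds on airflow-ci-apt-deps, which in turn builds on airflow-apt-deps); the function names and the build keys "airflow" and "ci" are illustrative.

```python
from typing import Dict, List, Optional

# Parent of each intermediate stage, as read from the stages table.
STAGE_PARENTS: Dict[str, Optional[str]] = {
    "python-base-image": None,
    "airflow-apt-deps": "python-base-image",
    "airflow-ci-apt-deps": "airflow-apt-deps",
}

# Which apt-deps stage the `main` stage builds on, per build variant.
MAIN_PARENT: Dict[str, str] = {
    "airflow": "airflow-apt-deps",
    "ci": "airflow-ci-apt-deps",
}


def stage_chain(build: str) -> List[str]:
    """Return the stages used by a build, from the base image up to main."""
    chain = ["main"]
    parent: Optional[str] = MAIN_PARENT[build]
    while parent is not None:
        chain.append(parent)
        parent = STAGE_PARENTS[parent]
    return list(reversed(chain))


def builds_affected_by(stage: str) -> List[str]:
    """Return the build variants that must be rebuilt when `stage` changes."""
    return [build for build in MAIN_PARENT if stage in stage_chain(build)]
```

In this model a change to airflow-ci-apt-deps affects only the CI build, while a change to airflow-apt-deps affects both.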

Layers in the main image

The main image has a number of layers, which allow the image to be rebuilt incrementally depending on what changed in the repository since the previous build. Docker's build mechanisms (context/cache invalidation) are used to determine whether the subsequent layers should be invalidated and rebuilt. ^^ means that a change in the previous layer triggers the rebuild.

| No. | Layer | Description | Trigger for rebuild | Airflow build behaviour | CI build behaviour |
|---|---|---|---|---|---|
| 1 | PIP configuration | setup.py and related files (version.py etc.) | Updated dependencies for PIP | Copy setup.py related files to context | Copy setup.py related files to context |
| 2 | PIP install | PIP installation | ^^ | All PIP dependencies downloaded and installed | PIP dependencies downloaded and installed |
| 3 | NPM package configuration | package.json and package-lock.json | Updated dependencies for NPM | Copy package files to context | Copy package files to context |
| 4 | npm ci | Installs locked dependencies from NPM | ^^ | All NPM dependencies downloaded and installed | All NPM dependencies downloaded and installed |
| 5 | www files | airflow/www all files | Updated any of the www files | Copy www files to context | Copy www files to context |
| 6 | npm run prod | Prepares production javascript packaging for webserver | ^^ | Javascript prepared | Packages prepared |
| 7 | airflow sources | Copy all sources to context | Any change in sources | Copy sources to context | Copy sources to context |
| 8 | apt-get upgrade | Upgrading apt dependencies | ^^ | All apt packages upgraded to latest stable versions | All apt packages upgraded to latest stable versions |
| 9 | pip install | Reinstalling PIP dependencies | ^^ | Pip packages are potentially upgraded | All PIP packages are upgraded |

The results of such layer structure are the following behaviours:

  • in case PIP configuration is changed: PIP packages + NPM packages + NPM compile + sources are reinstalled. For Airflow build, all PIP packages are downloaded and installed
  • in case NPM configuration is changed: NPM packages + NPM compile + sources are reinstalled
  • in case any of WWW files changed: NPM compile + sources are reinstalled
  • in case of any source change: sources are reinstalled
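
The behaviours listed above follow from Docker's rule that invalidating one layer also invalidates every layer after it. A simplified model of that rule, using the layer names from the table (with the last layer renamed "pip reinstall" to distinguish it from layer 2), could look like this. It is an illustration only and does not inspect a real Dockerfile.

```python
from typing import List

# Layers of the main image in Dockerfile order (see the layers table).
LAYERS = [
    "PIP configuration",
    "PIP install",
    "NPM package configuration",
    "npm ci",
    "www files",
    "npm run prod",
    "airflow sources",
    "apt-get upgrade",
    "pip reinstall",
]


def layers_to_rebuild(first_changed: str) -> List[str]:
    """Return the changed layer and every layer that follows it."""
    return LAYERS[LAYERS.index(first_changed):]


def layers_from_cache(first_changed: str) -> List[str]:
    """Return the layers before the changed one, which stay cached."""
    return LAYERS[:LAYERS.index(first_changed)]
```

For example, a change to the www files leaves the PIP and NPM installation layers cached and rebuilds everything from "www files" onwards.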

Different types of builds

The images for Airflow are built for several scenarios, and the "hooks/build" script, with an accompanying environment variable, controls which images are built in each scenario:


| Scenario | Trigger | Purpose | Cache | Frequency | Pull from DockerHub | Push to DockerHub | Images prepared during the build: Airflow | Images prepared during the build: CI |
|---|---|---|---|---|---|---|---|---|
| DockerHub build for master branch | A commit merged to "master" | Build and push reference images that are used as cache for subsequent builds | From master | Several times per day | Yes | Yes | Yes | Yes |
| Local developer build | Triggered by the user | Build when developer adds dependencies or downloads new code and prepares development environment | From local images (pulled initially) unless cache is disabled | Once per day | First time or when requested | When requested and user logged in | | Yes |
| Google Compute Engine Build Machine | Manual build | First Manual build to populate DockerHub registry faster (optional) | No cache | First build | No | Yes | Yes | Yes |
| CI build | A commit is pushed to any branch | Builds image that is used to execute CI tests for commits pushed by developers. | From master | Several times an hour | Yes | No | | Yes |

Variant with wheel cache (abandoned)

This variant uses a wheel cache. Wheels are pre-compiled and stored in the /cache directory of a specially prepared stage in order to speed up installation. Normally, with Docker's cache invalidation, a change in setup.py invalidates the whole layer where PIP install was performed. When this layer is rebuilt, all the packages have to be re-downloaded and re-installed. By using a wheel cache prepared during the previous build we can speed up the reinstallation (at least that's the hypothesis).

Note that the wheel cache image has to be downloaded before it is used, so building with the wheel cache is not necessarily faster than building without it. Initially, when the whole build took around 10 minutes (with the full Cassandra driver build), trading installation time for the download time of the wheel cache was definitely a good idea, but speeding up the build with NO_CYTHON changed the perspective a bit. We tested both the wheel and no-wheel images, and the results showed that the wheel cache brings only marginal improvements at the cost of highly increased complexity.


Image with wheel cache - ABANDONED

Stages of the image

Those are the stages of the image that we have defined in Dockerfile

  • X.Y - python version (2.7, 3.5 or 3.6 currently)
  • VERSION - airflow version (v2.0.0.dev0)

| No. | Stage | Description | Labels in DockerHub | Airflow build dependencies | Airflow CI build dependencies |
|---|---|---|---|---|---|
| 1 | Python | Base python image | python-X.Y-slim | - | - |
| 2 | airflow-apt-deps | Vital Airflow apt dependencies | latest-X.Y-apt-deps-VERSION | 1 | 1 |
| 3 | airflow-ci-apt-deps | Additional CI image dependencies | latest-X.Y-ci-apt-deps-VERSION | [Not used] | 2 |
| 4 | wheel-cache-master | Master wheel cache built on DockerHub from latest master for faster PIP installs | latest-X.Y-wheelcache-VERSION | [Not used] | 3 |
| 5 | wheel-cache | Currently built wheel cache (for future builds) | latest-X.Y-wheelcache-VERSION | [Not used] | 3 |
| 6 | main | Main airflow sources build. Used for both Airflow and CI build | Airflow builds: latest-X.Y (only latest version), latest-X.Y-VERSION. CI builds: latest-X.Y-ci (only newest version), latest-X.Y-ci-VERSION | 2 | 3 - image; 4 - /cache folder with wheels |

Dependencies between stages

The images we create effectively have these dependencies. When the Dockerfile changes, Docker's multi-stage mechanism takes care of rebuilding only those stages that need to be rebuilt: a change in a stage triggers rebuilds only in the stages that depend on it.


Layers in the main image

The main image has a number of layers, which allow the image to be rebuilt incrementally depending on what changed in the repository since the previous build. Docker's build mechanisms (context/cache invalidation) are used to determine whether the subsequent layers should be invalidated and rebuilt. ^^ means that a change in the previous layer triggers the rebuild.


| No. | Layer | Description | Trigger for rebuild | Airflow build behaviour | CI build behaviour |
|---|---|---|---|---|---|
| 1 | Wheel cache master | /cache folder with cached wheels from previous build | Rebuild of the wheelcache source. | Empty wheel cache used to minimise size of the image | Wheel cache built in latest DockerHub "master" image used. |
| 2 | PIP configuration | setup.py and related files (version.py etc.) | Updated dependencies for PIP | Copy setup.py related files to context | Copy setup.py related files to context |
| 3 | PIP install | PIP installation | ^^ | All PIP dependencies downloaded and installed | PIP dependencies installed from wheel cache - new dependencies downloaded and installed |
| 4 | NPM package configuration | package.json and package-lock.json | Updated dependencies for NPM | Copy package files to context | Copy package files to context |
| 5 | npm ci | Installs locked dependencies from NPM | ^^ | All NPM dependencies downloaded and installed | All NPM dependencies downloaded and installed |
| 6 | www files | airflow/www all files | Updated any of the www files | Copy www files to context | Copy www files to context |
| 7 | npm run prod | Prepares production javascript packaging for webserver | ^^ | Javascript prepared | Packages prepared |
| 8 | airflow sources | Copy all sources to context | Any change in sources | Copy sources to context | Copy sources to context |
| 9 | apt-get upgrade | Upgrading apt dependencies | ^^ | All apt packages upgraded to latest stable versions | All apt packages upgraded to latest stable versions |
| 10 | pip install | Reinstalling PIP dependencies | ^^ | Pip packages are potentially upgraded | All PIP packages are upgraded |

The results of such layer structure are the following behaviours:

  • in case wheel image is changed: PIP packages + NPM packages + NPM compile + sources are reinstalled for CI build (nothing changes for Airflow build)
  • in case PIP configuration is changed: PIP packages + NPM packages + NPM compile + sources are reinstalled. For Airflow build, all PIP packages are downloaded and installed, for CI build Wheel cache is used as base for installation (faster)
  • in case NPM configuration is changed: NPM packages + NPM compile + sources are reinstalled
  • in case any of WWW files changed: NPM compile + sources are reinstalled
  • in case of any source change: sources are reinstalled

Different types of builds

The images for Airflow are built for several scenarios, and the "hooks/build" script, with accompanying environment variables, controls which images are built in each scenario:


ScenarioTriggerPurposeCacheFrequencyPull from DockerHubPush to DockerHubImages prepared during the build (controled by environment variables)
Apt depsCI Apt depsMaster WheelcacheLocal wheelcacheAirflowCI
DockerHub build for master branchA commit merged to "master"Build and push reference images that are used as cache for subsequent buildsFrom masterSeveral times per dayYesYesYesYesYesYesYesYes
Local developer buildTriggered by the userBuild when developer adds dependencies or downloads new code and prepares development environmentFrom local images (pulled initially) unless cache is disabledOnce per dayFirst time or when requestedWhen requested and user logged inYesYes
Yes


Yes

Google Compute Engine

Build Machine

Manual buildFirst Manual build to populate DockerHub registry faster (optional)No cacheFirst buildNoYesYesYesYes
YesYes
CI buildA commit is pushed to any branchBuilds image that is used to execute CI tests for commits pushed by developers.From masterSeveral times an hourYesNoYesYes


Yes

Build timings

Mono-layered image build timings

Build timings for different scenarios

Those timings were measured during tests. Times are in MM:SS.

The yellow rows indicate timings for the original "Mono-layered" builds, for comparison of incremental build times.

| Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | CI Apt deps changed | Apt deps changed | Full build (from scratch) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Local Machine *, original mono-layer image | Airflow | 0:27 | 8:26 | 8:26 | 8:26 | 8:26 | 8:26 | 8:26 | 9:26 | |
| Local Machine *, mono-layer (Cassandra fix) ** | Airflow | 0:30 | 4:34 | 4:34 | 4:34 | 4:34 | 4:34 | 4:34 | 5:56 | The Cassandra driver speed-up seems to be the single biggest improvement we can make to the mono-layer image. However, as these timings show, it takes the same time to rebuild the image (around 5 minutes) no matter which part of the sources changed. |
| Google Compute Engine ***, mono-layer | Airflow | 0:01 | 9:07 | 9:07 | 9:07 | 9:07 | 9:07 | 9:07 | 10:43 | |

* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 cores. Using a MacBook affects context sending times: it takes significantly longer to send the context to the Linux kernel VM that is used on Mac.

** Cassandra fix: installing the Cassandra driver takes a lot of time because it compiles a Cython-based driver (which is good for performance). The Cassandra fix speeds up the build by removing the Cython optimisations. Multi-layer images are built with the Cassandra fix.

*** Google Compute Engine: custom (8 vCPUs, 31 GB memory)

No wheel cache variant build timings

Those timings were measured during tests. Times are in MM:SS.

The yellow rows indicate timings for the original "Mono-layered" builds, for comparison of incremental build times. The coloured fields show use cases that are "typical" during a normal development cycle.

  • Green - local development
  • Yellow - Travis CI build
  • Red - DockerHub build

Main use cases:

| Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | CI Apt deps changed | Apt deps changed | First time build (from scratch) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Local Machine *, with 'Airflow Breeze' | CI | 0:15 | 0:25 | 0:40 | 1:02 | 4:14 | 9:14 | 9:27 | 8:54 (warning) | Timing for typical local development. Note that a build from scratch takes less time than a rebuild (we are pulling images first). |
| Travis CI build | CI | 2:37 | 2:37 | 2:35 | 3:11 | 6:00 | 8:50 | 8:50 | 9:20 | Typical timing for CI builds. Those are the delays/additional time expected in PRs that introduce the type of changes described in the table. Note that in the current build on Travis CI it takes about 5 minutes to perform the initial setup, installing and collecting the required packages. |
| DockerHub | Airflow, CI | 7:03 | 7:20 | 10:00 | 13:20 | 29:00 | 41:50 | 39:24 | 10:15 (warning) | There are significant delays/queues on DockerHub. The image sometimes waits in a queue for several hours before it actually starts building. Both CI and Airflow CI images are built. Note that a build from scratch takes less time than a rebuild (we are pulling images first). |

* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 cores. Using a MacBook impacts context sending times: it takes significantly longer to send the build context to the Linux kernel VM that Docker uses on Mac.
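
The comments above note that images are pulled first, so that the build can reuse already published layers. A minimal sketch of that pull-then-build pattern is shown below. It is an assumption about how such a step could be scripted, not the actual Airflow Breeze or /hooks/build implementation: the function names and the image name `example/airflow-ci:latest` are hypothetical, and the use of the `--cache-from` option of `docker build` is assumed rather than confirmed by this page.

```python
import subprocess
from typing import List


def pull_command(image: str) -> List[str]:
    """Command that downloads the previously published image (and its layers)."""
    return ["docker", "pull", image]


def build_command(image: str, context: str = ".") -> List[str]:
    """Command that builds the image, allowing the pulled image to act as layer cache."""
    return ["docker", "build", "--cache-from", image, "--tag", image, context]


def pull_and_build(image: str, context: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Return the commands for a pull-then-build cycle and optionally run them."""
    commands = [pull_command(image), build_command(image, context)]
    if not dry_run:
        # The pull is allowed to fail (for example, when the image was never published).
        subprocess.run(commands[0], check=False)
        # The build itself must succeed.
        subprocess.run(commands[1], check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration; dry run prints the commands.
    for command in pull_and_build("example/airflow-ci:latest"):
        print(" ".join(command))
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to inspect the commands before running them against a real registry.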


Other build types:

| Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | CI Apt deps changed | Apt deps changed | First time build (from scratch) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Cloud Build ** | CI | 1:10 | 1:10 | 1:10 | 1:30 | 4:30 | 7:18 | 7:28 | 7:20 | For testing, Google Cloud Build was also tested. It's not an official way of running CI but it might become an optional system for Google Cloud Platform. |
| Local Machine *, Docker build CI Airflow image | CI | 0:02 | 0:13 | 0:23 | 0:48 | 4:28 | 8:12 | 8:06 | 8:00 | Build the Airflow CI image locally using 'docker build'. This does not usually happen, but can be done to manually build the CI image: 'docker build --build-arg APT_DEPS_IMAGE=airflow-ci-apt-deps .' |
| Local Machine *, Docker build Airflow image | Airflow | 0:02 | 0:14 | 0:26 | 0:46 | 4:37 | 5:15 | 6:10 | 6:58 | Build the Airflow image locally using 'docker build'. This is not usually done, but 'docker build .' might be run at any time by developers, as they are quite used to building images this way. |
| Local Machine * | CI | 1:58 | 2:10 | 2:28 | 2:57 | 6:19 | 9:41 | 11:04 | 11:50 | Local machine was used to simulate what happens in the CI environment. CI=true - runs /hooks/build |
| Google Compute Engine *** | CI | 1:27 | 1:28 | 1:38 | 2:00 | 5:32 | 7:50 | 8:39 | 8:39 | Cloud machine was used to simulate what happens in the CI environment. CI=true - runs /hooks/build |
| Google Compute Engine ***, with 'Airflow Breeze' | CI | 0:04 | 0:13 | 0:27 | 0:48 | 4:24 | 7:09 | 7:20 | 7:20 | Same as for local development but using the cloud machine. Note that a build from scratch takes less time than a rebuild (we are pulling images first). Only CI build using Airflow Breeze. |

* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 cores. Using a MacBook impacts context sending times: it takes significantly longer to send the build context to the Linux kernel VM that Docker uses on Mac.

** Cloud Build - M8 High CPU

*** Google Compute Engine: custom (8 vCPUs, 31 GB memory)

Wheel cache variant build timings (abandoned)



These timings were measured during tests. Times are in MM:SS.

The yellow rows indicate timings for the original "Mono-layered" builds, for comparison with incremental build times. The coloured fields show use cases that are "typical" during a normal development cycle.

  • Green - local development
  • Yellow - Travis CI build
  • Red - DockerHub build

Main use cases:

| Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | CI Apt deps changed | Apt deps changed | First time build (from scratch) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Local Machine *, with 'Airflow Breeze' | CI | 0:10 | 0:15 | 0:40 | 1:36 | 4:20 | 7:13 | 8:07 | 7:50 (warning) | Timing for typical local development. Note that a build from scratch takes less time than a rebuild (we are pulling images first). |
| Travis CI build | CI | 3:24 | 3:32 | 3:30 | 3:47 | 5:45 | 7:39 | 8:24 | 8:26 | Typical timing for CI builds. Those are the delays/additional time expected in PRs that introduce the type of changes described in the table. |
| DockerHub | Airflow, CI | 8:20 | 8:40 | 11:01 | 13:40 | 33:30 | 38:45 | 44:00 | 10:00 | There are significant delays/queues on DockerHub. The image sometimes waits in a queue for several hours before it actually starts building. Both CI and Airflow CI images are built. Note that a build from scratch takes less time than a rebuild (we are pulling images first). |

* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 cores. Using a MacBook impacts context sending times: it takes significantly longer to send the build context to the Linux kernel VM that Docker uses on Mac.


Other build types:

| Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | CI Apt deps changed | Apt deps changed | First time build (from scratch) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Cloud Build ** | CI | 2:53 | 3:00 | 3:07 | 3:31 | 4:40 | 6:44 | 8:33 | 9:35 | For testing, Google Cloud Build was also tested. It's not an official way of running CI but it might become an optional system for Google Cloud Platform. |
| Local Machine *, Docker build CI Airflow image | CI | 0:02 | 0:15 | 0:25 | 0:44 | 4:07 | 6:22 | 7:43 | 10:20 | Build the Airflow CI image locally using 'docker build'. This does not usually happen, but can be done to manually build the CI image: 'docker build --build-arg APT_DEPS_IMAGE=airflow-ci-apt-deps .' |
| Local Machine *, Docker build Airflow image | Airflow | 0:09 | 0:20 | 0:29 | 0:56 | 4:28 | 3:30 | 8:09 | 10:18 | Build the Airflow image locally using 'docker build'. This is not usually done, but 'docker build .' might be run at any time by developers, as they are quite used to building images this way. |
| Google Compute Engine *** | CI | 1:13 | 1:23 | 1:43 | 2:26 | 10:20 | 12:30 | 13:09 | 16:35 | Cloud machine was used to simulate what happens in the CI environment. CI=true - runs /hooks/build |
| Google Compute Engine ***, with 'Airflow Breeze' | CI | 0:15 | 0:25 | 0:40 | 1:40 | 3:14 | 5:30 | 8:40 | 7:22 (warning) | Same as for local development but using the cloud machine. Note that a build from scratch takes less time than a rebuild (we are pulling images first). Only CI build using Airflow Breeze. |


* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 cores. Using a MacBook impacts context sending times: it takes significantly longer to send the build context to the Linux kernel VM that Docker uses on Mac.

** Cloud Build - M8 High CPU

*** Google Compute Engine: custom (8 vCPUs, 31 GB memory)



6 Comments

  1. I was surprised that the mono-layer one is bigger than the multilayer one. I built it locally and it is a bit smaller, but not much:

    MacBook-Pro-van-Fokko:incubator-airflow fokkodriesprong$ docker images airflow-master-no-install-recommends
    REPOSITORY                             TAG                 IMAGE ID            CREATED              SIZE
    airflow-master-no-install-recommends   latest              8430ac490481        About a minute ago   971MB
    MacBook-Pro-van-Fokko:incubator-airflow fokkodriesprong$ docker images airflow-master
    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    airflow-master      latest              919b664b3984        12 minutes ago      977MB

    The Dockerfile in the repo does not use --no-install-recommends, but this only accounts for about 6 MB.

  2. I was also a bit surprised that the layered image is smaller. I expected a slightly bigger one, to be honest.

    For me the numbers look like that:

    docker images potiuk/airflow-monodocker
    REPOSITORY TAG IMAGE ID CREATED SIZE
    potiuk/airflow-monodocker latest 711f22148f14 14 hours ago 1.14GB

    docker images potiuk/airflow-layereddocker
    REPOSITORY TAG IMAGE ID CREATED SIZE
    potiuk/airflow-layereddocker latest 964458a837c4 13 hours ago 1.12GB


    Hard to say where the differences come from. One thing to say - I was using the latest released docker (I upgraded yesterday):

    Docker version 18.09.1, build 4c52b90

    I double-checked, and I think the difference might also come from the fact that when I built it I had node_modules in one of the folders (I have a cloud function for Slack notifications and a number of node modules checked out locally in the sources). I think they are added to the sources layer of Docker. I will rebuild without the modules and see what I get.

  3. It's also likely because I have a .dockerignore that ignores certain build artifacts for the multi-layered docker, and it is missing for the mono one. I will rebuild both with the same .dockerignore, remove all the artifacts I had, and update the numbers. But I don't expect much difference.

  4. OK. After cleanup and a rebuild from scratch I got:

    potiuk/airflow-layereddocker latest 055d0daee787 45 minutes ago 1.01GB

    potiuk/airflow-monodocker latest 725143eaf153 4 minutes ago 976MB

    So, as expected, the monodocker image is slightly smaller (24 MB => 2% difference). I will update the numbers.

  5. I updated the numbers and conclusions, Fokko Driesprong.