Status
Motivation
The current official Airflow image is rebuilt from scratch every time a new commit is pushed to the repo. It is a "mono-layered" one and uses neither Docker's multi-layer caching nor multi-stage builds.
A mono-layered image means that builds after even small changes take as long as a full build, rather than utilising caching to rebuild only what is needed.
With a multi-layered approach and caching enabled in DockerHub we can optimise the image so that only the layers that changed need to be downloaded. This lets users of the images download only incremental changes and opens up a number of options for how such an incremental build/download process can be used:
- Multi-layered images can be used as the basis for AIP-7 Simplified development workflow - where locally downloaded images are used during development and are quickly and incrementally updated with newly added dependencies.
- Multi-layered images that are part of the "airflow" project can be used to run Travis CI integration tests (simplifying the idea described in Optimizing Docker Image Workflow). Incremental builds allow the DockerHub registry to be used as the source of base images (pulled before the build), from which the final image used for test execution is built locally and incrementally.
- While initially the images are not meant to be used in production, multi-staging, build arguments and multiple layers can later be used to produce a production-ready Airflow image with DAGs pre-baked into the image - thus making Airflow closer to being Kubernetes-native. This has been discussed as a potential future improvement in AIP-12 Persist DAG into DB.
- Ideally both the Airflow and CI images should be maintained in a single place - a single "source of truth" - to ease maintenance and development. Currently they are maintained in separate repositories and have potentially different dependencies and build processes. This also makes it difficult to add your own dependencies during development, as there is no regular/development-friendly process to update the CI image with new dependencies.
- Currently the CI builds reinstall PIP dependencies every time they are run. Since Airflow does not use pinned dependencies, transitive dependencies sometimes change and cause CI builds to fail for no apparent reason. The idea is to make sure that the same set of dependencies is used for CI builds until setup.py changes and the dependencies are reinstalled.
- At the same time we should be able to verify that a "clean" build still works - there should be a way to find out that a clean install still produces good results.
What problem does it solve?
It solves the following problems with the current Dockerfile and image:
- the time needed to rebuild the image when anything changes is long - no matter what kind of change it is, the current mono-layered image always takes about 10 minutes to rebuild on a typical development machine
- this long rebuild time is a blocker for a number of use cases (building the image during CI, building the image for Kubernetes in the future, building the image for a local development environment, building the image locally to reproduce the CI test environment)
- it's unclear what the purpose of the current Dockerfile and Docker image is. It's neither used for CI nor ready for production use. It's also pretty unusable for regular pulls because effectively each such image is fully rebuilt from scratch and downloaded in full
- the current Dockerfile/image does not seem to be used by anyone - it was failing in DockerHub for 2 weeks (March 1 - March 14) due to a bug and nobody noticed. The proposed image is built incrementally in CI, so failures will be noticed.
- stability of CI builds → currently transitive dependency changes sometimes cause CI tests to fail. Unless setup.py changes, the CI tests should be run using a stable set of dependencies (frozen at the moment of the last build).
- Only when setup.py changes should the dependencies be reinstalled and transitive dependencies updated (and then it should happen in a branch rather than master, so that problems are detected before other branches start to fail)
Why is this change needed?
- Enabler for AIP-7 Simplified development workflow
- Improved and simplified CI process
- The CI process should be easy to reproduce locally (including failing tests)
- We can use the image during running system tests for external systems as described in AIP-4 Automation of System Tests [Deps: AIP-47].
- TOX adds unnecessary complexity and should be removed
- The CI process is not always reproducible, with transitive dependencies causing installation problems
Suggested implementation
General architecture
In the PR https://github.com/apache/airflow/pull/4543 the current mono-layered Dockerfile has been rewritten as a multi-layered one.
The PR uses the "hooks/build" hook, which is used by the DockerHub build process, to control caching and the build process.
Thanks to that we can build different variants of the image: the Airflow image (slim) and the CI image (fat, with more dependencies).
Life of an image
Assumptions
- There are two images to be built:
- "Airflow" image - slim image with only necessary Airflow dependencies
- "CI" image - fat image with additional dependencies necessary for CI tests
- there are separate images for each python version (currently 2.7, 3.5, 3.6)
- each image uses python-x.y-slim as a base
- all stages are defined in a single multi-stage Dockerfile
- Standard Docker build: it's possible to build the main Airflow image by issuing the "docker build ." command. It's not optimised for DockerHub cache reuse but it will build locally.
- Scripted Docker build: we are using the hooks/build script to build the image utilising the DockerHub cache - pulling the images from the registry and using them as a cache. Those are mainly useful for local development
- binary/apt dependencies are built as separate stages - so that we can use whole cached images with main/CI dependencies as a cache source
- the builds are versioned - airflow 2.0.0.dev0 images are different from airflow 2.0.1.dev0 ones
- we should be able to run the build without the Docker cache - reinstalling everything from scratch
Terms
- Multi-stage Dockerfile - a Dockerfile that utilises multi-stage builds - allowing intermediate images that are used to build the final image. Useful for reuse between different variants of the same image
- Base python image - the base image that the Airflow image starts from. We are using the python-x.y-slim image as the base image.
- Airflow apt dependencies - binary dependencies installed with the 'apt-get' package manager (Debian-based systems). Minimal dependencies that are needed to run Airflow
- CI Airflow apt dependencies - binary dependencies installed with the 'apt-get' package manager (Debian-based systems). Dependencies that are needed by Airflow to run the CI test suite (communicating with external images such as MySQL/Postgres/Hadoop/...)
- Docker Compose configuration - configuration of the interacting images needed to run Airflow's CI tests - describing the dependencies and versions of the images needed. Docker Compose orchestrates starting the whole environment using all the images.
- PIP dependencies - Python dependencies required by Airflow to start. They are described in the setup.py configuration file (and related files such as version.py). There are different variants of dependencies for different purposes that you can specify by requesting "extras" (such as "ci_all", "devel_all" etc.)
- Wheel cache - optional, pre-compiled wheels for PIP packages. Such pre-compiled wheel packages can be used to trade the need to download, compile and install packages for the possibility of installing them from locally stored wheels. It is supposed to speed up installation, especially when the whole PIP install layer gets invalidated and we need to reinstall all PIP packages from scratch. The wheel cache should not be rebuilt during the CI Docker build, to save time - rather, the cache built in the previous build should be used. For now we abandoned the idea of using the wheel cache, as it seems to increase complexity and brings marginal performance improvements.
- NPM dependencies - Javascript/NPM dependencies required by Airflow's webserver. They are installed locally and inside the container image (using 'npm ci'/'npm install'). They are described in package.json and package-lock.json (in the airflow/www/ directory). package.json describes the general requirements that are used by 'npm install' and package-lock.json describes the "locked" dependencies that are used by 'npm ci'. Installed modules are stored in the "node_modules" dir in the www directory.
- "Pre-processed" web resources - the Javascript sources of the webserver are processed by 'npm run prod' and prepared for "production" deployment - including minification of the sources etc. The results are stored in the "airflow/www/static/dist" directory.
- www sources - placed in airflow/www - these are the sources of the webserver that require pre-processing as described above.
- Airflow sources - the Airflow sources are mainly Python files, therefore they do not need to be compiled. Usually in the local environment they are mounted directly to where they belong in the image, so you usually do not need to rebuild the image to get the latest sources.
- Airflow Image - slim image with only apt dependencies and "all" PIP extras
- CI Airflow Image - fat image with apt + CI apt dependencies and "devel_ci" PIP extras
- Airflow Breeze - proposed, simplified development workflow environment that makes it easy to run Airflow locally for testing, simulates the CI environment for testing and allows managing the lifecycle of Airflow images. It is proposed as part of AIP-7 Simplified development workflow
Changes that trigger rebuilds
The changes below are described starting from the most frequent ones - i.e. starting backwards from the end of the Dockerfile and going up to its beginning.
- Airflow apt dependencies are "upgraded" as the last part of the build (after sources are added) - thus an upgrade to the latest available versions is triggered every time the sources change (utilising the cache from previous installations).
- Airflow source changes do not invalidate previously installed apt/pip/npm packages. They only trigger upgrades of apt packages as explained above.
- changes to www sources trigger pre-processing of the web pages for production (npm run prod) and everything above.
- changing NPM dependencies (package.json or package-lock.json) triggers reinstallation of all npm packages (npm ci) and everything above.
- changing any of the PIP dependencies (setup.py-related files) triggers reinstallation of all pip packages (pip install -e .[<EXTRAS>]) and everything above. Optionally a wheel cache might be used (see below)
- changing the wheel cache (from the previous build) causes a rebuild of everything above
- for the CI build, changing CI Airflow apt dependencies triggers reinstallation of those dependencies and everything above
- changing Airflow apt dependencies triggers reinstallation of those dependencies and everything above
- there is a possibility to trigger the whole build process by changing one line in the Dockerfile (FORCE_REINSTALL_ALL_DEPENDENCIES) - see the sketch after this list
- changing the Dockerfile itself triggers a rebuild of the image (from the instruction that was changed).
- a new stable Python image (python-x.y-slim) triggers a rebuild of the whole image. The latest stable base image is always pulled when building the image in the CI environment.
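A minimal sketch of how the forced reinstallation mentioned above can work - the default value and exact placement of the argument are assumptions; only the argument name comes from this proposal:

```Dockerfile
# Illustrative sketch - default value and placement are assumptions.
# Changing the value (either by editing this line or by passing
# --build-arg FORCE_REINSTALL_ALL_DEPENDENCIES=<new value>) invalidates the
# Docker cache at the first RUN that uses the value, so every dependency
# layer below it is rebuilt from scratch.
ARG FORCE_REINSTALL_ALL_DEPENDENCIES="false"
RUN echo "Force reinstall all dependencies: ${FORCE_REINSTALL_ALL_DEPENDENCIES}"
```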
Observations during implementation of POC
During implementation of the POC for the multi-stage/multi-layered build, several different iterations were performed. During the trials a POC for the related AIP-7 Simplified development workflow was also implemented, because it has an impact on build timing, especially for the local development workflow where images are built locally as needed. The following observations have been made (see below for a detailed discussion of the variants considered).
- Both "mono-layered" and "multi-layered" build times can be significantly decreased by implementing the Cassandra fix. The Cassandra driver installed during the PIP install takes around 5 minutes on its own, because it compiles an optimised Cython version of the driver. For non-production purposes this can be sped up significantly (to a few seconds) by setting the CASS_DRIVER_BUILD_CONCURRENCY and CASS_DRIVER_NO_CYTHON variables (see the sketch after this list). Such an image will not be good for production but it should be more than enough for CI testing.
- The multi-layered architecture is most helpful in bringing down the extra time added for CI builds. It decreases the time needed to build the image to around 3 minutes in most cases, whereas the mono-layered image always needs more than 5 minutes (TBC - this will probably go further down with the wheel cache removal)
- Local Docker builds and Docker's own cache validation take significant time. A much more efficient technique is the one tested in Airflow Breeze, where the Docker build is triggered only after checking whether important files have changed (using md5sum). It's effective and efficient (less than 2 seconds to perform the check and enter the environment when the files have not changed)
- The wheel cache seems to cause more problems than it solves. It works well when new requirements are added (decreasing the time and bandwidth needed to install packages), but building the cache takes time and adds significant complexity to the whole build. We are trading the time needed to download packages against the time needed to use the wheel cache layers and the disk space needed to store the wheels. Overall the savings do not seem to justify the increased complexity, and the proposal is to not use the wheel cache in the official image.
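A minimal sketch of the Cassandra fix, assuming the variables are set just before the PIP install layer (the concurrency value used here is an arbitrary example):

```Dockerfile
# Illustrative sketch - not the exact content of the proposed Dockerfile.
# CASS_DRIVER_NO_CYTHON skips compilation of the Cython-optimised driver and
# CASS_DRIVER_BUILD_CONCURRENCY parallelises the build when Cython is used.
ENV CASS_DRIVER_NO_CYTHON="1" \
    CASS_DRIVER_BUILD_CONCURRENCY="8"
RUN pip install cassandra-driver
```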
Implementation proposal
Two variants of the implementation are proposed and described in detail below:
- No wheel cache variant
- Wheel cache variant
These are the characteristics of the image:
- Multi-stage Docker image
- Can be built with 'docker build .'
- Arguments controlling the generated image can be passed with --build-arg parameters
- Depending on whether the Airflow or the Airflow CI image is built, either the airflow-apt-deps or the airflow-ci-apt-deps intermediate image is used to build the final one
- Until we have BuildKit support in Docker we cannot build the airflow-ci-apt-deps stage conditionally. Therefore there is a workaround to build the slim Airflow image quickly - with a "conditional" run (see the sketch below)
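One way such a workaround can look - a minimal sketch based on the APT_DEPS_IMAGE build argument that appears in the build commands later in this document (stage names come from this proposal, the exact Dockerfile may differ):

```Dockerfile
# Illustrative sketch - the ARG has to be declared before the first FROM
# so that it can be used in a FROM line further down the Dockerfile.
ARG APT_DEPS_IMAGE="airflow-apt-deps"
# ... stages airflow-apt-deps and airflow-ci-apt-deps are defined here ...
FROM ${APT_DEPS_IMAGE} as main
```

Building the CI variant then becomes `docker build --build-arg APT_DEPS_IMAGE=airflow-ci-apt-deps .`, while a plain `docker build .` produces the slim Airflow image.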
Image size comparison
Image | Size | Extras |
---|---|---|
Airflow mono-layer image | 1.2 GB | all |
Airflow multi-layer | 1.2 GB | all |
CI Airflow multi-layer | 2.5 GB | devel_ci |
Proposed implementation sequence/schedule
The proposed sequence/schedule of the implementation is as follows (as proposed in https://github.com/apache/airflow/pull/4543#issuecomment-474049231):
Step 1: AIRFLOW-4115 JIRA, PR - the Dockerfile for the main Airflow image is multi-stage and has multiple layers
After merging it, the multi-stage Dockerfile replaces the original mono-layered Dockerfile. It only implements the main Airflow image and continues to build a single variant (python 3.6). No functional changes.
Step 2: After merging it, the multi-stage Dockerfile implements both the main and CI images, which are built in DockerHub via the custom build script. Tests continue to be executed using the old incubator-ci image.
Step 3: After merging it, Travis CI uses the multi-stage image to run tests.
In the follow-up step, AIP-7 Simplified development workflow can be merged to improve development and allow reproducing Travis CI testing locally.
Proposed setup of the DockerHub and Travis CI
DockerHub - DockerHub builds for the WIP
- We should choose which versions of Python should be supported. For now the assumption is that Python versions 3.5 and 3.6 should be supported
- The DockerHub setup should be configured with one build per Python version. Each build produces two images:
- <BRANCH>-pythonX.Y-ci-slim - Airflow slim CI image
- <BRANCH>-pythonX.Y-ci - Airflow full CI image
- The DockerHub builds should only be built from the 'master' and 'v1-10-test' branches. They will produce an incrementally updated image with every commit merged to `master` or pushed to the `v1-10-test` branch.
- The builds in DockerHub can take a long time. This means that the images will not be available for some time after a merge to master; however, this is not a problem, as Docker will automatically invalidate and build images as needed even from older images.
- The build scripts are implemented in such a way that the Python version is determined from the Docker tag name (master-python3.5-ci, master-python3.6-ci)
- The generated image tags follow this scheme:
Versions from master (development use only):
- CI slim image : airflow:master-python3.5-ci-slim, airflow:master-python3.6-ci-slim, airflow:master-ci-slim==airflow:master-python3.6-ci-slim
- CI full image : airflow:master-python3.5-ci, airflow:master-python3.6-ci, airflow:master-ci==airflow:master-python3.6-ci
- Production optimised images: (future - not yet available): airflow:master-python3.5, airflow:master-python3.6, airflow:master==airflow:master-python3.6
Release versions:
- CI slim image: airflow:1.10.4-python3.5-ci-slim, airflow:1.10.4-python3.6-ci-slim, airflow:latest-ci-slim==airflow:1.10.4-python3.6-ci-slim
- CI full image: airflow:1.10.4-python3.5-ci, airflow:1.10.4-python3.6-ci, airflow:latest-ci==airflow:1.10.4-python3.6-ci
- Production optimised images (future - not yet available): airflow:1.10.4-python3.5, airflow:1.10.4-python3.6, airflow:latest==airflow:1.10.4-python3.6
- custom "hooks/build" script is implemented to control details of the build, caching and producing more than one image (Airflow + Airflow CI) with single Docker Tag auto-build configuration.
Example DockerHub builds from the current WP can be found at https://cloud.docker.com/repository/docker/potiuk/airflow/
Travis CI:
- The setup of Travis CI remains as it is today for CI/incremental builds. Every PR starts a build for its own branch.
- The CI builds (controlled with the CI environment variable) pull images from DockerHub - they pull the latest (master) images for caching and perform a local build of the images based on the current sources
- Those locally built images are used to run the tests (same as the current builds - using the Docker Compose setup)
- Thanks to the TOX removal and simplification of the variables it is now clearer what kind of build each job performs (the list of jobs below is generated from the configuration above)
- An additional cron-controlled Travis CI build will be triggered daily to perform a clean image build + tests - without any Docker cache. This can be used to verify that transitive dependency changes are not breaking the current build.
Stages of the image
These are the stages of the image that are defined in the Dockerfile:
- X.Y - python version (3.5 or 3.6 currently)
- VERSION - airflow version (v2.0.0.dev0)
No. | Stage | Description | Airflow build depends on | Airflow CI build depends on |
---|---|---|---|---|
1 | python-base-image | Base python image | base | base |
2 | airflow-apt-deps | Vital Airflow apt dependencies | 1 | 1 |
3 | airflow-ci-apt-deps | Additional CI image dependencies | not used | 2 |
4 | main | Slim airflow sources build. Used for both Airflow and CI build | 2 | 3 |
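A minimal sketch of how these stages can be laid out in the single multi-stage Dockerfile; the stage names follow the table above, while the ARG defaults and the elided stage bodies are assumptions:

```Dockerfile
# Illustrative sketch - stage bodies are elided and ARG defaults are assumptions.
# ARGs used in FROM lines must be declared before the first FROM.
ARG PYTHON_BASE_IMAGE="python:3.6-slim"
ARG APT_DEPS_IMAGE="airflow-apt-deps"

# Stage 1: base python image
FROM ${PYTHON_BASE_IMAGE} as python-base-image

# Stage 2: vital Airflow apt dependencies
FROM python-base-image as airflow-apt-deps
# RUN apt-get update && apt-get install ...  (minimal deps needed to run Airflow)

# Stage 3: additional CI-only apt dependencies (not used by the slim image)
FROM airflow-apt-deps as airflow-ci-apt-deps
# RUN apt-get update && apt-get install ...  (MySQL/Postgres/Hadoop clients etc.)

# Stage 4: main - its parent is selected via the APT_DEPS_IMAGE build argument
FROM ${APT_DEPS_IMAGE} as main
# ... PIP install, NPM install, www pre-processing, Airflow sources (see below) ...
```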
Dependencies between stages
Effectively the images we create have the dependencies shown above. In case of Dockerfile changes, Docker's multi-stage mechanism takes care of rebuilding only the stages that need to be rebuilt - changes in a stage trigger rebuilds only in the stages that depend on it.
Layers in the main image
The main image has a number of layers that make the image rebuild incrementally, depending on changes in the repository vs. the previous build. Docker's build mechanisms (context/cache invalidation) are used to determine whether subsequent layers should be invalidated and rebuilt. The ^^ in the table means that a change in the previous layer triggers the rebuild.
No. | Layer | Description | Trigger for rebuild | Airflow build behaviour | CI build behaviour |
---|---|---|---|---|---|
1 | PIP configuration | Setup.py and related files (version.py etc.) | Updated dependencies for PIP | Copy setup.py related files to context | Copy setup.py related files to context |
2 | PIP install | PIP installation | ^^ | All PIP dependencies downloaded and installed | PIP dependencies downloaded and installed |
3 | NPM package configuration | package.json and package-lock.json | Updated dependencies for NPM | Copy package files to context | Copy package files to context |
4 | npm ci | Installs locked dependencies from NPM | ^^ | All NPM dependencies downloaded and installed | All NPM dependencies downloaded and installed |
5 | www files | airflow/www all files | Updated any of the www files | Copy www files to context | Copy www files to context |
6 | npm run prod | Prepares production javascript packaging for webserver | ^^ | Javascript prepared | Packages prepared |
7 | airflow sources | Copy all sources to context | Any change in sources | Copy sources to context | Copy sources to context |
8 | apt-get upgrade | Upgrading apt dependencies | ^^ | All apt packages upgraded to latest stable versions | All apt packages upgraded to latest stable versions |
9 | pip install | Reinstalling PIP dependencies | ^^ | Pip packages are potentially upgraded | All PIP packages are upgraded |
The results of such a layer structure are the following behaviours:
- in case PIP configuration is changed: PIP packages + NPM packages + NPM compile + sources are reinstalled. For Airflow build, all PIP packages are downloaded and installed
- in case NPM configuration is changed: NPM packages + NPM compile + sources are reinstalled
- in case any of WWW files changed: NPM compile + sources are reinstalled
- in case of any source change: sources are reinstalled
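A sketch of how these layers can be ordered inside the main stage; the commands follow the table above, while the working directory, copied file lists and extras are assumptions (devel_ci for the CI image, "all" for the slim Airflow image per the terms above):

```Dockerfile
# Illustrative sketch of the layer ordering in the "main" stage.
WORKDIR /opt/airflow

# Layers 1-2: PIP configuration and installation - only setup.py-related files
# are copied first, so the PIP layers are invalidated only when they change
COPY setup.py setup.cfg ./
COPY airflow/version.py airflow/__init__.py airflow/
RUN pip install -e ".[devel_ci]"

# Layers 3-4: NPM configuration and installation of locked dependencies
COPY airflow/www/package.json airflow/www/package-lock.json airflow/www/
RUN cd airflow/www && npm ci

# Layers 5-6: www sources and production pre-processing of the web resources
COPY airflow/www/ airflow/www/
RUN cd airflow/www && npm run prod

# Layer 7: all remaining Airflow sources
COPY . .

# Layers 8-9: apt upgrade and pip re-install, triggered by any source change
RUN apt-get update && apt-get upgrade -y && apt-get clean
RUN pip install -e ".[devel_ci]"
```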
Different types of builds
The images for Airflow are built in several scenarios - the "hooks/build" script with accompanying environment variables controls which images are built in each scenario:
Scenario | Trigger | Purpose | Cache | Frequency | Pull from DockerHub | Push to DockerHub | Airflow image prepared | CI image prepared |
---|---|---|---|---|---|---|---|---|
DockerHub build for master branch | A commit merged to "master" | Build and push reference images that are used as cache for subsequent builds | From master | Several times per day | Yes | Yes | Yes | Yes |
Local developer build | Triggered by the user | Build when developer adds dependencies or downloads new code and prepares development environment | From local images (pulled initially) unless cache is disabled | Once per day | First time or when requested | When requested and user logged in | Yes | |
Google Compute Engine Build Machine | Manual build | First Manual build to populate DockerHub registry faster (optional) | No cache | First build | No | Yes | Yes | Yes |
CI build | A commit is pushed to any branch | Builds image that is used to execute CI tests for commits pushed by developers. | From master | Several times an hour | Yes | No | Yes |
Build timings
Mono-layered image build timings
Build timings for different scenarios
Those timings were measured during tests. Times are in MM:SS.
The yellow rows indicate timings for the original "Mono-layered" builds, for comparison of incremental build times.
Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP Packages changed | CI Apt deps changed | Apt deps changed | Full build (from scratch) | Comments |
---|---|---|---|---|---|---|---|---|---|---|
Local Machine * Original Mono-layer image | Airflow | 0:27 | 8:26 | 8:26 | 8:26 | 8:26 | 8:26 | 8:26 | 9:26 | |
Local Machine * Monolayer (Cassandra fix) ** | Airflow | 0:30 | 4:34 | 4:34 | 4:34 | 4:34 | 4:34 | 4:34 | 5:56 | Seems that the Cassandra driver speedup is the single biggest improvement we can make to the Mono-layer image. However, as we can see here, it takes the same time to rebuild the image (around 5 minutes) no matter which part of the sources changed. |
Google Compute Engine *** Mono-layer | Airflow | 0:01 | 9:07 | 9:07 | 9:07 | 9:07 | 9:07 | 9:07 | 10:43 |
* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 Cores. Using a MacBook impacts context sending times → it takes significantly longer to send the context to the Linux kernel VM which is used on Mac.
** Cassandra fix - installing the cassandra driver takes a lot of time - it compiles the Cython-based driver (which is good for performance) - the Cassandra fix speeds up the build by removing the Cython optimisations. Multi-layer images are built with the Cassandra fix.
*** Google Compute Engine: custom (8 vCPUs, 31 GB memory)
No wheel cache variant build timings
Those timings were measured during tests. Times are in MM:SS.
The yellow rows indicate timings for the original "Mono-layered" builds, for comparison of incremental build times. The coloured fields show use cases that are "typical" during a normal development cycle.
- Green - local development
- Yellow - Travis CI build
- Red - DockerHub build
Main use cases:
Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP Packages changed | CI Apt deps changed | Apt deps changed | First time build (from scratch) | Comments |
---|---|---|---|---|---|---|---|---|---|---|
Local Machine * with 'Airflow Breeze' | CI | 0:15 | 0:25 | 0:40 | 1:02 | 4:14 | 9:14 | 9:27 | 8:54 | Timing for typical local development. Note that build from scratch takes less than rebuild (we are pulling images first) |
Travis CI build | CI | 2:37 | 2:37 | 2:35 | 3:11 | 6:00 | 8:50 | 8:50 | 9:20 | Typical timing for CI builds. Those are delays/additional time expected in PRs that introduce the type of changes described in the table. Note that in the current build on Travis CI it takes about 5 minutes to perform initial setup - with installing and collecting required packages. |
DockerHub | Airflow CI | 7:03 | 7:20 | 10:00 | 13:20 | 29:00 | 41:50 | 39:24 | 10:15 | There are significant delays/queues on DockerHub. The image sometimes waits in a queue several hours before it actually starts building. Both CI and Airflow CI images are built. Note that build from scratch takes less than rebuild (we are pulling images first) |
* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 Cores. Using a MacBook impacts context sending times → it takes significantly longer to send the context to the Linux kernel VM which is used on Mac.
Other build types:
Where built | Images | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP Packages changed | CI Apt deps changed | Apt deps changed | First time build (from scratch) | Comments |
---|---|---|---|---|---|---|---|---|---|---|
Cloud Build ** | CI | 1:10 | 1:10 | 1:10 | 1:30 | 4:30 | 7:18 | 7:28 | 7:20 | For testing - Google Cloud Build was also tested. It's not an official way of running CI but it might become an optional System for Google Cloud Platform |
Local Machine * Docker build CI Airflow image | CI | 0:02 | 0:13 | 0:23 | 0:48 | 4:28 | 8:12 | 8:06 | 8:00 | Build locally Airflow CI image using 'docker build'. It's not happening usually but can be done to manually build the CI image: docker build --build-arg APT_DEPS_IMAGE=airflow-ci-apt-deps . |
Local Machine * Docker Build Airflow image | Airflow | 0:02 | 0:14 | 0:26 | 0:46 | 4:37 | 5:15 | 6:10 | 6:58 | Build locally the Airflow image using 'docker build'. This is not usually done, but 'docker build .' might be run at any time by developers as they are quite used to building images this way. docker build . |
Local Machine* | CI | 1:58 | 2:10 | 2:28 | 2:57 | 6:19 | 9:41 | 11:04 | 11:50 | Local machine was used to simulate what happens in CI environment. CI=true - runs /hooks/build |
Google Compute Engine *** | CI | 1:27 | 1:28 | 1:38 | 2:00 | 5:32 | 7:50 | 8:39 | 8:39 | Cloud machine was used to simulate what happens in CI environment. CI=true - runs /hooks/build |
Google Compute Engine *** with 'Airflow Breeze' | CI | 0:04 | 0:13 | 0:27 | 0:48 | 4:24 | 7:09 | 7:20 | 7:20 | Same as for local development but using the cloud machine. Note that build from scratch takes less than rebuild (we are pulling images first) Only CI build using Airflow Breeze |
* Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 Cores. Using a MacBook impacts context sending times → it takes significantly longer to send the context to the Linux kernel VM which is used on Mac.
** Cloud Build - M8 High CPU
*** Google Compute Engine: custom (8 vCPUs, 31 GB memory)
6 Comments
Fokko Driesprong
I was surprised that the mono-layer one is bigger than the multi-layer one. I built it locally and it is a bit smaller, but not much:
The Dockerfile in the repo does not use --no-install-recommends, but this only accounts for about 6 MB.
Jarek Potiuk
I was also a bit surprised that the layered image is smaller. I expected a slightly bigger one, to be honest.
For me the numbers look like this:
Hard to say where the differences come from. One thing to say - I was using the latest released docker (I upgraded yesterday):
Docker version 18.09.1, build 4c52b90
I double-checked, and I think the difference might also come from the fact that when I built it I had node_modules in one of the folders (I have a cloud function for Slack notifications and had a number of node modules checked out locally in the sources) - I think they were added to the sources layer of Docker. I will rebuild without the modules and see what I get.
Jarek Potiuk
It's also likely because I have a .dockerignore that ignores certain build artifacts for the multi-layered Docker build, and it is missing for the mono one. I will rebuild both with the same .dockerignore, remove all the artifacts I had, and update the numbers. But I don't expect much difference.
Jarek Potiuk
Ok. After cleanup and rebuild from scratch I got:
potiuk/airflow-layereddocker latest 055d0daee787 45 minutes ago 1.01GB
potiuk/airflow-monodocker latest 725143eaf153 4 minutes ago 976MB
So as expected, a slightly smaller monodocker image (24 MB => 2% difference). I will update the numbers.
Jarek Potiuk
I updated the numbers and conclusions Fokko Driesprong
Tao Feng
very great write up Jarek Potiuk