As cloud deployments become Kubernetes-native, Docker (or, more precisely, containers) is becoming the default mechanism for packaging and running applications. We already use Docker images for Continuous Integration (AIP-10 Multi-layered and multi-stage official Airflow CI image) and for the local development environment (AIP-7 Simplified development workflow). There are also several images that are not maintained by the Airflow community but are used to run Airflow in Docker.
The most commonly used ones are:
- Puckel image: https://github.com/puckel/docker-airflow/blob/master/Dockerfile
- Astronomer image: https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.5/Dockerfile
The current Helm chart uses the Puckel image, which served well for quite a while but was never part of the official Apache community effort. To move forward, we need to make sure that the image, the chart, etc. are driven and managed by the community, following the release schedule and processes of the Apache Software Foundation. For example, one of the rules for releasing software is that any software formally released by the project must be voted on by the PMC (https://www.apache.org/foundation/how-it-works.html#pmc-members).
By bringing the official image into the apache/airflow repository and making it part of the Airflow release process, we can release new images at the same time as new versions of Airflow. It also improves maintainability: for example, we can add more detailed instructions and guidelines on how to run Airflow in a production environment. We can also make sure the right optimisations are in place and support a wider audience, and hopefully collect feedback from people using the official Airflow image and chart and address it over the longer term. Once the image is incorporated into our community process, it will be easier for everyone to contribute to it, in the same way they contribute to the Airflow code.
What change do you propose to make?
The proposal is to update the current CI-optimised Docker images of Airflow to build production-ready images. The new image should retain the properties of the current image but be optimised for production (size, simplicity, execution speed) rather than for CI (speed of incremental rebuilds). The properties to maintain are:
1) It should be built after every merge to master (so that we find out quickly if it breaks)
2) It should contain:
- libraries needed to run Apache Airflow
- client libraries required to connect to external services (databases, etc.)
- Apache Airflow itself with all production-needed extras
3) It should be available in all the Python flavours that Apache Airflow supports
4) It should be incrementally rebuilt whenever dependencies change.
5) Whenever a new version of the Python base image is released with security patches, the master image should automatically be rebuilt using it.
6) Whenever a new version of the Python base image is released, the already-released images should be rebuilt with the latest security patches.
7) Running `docker build .` in the main directory of the Airflow repository should produce a production-ready image
8) The image should be published at https://cloud.docker.com/u/apache/repository/docker/apache/airflow
9) It should use the same build mechanisms as described in AIP-10
10) The proposed naming convention (following AIP-10, with Python 3.6 as the default image):
- Master-build images: airflow:master-python3.5, airflow:master-python3.6, airflow:master-python3.7, airflow:master==airflow:master-python3.6
- Release images: airflow:1.10.6-python3.5, airflow:1.10.6-python3.6, airflow:1.10-python3.6, airflow:latest==airflow:1.10.6-python3.6
11) The final image should not contain NPM (only the compiled assets)
12) The official Helm chart for Apache Airflow should use the official production-ready Docker images.
13) The official image should be used in the prominent places through which Airflow images are distributed (https://hub.helm.sh/charts?q=airflow, possibly Bitnami, etc.).
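The naming convention in item 10 can be made concrete with a short Python sketch that maps each published tag to the concrete image it refers to. The function names `master_tags` and `release_tags` are hypothetical and are not part of any Airflow build script. The sketch also makes two assumptions beyond the list above: that a moving `major.minor` tag (such as `airflow:1.10-python3.5`) exists for every Python flavour, not only for 3.6, and that the version passed to `release_tags` is the newest release, so that `latest` may point at it.

```python
from typing import Dict, List

# AIP-10 convention: Python 3.6 is the default image flavour.
DEFAULT_PYTHON = "3.6"


def master_tags(python_versions: List[str],
                default_python: str = DEFAULT_PYTHON) -> Dict[str, str]:
    """Hypothetical helper: map each master-build tag to the image it names."""
    tags = {}
    for py in python_versions:
        concrete = f"airflow:master-python{py}"
        tags[concrete] = concrete
    # The bare "master" tag is an alias for the default Python flavour.
    tags["airflow:master"] = f"airflow:master-python{default_python}"
    return tags


def release_tags(airflow_version: str,
                 python_versions: List[str],
                 default_python: str = DEFAULT_PYTHON) -> Dict[str, str]:
    """Hypothetical helper: map each release tag to the image it names.

    Assumes airflow_version is the newest release, since "latest" is
    pointed at it.
    """
    major_minor = ".".join(airflow_version.split(".")[:2])  # "1.10.6" -> "1.10"
    tags = {}
    for py in python_versions:
        concrete = f"airflow:{airflow_version}-python{py}"
        tags[concrete] = concrete
        # Assumption: a moving major.minor tag exists for every flavour.
        tags[f"airflow:{major_minor}-python{py}"] = concrete
    # "latest" is an alias for the newest release on the default flavour.
    tags["airflow:latest"] = f"airflow:{airflow_version}-python{default_python}"
    return tags


if __name__ == "__main__":
    for tag, target in release_tags("1.10.6", ["3.5", "3.6", "3.7"]).items():
        print(f"{tag} -> {target}")
```

Modelling the tags as a mapping rather than a flat list keeps explicit which tags are moving pointers (`master`, `latest`, `1.10-python3.6`) and which name one fixed build.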
A draft PR with a proof of concept (POC) of the production image is available here
What problem does it solve?
- Lack of an officially supported, production-ready image of Airflow
- Ability to run Airflow in Kubernetes using a Helm chart immediately after Airflow is officially released
- Ability to run Airflow using docker-compose immediately after Airflow is officially released
Why is it needed?
Users need a way to run Airflow via Docker in production environments, and providing it should be part of the Airflow release process.
Are there any downsides to this change?
As a community, we will have to document the usage of the Airflow image and maintain it in the future.
Which users are affected by the change?
All users who run Airflow in Dockerised environments.
How are users affected by the change? (e.g. DB upgrade required?)
Users will need to switch to the new image.
What defines this AIP as "done"?
1) The image is regularly built and published at https://cloud.docker.com/u/apache/repository/docker/apache/airflow
2) The release process is updated to release the images as well as the pip packages
3) Documentation on using the image is published
4) We have an official helm chart to install Airflow using this image.
5) The image follows guidelines of https://github.com/docker-library/official-images and is present in the official images list.
6) We have a defined process for applying security patches from the base Python images to the Airflow images, and we follow it.
7) The Official Helm Chart uses the image
8) Helm Hub (https://hub.helm.sh/charts?q=airflow) uses the image