Status

State: Has sub-AIPs
Discussion Thread:
Created:

It's quite difficult to define "Done" criteria for "improve security"; however, ideas from this AIP and later discussions have been turned into smaller AIPs:

They will possibly be followed by more. We will consider this AIP-1 an "Umbrella" AIP for those AIPs, and we will consider it done when we feel that security has been improved "enough".

Meetings

Meeting #2: Nov 26, 2021

The meeting notes are here.

The meeting recording is available here.

Meeting #3: Jan 19, 2022

The meeting notes are here.

The meeting recording is available here.

Meeting #4: Feb 16, 2022

The meeting notes are here.

The meeting recording is available here.

The chat transcript is here.

Motivation

Airflow runs arbitrary code across different workers. Currently, every task has full access to the Airflow database, including connection details like usernames, passwords etc. This makes it quite hard to deploy Airflow in environments that are multi-tenant or semi-multi-tenant. In addition, there is no mechanism in place that ensures that what the scheduler thinks it is scheduling is also what actually runs on the worker. This creates the additional operational risk of running an out-of-date task that does something other than expected, aside from the security risk of a malicious task.

Challenges

DAGs can be generated by DAG factories, essentially creating a kind of sub-DSL in which DAGs are defined. From Airflow's point of view this creates a challenge, as DAGs are therefore able to pull in arbitrary dependencies.

Envisioned workflow

DAG submission

The DAG directory should be fully managed by Airflow. DAGs should be submitted by authorized users, and ownership should be determined by the submitting user. Group ownership determines whether other users can overwrite the DAG. DAGs can be submitted either as a single .py file or as a zip containing all dependencies that are required beyond system dependencies.

DAGs are submitted to a REST endpoint. The API server then verifies the submitted DAG and places it in the DAG directory (/var/lib/airflow/dags). It also calculates the SHA512 for the DAG definition and its tasks. It does this on every update.
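As a rough illustration of this step, the sketch below shows how the API server could compute and record the SHA512 at submission time. The function names, the dag_id-based file layout and the return value are assumptions made for this example only, not an existing Airflow API.

import hashlib
from pathlib import Path

DAG_DIR = Path("/var/lib/airflow/dags")

def sha512_of_file(path: Path) -> str:
    # Stream the file so large zipped DAG bundles don't have to fit in memory.
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def accept_submission(upload: Path, dag_id: str) -> str:
    # Place the submitted DAG in the managed directory and return the hash
    # that is recorded for it; this runs again on every update.
    target = DAG_DIR / f"{dag_id}.py"
    target.write_bytes(upload.read_bytes())
    return sha512_of_file(target)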

DAG distribution

In case of distributed workers, DAG distribution happens by REST API. The worker checks if a DAG is available in the local cache (/var/lib/cache/airflow/dags) and verifies the SHA512 hash. In case the DAG is unavailable or the hash is different, the worker asks the API for the version with the specified hash. It then stores the DAG in the local cache.
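A possible shape of that worker-side lookup is sketched below. The /dags/{dag_id}?hash=... endpoint, the .py cache layout and the requests-based client are assumptions for illustration, not part of any existing API.

import hashlib
from pathlib import Path

import requests

CACHE_DIR = Path("/var/lib/cache/airflow/dags")

def ensure_dag(api_url: str, dag_id: str, expected_sha512: str) -> Path:
    cached = CACHE_DIR / f"{dag_id}.py"
    if cached.exists():
        # A cache hit only counts if the hash matches the one the scheduler expects.
        if hashlib.sha512(cached.read_bytes()).hexdigest() == expected_sha512:
            return cached

    # Cache miss or stale copy: ask the API server for the exact version.
    response = requests.get(f"{api_url}/dags/{dag_id}", params={"hash": expected_sha512})
    response.raise_for_status()
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(response.content)
    return cached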

Task Launching

Scheduler

Once it has determined that a particular task is ready for launch, the scheduler checks what connections are required for the task. It verifies the user that owns the task against the connections. If these match, the scheduler serializes the connection details into a metadata blob. It also recalculates the SHA512 for the task and verifies this against the SHA512 that was calculated at submission time. If these don't match, a security exception will be raised. The SHA512 of the task is combined with the connection information into the metadata. The metadata is serialized and then sent to the executor together with the task_instance information.
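The sketch below illustrates those scheduler-side checks. The per-connection owner field, the JSON metadata layout and the exception type are illustrative assumptions; only the hash comparison and the serialization step come from the text above.

import hashlib
import json

class TaskSecurityException(Exception):
    pass

def build_task_metadata(task_source: bytes, submitted_sha512: str,
                        task_owner: str, connections: dict) -> bytes:
    # Recalculate the task hash and compare it with the one stored at submission time.
    current_sha512 = hashlib.sha512(task_source).hexdigest()
    if current_sha512 != submitted_sha512:
        raise TaskSecurityException("task definition changed since submission")

    # Only serialize the connections that the owning user is allowed to use.
    allowed = {cid: conn for cid, conn in connections.items()
               if conn.get("owner") == task_owner}

    metadata = {"task_sha512": current_sha512, "connections": allowed}
    # The serialized blob is handed to the executor together with the task_instance.
    return json.dumps(metadata).encode()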

Worker

The worker receives the metadata, calculates the hash of the task and verifies this against the metadata it has received. If these match it continues; otherwise it raises a security exception.

The worker pipes (os.pipe()) the metadata to the task over a unique file descriptor.
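Put together, the worker-side behaviour could look roughly like the sketch below. Only os.pipe() and the hash check are taken from the text above; the fork/exec flow and the AIRFLOW_METADATA_FD variable used to tell the task which descriptor to read are assumptions for this example.

import hashlib
import json
import os

def launch_task(task_source: bytes, metadata_blob: bytes, task_cmd: list):
    metadata = json.loads(metadata_blob)

    # Verify that the task we are about to run matches what the scheduler approved.
    if hashlib.sha512(task_source).hexdigest() != metadata["task_sha512"]:
        raise RuntimeError("security exception: task hash does not match metadata")

    read_fd, write_fd = os.pipe()
    os.set_inheritable(read_fd, True)  # the task process inherits only the read end

    pid = os.fork()
    if pid == 0:
        # Child: tell the task which descriptor carries the metadata, then exec it.
        os.close(write_fd)
        os.environ["AIRFLOW_METADATA_FD"] = str(read_fd)
        os.execvp(task_cmd[0], task_cmd)
    else:
        # Parent: send the metadata over the pipe and wait for the task to finish.
        os.close(read_fd)
        os.write(write_fd, metadata_blob)
        os.close(write_fd)
        os.waitpid(pid, 0)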

Task

The task parses the metadata and obtains its connection information from the connection information provided in the metadata or, for backwards compatibility, from the environment.
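A task-side counterpart, continuing the assumptions from the worker sketch above (the AIRFLOW_METADATA_FD variable and the metadata layout are hypothetical); the fallback uses the existing AIRFLOW_CONN_* environment variable convention.

import json
import os

def load_connections() -> dict:
    fd = os.environ.get("AIRFLOW_METADATA_FD")
    if fd is not None:
        # Preferred path: read the serialized metadata from the pipe opened by the worker.
        with os.fdopen(int(fd), "rb") as pipe:
            metadata = json.loads(pipe.read())
        return metadata["connections"]

    # Backwards-compatible path: fall back to AIRFLOW_CONN_* environment variables.
    return {name[len("AIRFLOW_CONN_"):].lower(): uri
            for name, uri in os.environ.items()
            if name.startswith("AIRFLOW_CONN_")}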