Status

State: Draft
Discussion Thread:
JIRA:
Created: 2018-08-04

Motivation

Launching tasks is quite expensive. We launch 2 Airflow managers (LocalTaskJob and the --raw task) and 6 processes (<user shell> → "bash -c" → "python"). We can instead rely on executing "airflow" directly, because the Unix kernel's program loader already takes care of finding the interpreter. When exec() is called, it asks the kernel to load the program from the file given as its argument. The kernel checks the first 16 bits of the file to determine its executable format. If these bits are #!, it uses the rest of the first line of the file to find which program to launch, and it passes the name of the file it was trying to launch (the script) as an argument to that interpreter, followed by any arguments from the original call.
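The lookup described above can be modelled in a few lines of Python. This is only a simplified sketch of the Linux behaviour, not kernel code: the function name resolve_shebang is hypothetical, and it assumes a single space separates the interpreter from its one optional argument.

```python
def resolve_shebang(path, args=()):
    """Return the argv the kernel would roughly build for `path`.

    Returns None when the file does not start with the "#!" magic number.
    Simplified model: assumes a space separates interpreter and argument.
    """
    with open(path, "rb") as handle:
        # The first 16 bits (2 bytes) identify the executable format.
        if handle.read(2) != b"#!":
            return None
        rest = handle.readline().decode(errors="replace").strip()
    if not rest:
        return None
    # Linux treats everything after the interpreter as ONE optional argument.
    interpreter, _, optional_arg = rest.partition(" ")
    argv = [interpreter]
    if optional_arg.strip():
        argv.append(optional_arg.strip())
    argv.append(path)      # the script itself is handed to the interpreter
    argv.extend(args)      # followed by the arguments of the original call
    return argv
```

For a script whose first line is "#!/usr/bin/env python", calling resolve_shebang(script, ["run"]) yields the interpreter, its argument, the script path, and then "run". No extra shell is needed to get from the script to its interpreter.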

Design

LocalTaskJob and the worker can be merged into one. This means refactoring LocalTaskJob out of jobs.py and integrating its functionality into the executors. In addition, we should stop using "shell=True", as it launches an extra user shell that is not required. Finally, we should execute "airflow" directly.
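As a rough illustration of dropping "shell=True", the sketch below passes the command to subprocess as an argument list, so the program is executed directly without an intermediate shell. The helper names build_task_command and launch are hypothetical and are not taken from the Airflow code base. The "airflow run" argument order is assumed from the 1.x command line.

```python
import subprocess
import sys


def build_task_command(dag_id, task_id, execution_date):
    """Build an argument list instead of a shell string (illustrative only)."""
    return ["airflow", "run", dag_id, task_id, execution_date]


def launch(argv):
    """Execute argv directly, with no "bash -c" wrapper, and return stdout."""
    result = subprocess.run(
        argv,
        shell=False,          # the default, spelled out for emphasis
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs without Airflow installed.
    print(launch([sys.executable, "-c", "print('launched without a shell')"]))
```

Because the arguments are passed as a list, no quoting or escaping is needed, and one process per launch is saved compared with a shell string.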

Tasks

  • Use airflow directly

AIRFLOW-3964

1 Comment

  1. This work was done as part of 2.0, without needing an AIP.