...
We do support more than one DAG definition per Python file, but it is not recommended: we would like better isolation between DAGs from a fault and deployment perspective, and multiple DAGs in the same file work against that. For now, make sure that the DAG object is in the global namespace. You can use the globals dict, as in:
globals()[dag_id] = DAG(...)
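As a minimal sketch of the globals approach, the snippet below registers one DAG object per entry in a loop. The team names and the `daily_summary_` prefix are hypothetical, and the fallback `DAG` class is only a stand-in so the sketch runs where Airflow is not installed; it is not part of Airflow.

```python
from datetime import datetime

try:
    from airflow import DAG  # the real class when Airflow is installed
except ImportError:
    class DAG:
        """Stand-in used only so this sketch runs without Airflow."""

        def __init__(self, dag_id, start_date=None):
            self.dag_id = dag_id
            self.start_date = start_date

# Hypothetical team names; each one gets its own DAG object.
for team in ("sales", "marketing"):
    dag_id = "daily_summary_{}".format(team)
    # Binding the object to a module-level name puts it in the
    # global namespace, as the text above requires.
    globals()[dag_id] = DAG(dag_id, start_date=datetime(2016, 2, 19))
```

After the loop, the module has two top-level names, `daily_summary_sales` and `daily_summary_marketing`, each bound to its own DAG object.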
Configuring parallelism in airflow.cfg
- parallelism = number of physical python processes the scheduler can run
- dag_concurrency = number of task instances (TIs) allowed to run per DAG at once
- max_active_runs_per_dag = number of dag runs (per-DAG) to allow running at once
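To make the three settings concrete, here is a small sketch that parses an airflow.cfg-style fragment with the standard library. The numeric values are purely illustrative, and placing the keys under a `[core]` section is an assumption based on older Airflow releases; check the airflow.cfg shipped with your version.

```python
import configparser

# Illustrative fragment only; the values are examples, not recommendations.
SAMPLE_CFG = """
[core]
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)
core = parser["core"]

# Collect the three limits described above as integers.
limits = {
    key: core.getint(key)
    for key in ("parallelism", "dag_concurrency", "max_active_runs_per_dag")
}

for name, value in limits.items():
    print("{} = {}".format(name, value))
```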
Understanding the execution date
- Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available.
- This date is available to you in both Jinja and a Python callable's context in many forms as documented here. As a note, ds refers to date_string, not date start, as may be confusing to some.
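The relationship between the execution date and the moment the run actually happens can be sketched with plain datetime arithmetic. This assumes a daily schedule; the variable names are illustrative and are not Airflow's own API.

```python
from datetime import datetime, timedelta

# The day whose data we want to summarize.
execution_date = datetime(2016, 2, 19)

# Assumed daily schedule: the run can only start once the day is over.
schedule_interval = timedelta(days=1)
earliest_run_time = execution_date + schedule_interval

# "ds" is the execution date rendered as a date string (YYYY-MM-DD).
ds = execution_date.strftime("%Y-%m-%d")

print("summarizing data for:", ds)
print("run starts no earlier than:", earliest_run_time.isoformat())
```

The run for 2016-02-19 therefore starts at midnight on 2016-02-20, while `ds` still reads 2016-02-19.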
Run your entire Airflow infrastructure in UTC. Airflow was developed at Airbnb, where every system runs on UTC (GMT). As a result, various parts of Airflow assume that the system (and database) timezone is UTC (GMT). This includes:
- Webserver
- Metadata DB
- Scheduler
- Workers (possibly)
When setting a schedule, align the start date with the schedule. If a schedule is to run at 2am UTC, the start date should also be at 2am UTC.
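One way to sanity-check this alignment before deploying a DAG is a small helper like the one below. The function `start_date_matches_schedule` is hypothetical (it is not part of Airflow) and only handles simple cron expressions whose minute and hour fields are fixed numbers, such as a daily 2am schedule.

```python
from datetime import datetime


def start_date_matches_schedule(start_date, cron_expression):
    """Return True if start_date falls on the cron's fixed minute and hour.

    Only supports expressions such as "0 2 * * *", where the first two
    fields (minute, hour) are plain integers.
    """
    minute_field, hour_field = cron_expression.split()[:2]
    return (
        start_date.minute == int(minute_field)
        and start_date.hour == int(hour_field)
    )


daily_at_2am = "0 2 * * *"
aligned = datetime(2016, 2, 19, 2, 0)      # 2am, matches the schedule
misaligned = datetime(2016, 2, 19, 0, 0)   # midnight, does not match

print(start_date_matches_schedule(aligned, daily_at_2am))
print(start_date_matches_schedule(misaligned, daily_at_2am))
```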
Bash Operator - Jinja templating and the bash commands
- Described here: see below. You need to add a space after the script name when you call a bash script directly in the bash_command attribute of BashOperator. This is because Airflow tries to apply a Jinja template to it, which will fail.
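The sketch below illustrates why the trailing space helps. It assumes that Airflow decides whether to treat a bash_command value as a template file by checking whether the string ends with one of the operator's template extensions (`.sh` and `.bash` for BashOperator in the versions I am aware of). The helper `looks_like_template_file` and the script path are hypothetical and only imitate that check.

```python
# Assumed to mirror BashOperator.template_ext; verify against your version.
TEMPLATE_EXTENSIONS = (".sh", ".bash")


def looks_like_template_file(bash_command):
    """Imitate the extension check that triggers template-file loading."""
    return bash_command.endswith(TEMPLATE_EXTENSIONS)


without_space = "/path/to/cleanup.sh"   # hypothetical script path
with_space = "/path/to/cleanup.sh "     # same path plus a trailing space

print(looks_like_template_file(without_space))
print(looks_like_template_file(with_space))

# In a DAG file the workaround would look roughly like:
#   BashOperator(task_id="cleanup", bash_command="/path/to/cleanup.sh ", dag=dag)
```

With the trailing space, the string no longer ends in a template extension, so under this assumption it is passed through as a plain command rather than loaded as a Jinja template file.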
...