Abstract
A new way to provide templates to an Airflow operator is proposed to replace the existing implicit template detection based on template_ext and template_fields. A base class Template is implemented to provide field-templating features, with two concrete subclasses, FileTemplate and StringTemplate, planned for the initial release.
Motivation
Field templating in Airflow refers to how certain fields of Airflow operator classes can be passed a string using the Jinja template syntax, which is rendered at runtime into the actual value used for the operator's execution.
This is a nice feature, but the implicitness introduces a few inherent issues. Since template detection and rendering happen automatically, it is not easy to tell which fields are template-able and which are not; while each operator does have a template_fields attribute that canonically indicates this, the knowledge is generally "tribal" and not very discoverable. It is also not easy to debug things when the behaviour is not what you expect, especially when template_ext is involved; the TemplateNotFound issue is a common pitfall that troubles newcomers to no end.
This is probably the single most awkward pattern in Airflow 2:
# XXX: Airflow 2 syntax.
cat_file = BashOperator(
    task_id="cat_file",
    bash_command="cat {{ ds }}.sh",  # TemplateNotFound: cat {{ ds }}.sh
)
The string having a ".sh" suffix triggers the template_ext logic, making Airflow interpret the string as a path pointing to a template and try to load it. The traditional workaround is to add a trailing space to the command:
# XXX: Airflow 2 syntax.
cat_file = BashOperator(
    task_id="cat_file",
    bash_command="cat {{ ds }}.sh ",  # Can you spot the difference?
)
This is so problematic that we have a paragraph in the documentation dedicated to it. Airflow 2.8 added the literal helper to improve ergonomics a bit, but you still need to learn about it first (or, more likely, see your task fail with a weird error) and remember to import it.
I want to make templating explicit in Airflow 3, trading some ease of writing for better DAG legibility for less experienced Airflow users, improved debuggability when things go wrong, and better discoverability of whether a field is template-able.
Proposal
Instead of relying on template_fields and template_ext to control templating behaviour, all operator fields in Airflow 3 will be non-templated by default. For a field to be template-able, it must have an argument in the operator's __init__ matching the field name; the argument must explicitly accept a Template instance. For example:
class BashOperator(BaseOperator):
    def __init__(
        self,
        task_id: str,  # Not templatable.
        bash_command: str | Template,  # Templatable!
        env: dict[str, str] | Template | None = None,  # Templatable!
        append_env: bool = False,  # Not templatable.
        ...,
    ) -> None:
        super().__init__(task_id=task_id, ...)
        self.bash_command = bash_command
        self.env = env
        ...
It may be possible to simplify this with dataclasses or attrs. I am not sure whether that works with the operator class hierarchy, but we will look into it during implementation and use one of those libraries if it does.
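As a rough illustration of the dataclasses route (a sketch only; Template here is a stand-in class, and whether this composes with BaseOperator's inheritance model is exactly what would need investigating):

```python
from dataclasses import dataclass


class Template:
    """Stand-in for the proposed Template base class."""


@dataclass
class BashOperator:
    # The generated __init__ keeps these annotations, so "str | Template"
    # still marks bash_command as templatable, without hand-written boilerplate.
    task_id: str
    bash_command: "str | Template"
    append_env: bool = False


op = BashOperator(task_id="t", bash_command="echo hi")
print(op.bash_command)  # echo hi
```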
The template_fields attribute will be made private and renamed to _template_fields. This is due to technical limitations in getting type annotations in old Python versions; the goal is to eventually remove it and rely on inspect.get_annotations to inspect which fields support templating instead. A pre-commit hook will be created to ensure the information in _template_fields stays in sync in the meantime.
The template_ext attribute will be removed from BaseOperator, and user code relying on the attribute being on the operator will break. See Compatibility and Migration below for migration suggestions.
Template sources
Two subclasses of Template will be introduced in Airflow 3.0:

- StringTemplate takes a string argument. The string should use the Jinja2 template syntax, and is rendered at runtime.
- FileTemplate takes an argument that points to a file. The file's content is loaded as the template string and rendered at runtime.
The Template base class is considered a public API, and users are free to subclass it. Custom subclasses are resolved like built-in ones, assuming the implementation is correct. Documentation on correctly implementing subclasses will be provided. Later Airflow versions may also introduce more subclasses.
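To illustrate what the subclassing contract could look like, here is a dependency-free sketch. The class and method names are assumptions for illustration only, and string.Template stands in for the real Jinja2 rendering:

```python
import os
import string


class Template:
    """Sketch of a base class: subclasses supply the template source."""

    def get_source(self, context: dict) -> str:
        raise NotImplementedError

    def render(self, context: dict) -> str:
        # string.Template substitution stands in for Jinja2 here.
        return string.Template(self.get_source(context)).substitute(context)


class EnvVarTemplate(Template):
    """Hypothetical custom subclass: reads the template text from an
    environment variable instead of a file or a literal string."""

    def __init__(self, var_name: str) -> None:
        self.var_name = var_name

    def get_source(self, context: dict) -> str:
        return os.environ[self.var_name]


os.environ["GREETING_TMPL"] = "Hello, $name!"
print(EnvVarTemplate("GREETING_TMPL").render({"name": "Airflow"}))  # Hello, Airflow!
```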
With explicit template declaration, the pattern mentioned in the Motivation section becomes:
cat_file = BashOperator(
    task_id="cat_file",
    bash_command=StringTemplate("cat {{ ds }}.sh"),
)
Airflow knows exactly that this is supposed to be a template string, not a path to a template (for which you would use FileTemplate instead), and does the correct thing.
This proposal does not aim to change how templates are resolved and rendered at runtime. If a path passed to FileTemplate is relative, for example, resolution is still controlled by the DAG's template_searchpath. Macros and filters also stay unchanged.
Rendering a non-string value
A new argument converter will be added to Template. If supplied, this should be a callable that takes exactly one argument, the rendered result, and converts it to a new value. The general use case is to render a field to a non-string type, such as:
StringTemplate("{{ ds }}", converter=datetime.date.fromisoformat)
The above is rendered into the date specified by ds, as a Python date object.
Another interesting use case is to use ast.literal_eval as the converter to create a list, dict, or other richer Python structure. This should be a nice alternative to the somewhat clunky and restrictive render_template_as_native_obj. We can accept any callable here, including user-defined ones, without going through the plugin architecture, because the code will only be called in the worker.
Per-field arguments
Requiring all templating values to be wrapped in a class also introduces the possibility of per-field configuration. For example, you will be able to do:
print_foo = BashOperator(
    task_id="print_foo",
    bash_command=StringTemplate(r"printf '$foo\n$bar'"),
    env=FileTemplate(
        "cat_file_env.json",
        template_searchpath=["/etc/my_templates/"],
        converter=json.loads,
    ),
)
This loads the template file at /etc/my_templates/cat_file_env.json, whose content might be:
{"foo": "123", "bar": "456"}
Nesting
Another interesting way to use the classes is to nest templates, which was previously impossible:
run_some_file = BashOperator(
    task_id="run_some_file",
    bash_command=FileTemplate(StringTemplate("{{ ds }}.sh")),
)
This loads a file at runtime whose name corresponds to ds, and renders its content as a template. The example here is artificially contrived, but a similar pattern has been raised sporadically as a feature request. It is possible to work around the lack of this feature, but the syntax allows users more expressiveness when writing DAGs.
Compatibility and Migration
This is obviously a big breaking change and will require users to rewrite a lot of DAGs. The main way to approach this would be with a migration tool that automatically rewrites DAGs. It would be further improved if we were to have a transitional Airflow 2.11, where we could introduce the new syntax for people to test the rewritten DAGs' runtime behaviour before actually upgrading to 3.0.
One thing I explicitly do not want to do is to keep a flag in Airflow 3 (either globally or otherwise) for users to switch back to the previous behaviour. This flag would require Airflow 3 to keep the template_fields and template_ext attributes practically forever, which would be dead code if we expect people to use the new syntax. I am, however, not opposed to keeping them in a separate shim library outside of the apache-airflow package for people to optionally download and use, something like (code is illustrative):
# airflow_compat/operators/bash.py
from airflow.operators.bash import BashOperator as FutureBashOperator

class BashOperator(FutureBashOperator):
    template_fields = FutureBashOperator._template_fields
    template_ext = (".sh", ".bash")

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        for field in self.template_fields:
            value = getattr(self, field)
            if isinstance(value, str):
                if value.endswith(self.template_ext):
                    setattr(self, field, FileTemplate(value))
                else:
                    setattr(self, field, StringTemplate(value))
This way, we can avoid carrying compatibility code in Airflow 3, but still allow the user to slowly migrate to the new syntax after upgrading first (to receive other benefits).
Another possibility is to have a helper function for the user to call in their cluster policies, either in dag_policy or task_policy, that dynamically replaces values in templated fields on the operator with a Template subclass. This may be the least intrusive solution (you virtually do not need to rewrite any DAGs at all, only add a function call in the policies), but personally I am not sure I like the "magic" aspect of this, or want to recommend it as an official migration plan.
The literal helper will stay in core Airflow. It will simply do nothing when used in operators implemented in the Airflow 3 style, although the compatibility layer should still use it the same way as in Airflow 2.
Scope
This AIP should be considered done when the proposed new types are implemented, a compatibility layer is introduced, and operators in core Airflow and official providers are migrated to use the compatibility layer so they work in both Airflow 2 (in both the old and new operator templating syntaxes) and Airflow 3 (in the new operator templating syntax).
Rejected Ideas
Using subclasses to specify the rendered result type
The possibility of using subclasses for this has been considered, e.g. having an IsoFormatTemplate instead of passing in a separate converter. This was abandoned because we already use subclasses to distinguish between loading from a file (FileTemplate) and a literal string (StringTemplate). Using subclasses for conversion would mean two subclasses for every conversion, one for loading from a file and one for a string, which is more confusing and harder to maintain. As an alternative, we can provide alias functions for users to import, such as:
from airflow.templates.converters import isodate
StringTemplate("{{ ds }}", converter=isodate)
Here isodate is simply an alias of datetime.date.fromisoformat. Custom converters can also be put in the same module.