Status
This AIP is part of AIP-63, which aims to add DAG Versioning to Airflow.
Motivation
Today TaskInstance doesn’t maintain any history for prior tries of a task - when a retry starts, it resets everything about the prior try. This means one is unable to know even simple things about a prior try without looking at the task logs. This also means it’s impractical to show this level of detail in the UI for users.
Considerations
To start, we will take this opportunity to create a synthetic primary key on TaskInstance to make it easier to reference a specific task instance. We will also adjust the foreign key on TaskInstance to use the DagRun synthetic key (id) instead of the logical key (dag_id, run_id).
At a high level, Airflow will need to store the details of a TaskInstance try. We have a few options on how to achieve this, and we will determine the best approach as we roll up our sleeves during implementation. A few options include:
- Track the complete try details in another table, something like `task_instance_tries`. Optionally keep the logical key portion of the existing `task_instance` table in that table, with select denormalized columns (like `state`), and possibly a FK to the tries row.
- Add `try_number` to the logical key of `task_instance`. This might have query complexity concerns, however.
One thing to keep in mind is that every column on a task instance other than dag_id, task_id, run_id, and map_index can change between tries, so we have to keep all of them for a complete history.
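As a rough illustration of the first option, the sketch below uses Python's built-in `sqlite3` module. It is not the proposed Airflow schema: the `hostname` column stands in for the many per-try columns, the `start_retry` and `get_try` helpers are hypothetical, and the only names taken from this document are `task_instance`, `task_instance_tries`, `try_number`, `state`, and the logical key columns.

```python
import sqlite3

# Hypothetical, heavily reduced schema: a synthetic key on task_instance,
# the logical key kept as a unique constraint, and a separate tries table.
SCHEMA = """
CREATE TABLE task_instance (
    id INTEGER PRIMARY KEY,
    dag_id TEXT NOT NULL,
    task_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    map_index INTEGER NOT NULL DEFAULT -1,
    try_number INTEGER NOT NULL DEFAULT 1,
    state TEXT,
    hostname TEXT,
    UNIQUE (dag_id, task_id, run_id, map_index)
);
CREATE TABLE task_instance_tries (
    task_instance_id INTEGER NOT NULL REFERENCES task_instance (id),
    try_number INTEGER NOT NULL,
    state TEXT,
    hostname TEXT,
    PRIMARY KEY (task_instance_id, try_number)
);
"""


def start_retry(conn, ti_id):
    """Copy the current try into the tries table, then reset the live row."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "INSERT INTO task_instance_tries "
            "(task_instance_id, try_number, state, hostname) "
            "SELECT id, try_number, state, hostname "
            "FROM task_instance WHERE id = ?",
            (ti_id,),
        )
        conn.execute(
            "UPDATE task_instance "
            "SET try_number = try_number + 1, state = NULL, hostname = NULL "
            "WHERE id = ?",
            (ti_id,),
        )


def get_try(conn, ti_id, try_number):
    """Return (state, hostname) for a try, checking the live row first."""
    row = conn.execute(
        "SELECT state, hostname FROM task_instance "
        "WHERE id = ? AND try_number = ?",
        (ti_id, try_number),
    ).fetchone()
    if row is None:
        row = conn.execute(
            "SELECT state, hostname FROM task_instance_tries "
            "WHERE task_instance_id = ? AND try_number = ?",
            (ti_id, try_number),
        ).fetchone()
    return row
```

In this layout the `task_instance` table keeps exactly one row per logical key, so queries that only care about the current try would not need to filter on `try_number`. Whether that holds for the real scheduler query is something the implementation would have to confirm.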
A key focus will be ensuring that the query the scheduler uses to decide which tasks are ready remains performant. We will also consider other common access patterns, such as keeping the UI dashboard performant.
TaskInstance is a widely used entity in Airflow, so backward compatibility will be maintained for places where user code interacts directly with TIs (e.g. task context).
The UI will reuse the familiar try buttons, used today for task logs, in the TI views.
The REST API endpoints will also be updated as appropriate. For example, the `dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}` endpoint will return the latest Task Instance. An additional `dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/tries/{try_number}` endpoint will be added to retrieve earlier tries of that Task Instance.
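To make the relationship between the two endpoints concrete, here is a small sketch of a client-side helper that builds either path. The function name `task_instance_path` is hypothetical, and the API base URL and authentication are deliberately left out because this document does not specify them.

```python
from urllib.parse import quote


def task_instance_path(dag_id, dag_run_id, task_id, try_number=None):
    """Build the relative endpoint path for a task instance.

    With try_number omitted, the path addresses the latest Task Instance;
    with try_number given, it addresses that specific try.
    """
    # Percent-encode each segment so run IDs containing ':' or '+' stay valid.
    path = "dags/{}/dagRuns/{}/taskInstances/{}".format(
        quote(dag_id, safe=""),
        quote(dag_run_id, safe=""),
        quote(task_id, safe=""),
    )
    if try_number is None:
        return path
    return f"{path}/tries/{int(try_number)}"
```

For example, `task_instance_path("etl", "run_1", "load")` yields the latest-try path, while passing `try_number=2` appends `/tries/2` to it.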
This AIP will be considered done when TaskInstance history is tracked, and basic UI/API/CLI functionality exposes that information to users.
