I propose moving each SQLAlchemy ORM class to the folder where it logically belongs, instead of grouping the classes by type. From a user perspective I think this makes more sense: a user doesn't know whether or not a class is an ORM model, and IMO it would be more sensible to write from airflow.operators import BaseOperator than from airflow.models import BaseOperator.
Current situation: most SQLAlchemy ORM classes (e.g. DAG, Connection and TaskInstance) are stored in /airflow/models; some are not, e.g. BaseJob in /airflow/jobs.
Proposed situation: place SQLAlchemy ORM classes together with similar classes. There is no requirement to keep these in the same directory. Anything that inherits from Base can be placed anywhere. My suggestion for all files in /airflow/models:
- Base: create folder /airflow/db and move it there (together with the utilities in /airflow/utils/db.py)
- BaseOperator: this is actually not an ORM class, place in /airflow/operators
- Connection: not sure yet, either together with Variable & XCom or something separate
- Crypto: move to /airflow/security
- Dag: move to /airflow/dag
- DagBag: move to /airflow/dag
- DagPickle: move to /airflow/dag
- DagRun: move to /airflow/dag
- Errors: rename script to import_error.py and move to /airflow/dag
- Kubernetes: create folder /airflow/executors/kubernetes and move it there
- Log: create folder /airflow/logs and move it there (some logging-related utils could go there too)
- Pool: not sure yet
- SkipMixin: not sure yet, /airflow/ti_deps perhaps?
- SlaMiss: create folder /airflow/sla and move it there
- TaskFail: move to /airflow/task
- TaskInstance: move to /airflow/task
- TaskReschedule: move to /airflow/task
- Variable: not sure yet, either together with Connection & XCom or something separate
- XCom: not sure yet, either together with Connection & Variable or something separate
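To illustrate why ORM classes do not have to share a directory, here is a minimal sketch, assuming SQLAlchemy is installed. Everything lives in one file for brevity; the comments mark the package each piece would move to under this proposal. The column definitions are simplified placeholders, not Airflow's real schema, and create_all_tables is a hypothetical helper used only for this demonstration.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

# Would live in the proposed /airflow/db package.
Base = declarative_base()


# Would live in the proposed /airflow/dag package (placeholder columns).
class DagRun(Base):
    __tablename__ = "dag_run"
    id = Column(Integer, primary_key=True)
    dag_id = Column(String(250))


# Would live in the proposed /airflow/task package (placeholder columns).
class TaskFail(Base):
    __tablename__ = "task_fail"
    id = Column(Integer, primary_key=True)
    task_id = Column(String(250))


def create_all_tables(url="sqlite://"):
    """Hypothetical helper: create every table registered on Base.

    Subclassing Base registers the table on Base.metadata, regardless of
    the module in which the subclass is defined.
    """
    engine = create_engine(url)  # in-memory SQLite by default
    Base.metadata.create_all(engine)
    return sorted(inspect(engine).get_table_names())


table_names = create_all_tables()
print(table_names)
```

One point to keep in mind with this layout: a class is only registered on Base.metadata once its module has been imported, so any code that creates or inspects the full schema has to import every package that defines a model first.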
Is there anything special to consider about this AIP? Downsides? Difficulty in implementation or rollout, etc.?
What change do you propose to make?
Move /airflow/models classes to a logical module.
What problem does it solve?
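It removes the need to know whether or not a class is an ORM model before you can find it. Today an ORM model lives in /airflow/models, while any other class lives in the module that matches its purpose. After the change every class lives in the module that matches its purpose, which IMO makes the class structure more sensible.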
Why is it needed?
To make the Airflow codebase easier to navigate.
Are there any downsides to this change?
It introduces breaking changes, so it can only be included in Airflow 2.0.
Which users are affected by the change?
Everybody using a class in /airflow/models.
How are users affected by the change? (e.g. DB upgrade required?)
The change would come together with upgrading to Airflow 2.0.
What defines this AIP as "done"?
When /airflow/models does not exist anymore.