Motivation

This is an idea that I've used in the past for my own work, seen in this PR (https://github.com/apache/airflow/pull/3526), and now see again in Ash's user survey (https://ash.berlintaylor.com/writings/2019/02/airflow-user-survey-2019/):

More Operators: 11 comments

Requests for more operators/sensors. One good request was to have “composable” operators to [avoid an] explosion of XtoY operators. Ed: this would be nice! If someone wants to start an Airflow Improvement Proposal for this that would be ace.

So, I created this AIP because I think it would be great to have this in Airflow.

Considerations

To prevent a proliferation of A-to-B operators, we could create hooks that are accessible via a common interface. This allows for interchangeable operators, e.g. a CopyOperator that takes two such hooks and copies from system A using hook A to system B using hook B. PR https://github.com/apache/airflow/pull/3526 already created a collection of filesystem hooks using Python's file object API. This is explained in more detail in the corresponding JIRA ticket: .

The result is a few filesystem hooks:

  • ftp
  • hdfs
  • local
  • s3
  • sftp

And a few operators which accept any hook adhering to the file object interface, e.g. (not included in PR):

  • CopyFileOperator
  • DeleteFileOperator
  • CopyTreeOperator
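
To make the idea concrete, here is a minimal sketch of what such a common interface could look like. It is not the code from PR 3526 and it does not import Airflow: FsHook, LocalFsHook, InMemoryFsHook and copy_file are hypothetical names chosen for illustration, and the in-memory hook only stands in for a remote system such as S3 or FTP.

```python
import io
import os
import shutil
import tempfile
from abc import ABC, abstractmethod


class FsHook(ABC):
    """Hypothetical common interface (an assumption, not the API of PR 3526)."""

    @abstractmethod
    def open(self, path, mode="rb"):
        """Return a binary file-like object for `path`."""

    @abstractmethod
    def exists(self, path):
        """Return True if `path` exists on this filesystem."""


class LocalFsHook(FsHook):
    """Local filesystem, delegating to Python builtins."""

    def open(self, path, mode="rb"):
        return open(path, mode)

    def exists(self, path):
        return os.path.exists(path)


class _DictWriter(io.BytesIO):
    """Buffer that stores its contents in a dict when it is closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFsHook(FsHook):
    """Stand-in for a remote system, backed by a plain dict."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode="rb"):
        if "w" in mode:
            return _DictWriter(self.files, path)
        return io.BytesIO(self.files[path])

    def exists(self, path):
        return path in self.files


def copy_file(src_hook, src_path, dest_hook, dest_path):
    """Core of a hypothetical CopyFileOperator: works for any pair of hooks."""
    with src_hook.open(src_path, "rb") as src, \
            dest_hook.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


def demo():
    """Copy a local temp file into the in-memory 'remote' and return its bytes."""
    local, remote = LocalFsHook(), InMemoryFsHook()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with local.open(path, "wb") as f:
            f.write(b"hello")
        copy_file(local, path, remote, "bucket/data.txt")
    return remote.files["bucket/data.txt"]
```

Because copy_file only relies on open() returning a file object, a new system needs one new hook rather than one new operator per source/destination pair.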

This is specific to filesystems, but I think this idea can be extended to other kinds of systems, for example databases using the Python DB API.
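
As a rough sketch of the database variant, assuming both sides expose a DB API (PEP 249) connection: copy_rows below is a hypothetical helper, not an existing Airflow operator, and it only uses cursor(), execute(), fetchmany(), executemany() and commit(). The demo uses two in-memory sqlite3 databases as stand-ins for two different systems.

```python
import sqlite3


def copy_rows(src_conn, dest_conn, select_sql, insert_sql, batch_size=1000):
    """Copy a query result from one DB API connection to another, in batches.

    Hypothetical helper for illustration. `insert_sql` must use the
    parameter style of the destination driver (sqlite3 uses "?").
    """
    src_cur = src_conn.cursor()
    dest_cur = dest_conn.cursor()
    src_cur.execute(select_sql)
    copied = 0
    while True:
        rows = src_cur.fetchmany(batch_size)
        if not rows:
            break
        dest_cur.executemany(insert_sql, rows)
        copied += len(rows)
    dest_conn.commit()
    return copied


def demo():
    """Copy three rows between two in-memory SQLite databases."""
    src = sqlite3.connect(":memory:")
    dest = sqlite3.connect(":memory:")
    # Connection.execute/executemany are sqlite3 shortcuts, used only for setup.
    src.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    src.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    dest.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    copied = copy_rows(
        src, dest,
        "SELECT id, name FROM t",
        "INSERT INTO t VALUES (?, ?)",
        batch_size=2,
    )
    rows = dest.execute("SELECT id, name FROM t ORDER BY id").fetchall()
    return copied, rows
```

One caveat with this approach is that DB API drivers differ in parameter style and SQL dialect, so a generic operator would still need the statements supplied per destination.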

Since the PR above went stale, I think it would be wise to discuss whether this is desirable before somebody puts a lot of time and effort into it. And if so, whether the work from PR 3526 can be used as a basis or whether large changes are required.