Status
Motivation
A key capability behind Airflow's success is its large set of community-supported integrations: over 1,600 hooks, operators, and sensors, released as part of 90+ provider packages of varying sizes. For example, the AWS and Google providers each have around 80 components.
Currently, though integrations are critical, as a community, we are not welcoming for new integrations even when service providers or others want to contribute them. In fact, we say “no” far more often than not. We need to balance between technical maintenance challenges and breadth of integrations needed for “ubiquitous orchestration”.
In addition to new integrations, even the existing integrations such as Snowflake and Azure have serious limitations because of a lack of consistency and maintenance. This has been a sore point with users, with these being specifically called out as needing consistent connection management and easier testing. There have also been numerous customer escalations regarding integrations, most recently with Snowflake. This problem could become an existential challenge for Airflow.
The goal of this proposal is to add a method by which we can grow the number of integrations to Airflow while not overloading the PMC and Release Manager. The governance of all of these continues to remain with the Airflow PMC.
Updated Governance
This proposal follows the three stage (Incubation, Production, and Deprecation) policy, but establishes stronger procedures, metrics, and communication around these, for more scalable governance. We currently have a process for provider adoption where the operational responsibility for the provider moves into either an entirely Airflow community owned model or a shared responsibility model.
All new providers or integration components should start at the Incubation stage, unless specifically accelerated by the PMC and devlist. Each module must have a “steward” responsible for making sure that the criteria below are being met. The steward role must be filled by at least two unique individuals. The steward is the subject matter expert for the integration, be it a service, a language, or a skill (such as a large language model). Stewardship is a responsibility, but does not come with any additional authority or privileges. Accountability remains with the Airflow PMC and Committers. In practice, this is ensured by requiring that each Steward is sponsored by an existing Airflow Committer. It is the responsibility of the sponsoring Airflow committer to ensure that the Stewards are fulfilling their responsibilities, including asking for help as needed. Recognizing that life happens, neither sponsorship nor stewardship is a role in perpetuity; both can be transitioned to others based on mutual agreement and with approval from the Airflow PMC.
The quantitative criteria described below are aspirational at this time. They will be revisited by the PMC based on actual experience 6 months after they have been established and published.
Incubation
All new providers or other integration modules (such as Notifiers, Message components, etc.) should start in incubation (unless specifically accelerated by the PMC) to ensure that code, licenses, and processes align with ASF guidelines and community principles such as community, collaborative development, and openness.
Modules must have a working codebase to bootstrap credibility and attract contributions. Each of these modules must be visible in the dashboard (detailed below) for module health.
Quantitative graduation criteria can include:
- PRs submitted: Minimum of 10 PRs in the last 6 months
- PRs reviewed: PRs being reviewed within 14 days
- Issues reported: Minimum of 15 unique issues filed in the last 6 months
- Contributions: At least 3 unique individuals making contributions (code or documentation) in the last 6 months
- Issue resolution rate: At least 50% of reported issues closed within 90 days.
- All release and security related issues closed within 60 days
- Demonstrated participation in project governance channels including quarterly updates
- Meet quality criteria for code, documentation & tests listed in Contributing Guide
Production
All modules in production are expected to be well managed, including prompt resolution of issues, and kept up to date in support of a consistent release cadence of at least monthly, but more likely every 2 weeks (when there are new changes). These modules are expected to stay consistent with main, and they need to pass tests for main and all supported Airflow versions. They are also expected to follow the Airflow support guidelines, including staying current with main Airflow releases.
Exceptions can be granted based on a PMC / devlist vote (with only PMC members having binding votes), for valid and one-off criteria.
Quantitative criteria to maintain production status:
- PRs submitted: Minimum of 10 PRs in the last 6 months
- PRs reviewed: PRs being reviewed within 14 days
- Issues reported: Minimum of 20 unique issues filed in the last 6 months
- Contributions: At least 5 unique individuals making contributions (code or documentation) in the last 6 months
- Issue resolution rate: At least 60% of reported issues closed within 90 days.
- All release and security related issues closed within 30 days
- Feature release cadence: At least 1 feature release every 6 months.
- User engagement: Maintain support activity with response to questions within 2 weeks on average.
- Demonstrated participation in project governance channels including quarterly updates
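The numeric graduation and production thresholds listed above lend themselves to a table-driven check. The following is a minimal sketch, not part of the proposal: `ModuleMetrics`, `THRESHOLDS`, and `unmet_criteria` are hypothetical names, the way each metric is measured (for example, using the slowest review or closure time) is an assumption, and the qualitative criteria (governance participation, code quality, user engagement) are left out.

```python
from dataclasses import dataclass


@dataclass
class ModuleMetrics:
    """Hypothetical six-month activity snapshot for one provider module."""
    prs_submitted: int
    max_pr_review_days: int        # slowest PR review (assumed measure)
    issues_reported: int
    unique_contributors: int
    issue_resolution_rate: float   # share of issues closed within 90 days
    max_security_issue_days: int   # slowest release/security issue closure


# Thresholds copied from the lists above; the key names are illustrative.
THRESHOLDS = {
    "graduation": {"prs": 10, "review_days": 14, "issues": 15,
                   "contributors": 3, "resolution": 0.50, "security_days": 60},
    "production": {"prs": 10, "review_days": 14, "issues": 20,
                   "contributors": 5, "resolution": 0.60, "security_days": 30},
}


def unmet_criteria(m: ModuleMetrics, stage: str) -> list[str]:
    """Return the names of the numeric criteria the module misses for a stage."""
    t = THRESHOLDS[stage]
    checks = {
        "PRs submitted": m.prs_submitted >= t["prs"],
        "PRs reviewed": m.max_pr_review_days <= t["review_days"],
        "Issues reported": m.issues_reported >= t["issues"],
        "Contributions": m.unique_contributors >= t["contributors"],
        "Issue resolution rate": m.issue_resolution_rate >= t["resolution"],
        "Release and security issues": m.max_security_issue_days <= t["security_days"],
    }
    return [name for name, ok in checks.items() if not ok]


# Made-up numbers for illustration only.
snapshot = ModuleMetrics(
    prs_submitted=12, max_pr_review_days=10, issues_reported=16,
    unique_contributors=4, issue_resolution_rate=0.55, max_security_issue_days=45,
)
print(unmet_criteria(snapshot, "graduation"))
print(unmet_criteria(snapshot, "production"))
```

With these made-up numbers the module clears every numeric graduation threshold but misses four of the production ones, which illustrates how the bar rises between stages. Returning the list of unmet criteria, rather than a single pass/fail value, keeps the final judgment with the PMC.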
Attic / Deprecation
Modules should be moved into the Attic when relevance wanes, usually measured by activity. Typically this is because the solution to be integrated has faded in popularity and is replaced by a more up to date solution. An example is databases, where the database integrations have to be updated over time.
These should be communicated on the dev list and voted on by the PMC. Again, exceptions can be granted based on the vote.
Quantitative criteria to move to the attic:
- PRs submitted: Fewer than 5 PRs in the last 6 months
- PRs reviewed: PRs not being reviewed in more than a month.
- Issues reported: Fewer than 10 unique issues filed in the last 6 months
- Contributions: Fewer than 3 unique individuals making contributions (code or documentation) in the last 6 months
- Issue resolution rate: Less than 30% of reported issues closed within 90 days.
- Release and security related issues not getting closed within 30 days
- Feature release cadence: No feature releases in the last 6 months.
Modules in the attic remain readable, but do not receive any active maintenance.
After a period of time, generally at least 6 months after deprecation, these modules can be removed, again accompanied by appropriate communication.
Mature
There could be exceptions to the above described situation, such as a Slack provider integration, which may have a very stable interface requiring almost zero changes on a regular basis.
To handle these situations, certain providers at the discretion of the Airflow PMC will be tagged as "mature providers", which will not automatically be deprecated and moved into the attic, as a result of a lack of activity.
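The attic criteria and the mature-provider exemption can be sketched together. This is an illustration under stated assumptions, not the proposal's mechanism: `ActivitySnapshot`, `attic_signals`, and `flag_for_attic_review` are hypothetical names, "a month" is taken as 30 days, and flagging a module when any single signal fires is an assumption, since the text does not say how many criteria must be met. The decision itself remains a PMC vote.

```python
from dataclasses import dataclass, replace


@dataclass
class ActivitySnapshot:
    """Hypothetical six-month activity snapshot for one provider module."""
    prs_submitted: int
    oldest_unreviewed_pr_days: int
    issues_reported: int
    unique_contributors: int
    issue_resolution_rate: float      # share of issues closed within 90 days
    slowest_security_closure_days: int
    feature_releases: int
    mature: bool = False              # "mature provider" tag set by the PMC


def attic_signals(s: ActivitySnapshot) -> list[str]:
    """List the quantitative attic criteria that the snapshot meets."""
    signals = []
    if s.prs_submitted < 5:
        signals.append("PRs submitted")
    if s.oldest_unreviewed_pr_days > 30:  # assumption: a month is 30 days
        signals.append("PRs reviewed")
    if s.issues_reported < 10:
        signals.append("Issues reported")
    if s.unique_contributors < 3:
        signals.append("Contributions")
    if s.issue_resolution_rate < 0.30:
        signals.append("Issue resolution rate")
    if s.slowest_security_closure_days > 30:
        signals.append("Release and security issues")
    if s.feature_releases == 0:
        signals.append("Feature release cadence")
    return signals


def flag_for_attic_review(s: ActivitySnapshot) -> bool:
    """Mature providers are never flagged automatically for low activity."""
    return not s.mature and bool(attic_signals(s))


# A quiet but stable provider, with made-up numbers.
stable = ActivitySnapshot(
    prs_submitted=1, oldest_unreviewed_pr_days=0, issues_reported=2,
    unique_contributors=1, issue_resolution_rate=1.0,
    slowest_security_closure_days=0, feature_releases=0, mature=True,
)
print(attic_signals(stable))
print(flag_for_attic_review(stable))
print(flag_for_attic_review(replace(stable, mature=False)))
```

The same low-activity snapshot is flagged only when the mature tag is absent, which is the behavior the exemption is meant to provide.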
Periodic reviews
The Airflow PMC is responsible for reviewing the health status of these integrations on a quarterly basis and making decisions such as changing the status of an integration (moving into production or into attic).
These discussions will be held in public and subsequently will be summarized and shared on the Airflow devlist.
Updated technical procedures
A lot of these technical procedures build on items already in flight from a CI and build infrastructure standpoint.
Repository
Move all the community providers and integrations into a separate repo from Core Airflow, but within the overall Airflow project.
This specifically excludes the following elements, which stay within the Core Airflow repo:
- Airflow server components i.e. Core Airflow
- Task SDK
- Standard Providers
- Executors
- Common integrations i.e. common abstractions over individual integrations
Other providers and integration elements should move out of Core Airflow into this new integration repository, which is governed by the practices described above.
This repo should definitely include:
- Database providers (implementation of common.sql)
- IO providers (implementation of common.io)
- Message bus integrations (for SQS, Kafka, etc. - implementations of common.messaging)
- All other community providers
- Cloudera, Teradata, and all other providers.
Based on feedback, this change to the mono-repo i.e. to split into multiple repositories within the Airflow project is deferred into future possibilities.
Consistency
We should double down on the “common” patterns for integration for consistency of DAG author development as well as ease of maintenance and testing.
We should provide AI-assisted tooling, in the form of well-crafted prompts, to accelerate development of new “provider integrations” for standard integration patterns such as databases, messaging systems, and IO.
These should be extendable for AI models and AI services, going forward.
We can and probably should take a stand that integrations conforming to these standard patterns will be faster to adopt and be certified as compared to those that don’t.
Distribution
We suggest a simple evolution of the existing Provider release process here, since these providers are outside of core Airflow and in a separate repo.
It is feasible for some integrations to grow to be very large bodies of code, such as the Amazon and Google providers today, and other groups over time. These could become separate integration repositories, with a “team being a steward” for that particular integration repository.
Based on this, it is probable that initially there will be 3 provider repositories:
- airflow-providers
- airflow-providers-amazon
- airflow-providers-google
Based on feedback, this change to distribution and release process is deferred into future possibilities.
However, GitHub issues will, at least initially, continue to be reported on the main airflow repo, with the ability to move issues to one of the provider repos as appropriate.
Discovery
One of the key reasons people want to contribute integrations into the “Airflow repo” is discovery. Not only should we encourage this in the form of an “Airflow integrations repo”, but there also needs to be a simple UI to navigate the list of integrations.
Considering the current operating form, this should probably be:
- A Web UI which is dynamically generated based on the integrations in the repo, along with release history, version compatibility, popularity (based on issues / PRs), etc.
- An MCP interface for this to be findable and usable by AI agents
Dashboard
- Airflow based tooling to display the status of each integration / provider showing the above metrics
- Also send notifications about maintenance tasks to the individuals responsible for each provider.
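As a rough sketch of how such a dashboard might derive two of the governance metrics from raw records, the example below computes the issue resolution rate and the number of unique contributors. Everything here is an assumption for illustration: the `Issue` record, the function names, and the 180-day approximation of "the last 6 months" are hypothetical, and no real issue-tracker API is used.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Issue:
    """Hypothetical issue record: opening date and optional closing date."""
    opened: date
    closed: Optional[date] = None


def issue_resolution_rate(issues: list[Issue], today: date,
                          window_days: int = 180, target_days: int = 90) -> float:
    """Share of issues opened in the window that were closed within target_days."""
    window_start = today - timedelta(days=window_days)
    recent = [i for i in issues if i.opened >= window_start]
    if not recent:
        return 0.0
    resolved = [
        i for i in recent
        if i.closed is not None and (i.closed - i.opened).days <= target_days
    ]
    return len(resolved) / len(recent)


def unique_contributors(pr_authors: list[str]) -> int:
    """Count distinct authors among the PRs in the period."""
    return len(set(pr_authors))


# Made-up records for illustration only.
today = date(2025, 7, 1)
issues = [
    Issue(date(2025, 2, 1), date(2025, 3, 1)),    # closed in 28 days
    Issue(date(2025, 3, 1), date(2025, 6, 30)),   # closed, but after 90 days
    Issue(date(2025, 5, 1)),                      # still open
    Issue(date(2025, 6, 1), date(2025, 6, 10)),   # closed in 9 days
    Issue(date(2024, 10, 1), date(2024, 10, 5)),  # outside the 180-day window
]
print(issue_resolution_rate(issues, today))
print(unique_contributors(["alice", "bob", "alice"]))
```

One simplification worth noting: an issue that is still open counts as unresolved even if it is fewer than 90 days old, so the rate is slightly pessimistic for very recent issues.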
Future - possible extensions
Based on feedback, some of the original elements in the proposal have been moved here as future possibilities to revisit if and when the situation demands it.
We had originally proposed a simple evolution of the existing mono-repo and release distribution, as stated above: splitting the existing mono-repo into multiple repositories and also splitting the provider release process. Taking into account the feedback on this, we have decided to defer that into the future, until the evolution of the project and the volume of contributions make it necessary.