Status

State: Draft
Discussion Thread

https://lists.apache.org/thread/xgd66v6s7zf0xkvy3c7ysqvn4csgmw06

Vote Thread
Vote Result Thread
Progress Tracking (PR/GitHub Project/Issue Label)
Date Created

03.12.2024 13:40

Version Released
Authors

Motivation

Model Context Protocol (MCP) is an open standard and open-source framework that standardizes how AI models such as LLMs integrate and share data with external tools, systems, and data sources. One could think of it as a "USB-C for AI": a universal connector that simplifies and standardizes AI integrations. A notable example of an MCP server is GitHub's official implementation, which allows LLM-based assistants such as Claude, Copilot, and OpenAI's models (the "MCP clients") to fetch pull request details, analyze code changes, and generate review summaries.

Considerations

What change do you propose to make?

WORK IN PROGRESS

We propose implementing an official MCP server and plugin for Airflow.

The server and the plugin will be maintained in a separate repository, as MCP is a different kind of entity than a provider and does not require any of Airflow's internals.
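To illustrate why such a server would not need Airflow's internals, here is a minimal sketch of a tool handler that works purely on JSON data. It is not the proposed design. The tool name `summarize_failed_tasks`, the `TOOLS` registry, the `handle_request` dispatcher, and the sample payload are all hypothetical, and the field names (`task_id`, `state`, `try_number`) are assumed to resemble the task instance objects served by Airflow's REST API. The request shape is a simplified version of MCP's JSON-RPC 2.0 `tools/call` method.

```python
"""Sketch only: a tool handler that works on plain JSON, with no Airflow imports."""
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def summarize_failed_tasks(task_instances: list[dict[str, Any]]) -> str:
    """Return a short text summary of the failed task instances.

    Assumption: each item carries `task_id`, `state` and `try_number` keys.
    """
    failed = [ti for ti in task_instances if ti.get("state") == "failed"]
    if not failed:
        return "No failed tasks."
    items = sorted(f"{ti['task_id']} (try {ti.get('try_number', '?')})" for ti in failed)
    return f"{len(failed)} failed task(s): " + ", ".join(items)


def handle_request(request: dict[str, Any]) -> dict[str, Any]:
    """Dispatch a simplified JSON-RPC 2.0 `tools/call` request to a registered tool."""
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    if request.get("method") != "tools/call":
        return {**base, "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    func = TOOLS.get(params.get("name"))
    if func is None:
        return {**base, "error": {"code": -32602, "message": "Unknown tool"}}
    text = func(**params.get("arguments", {}))
    return {**base, "result": {"content": [{"type": "text", "text": text}]}}


# Hypothetical payload, shaped like data an MCP client might pass to the tool.
SAMPLE_REQUEST = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "summarize_failed_tasks",
        "arguments": {
            "task_instances": [
                {"task_id": "load", "state": "failed", "try_number": 1},
                {"task_id": "extract", "state": "failed", "try_number": 2},
                {"task_id": "report", "state": "success", "try_number": 1},
            ]
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(handle_request(SAMPLE_REQUEST), indent=2))
```

A real implementation would more likely build on an existing MCP SDK and fetch task instances over Airflow's REST API. The sketch only shows that the tool logic can live entirely outside the Airflow codebase.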

What problem does it solve?

WORK IN PROGRESS

By officially supporting MCP, we enable users to interact with Airflow through LLMs, improving accessibility and automation across a wide range of workflows. This includes:

  • Debugging task failures using natural language, which makes troubleshooting easier, especially for non-technical users.

  • Supporting sparse scheduling to improve resource utilization.

  • Identifying opportunities for DAG code optimization.

  • Analyzing cross-DAG dependencies to improve pipeline reliability and maintainability.

  • Assisting with migration and refactoring planning to ease transitions between Airflow versions or deployments.

Why is it needed?

WORK IN PROGRESS

As Airflow usage scales across organizations, there's a growing need to simplify complex operations, improve developer productivity, and make the platform more accessible to non-technical stakeholders. MCP enables natural language interactions with Airflow, helping users perform tasks such as debugging, scheduling analysis, and DAG optimization more efficiently. By providing official support for MCP, we ensure a consistent and reliable experience while unlocking advanced capabilities like LLM-assisted refactoring, dependency analysis, and migration planning.

Are there any downsides to this change?

WORK IN PROGRESS

  • Since both the MCP server and plugin are maintained in a separate repository from Airflow's main repository, additional effort will be required to develop and maintain CI pipelines for them.
  • Integration tests for complex cases might be complicated to implement and will require integration with an LLM (GitHub Copilot?), which might in turn require assistance from the Apache Infra team.
  • Supporting Airflow 2 might be challenging.

Which users are affected by the change?

The change affects the following roles:

  • Operational users - can execute operations on Dags or their results using natural language. MCP-aided LLM capabilities can help bridge technical gaps for non-technical users.
  • Dag authors - same as operational users, with the addition of debugging technical issues or leveraging technical insights.
  • Deployment managers - when MCP capabilities are enabled, they need to know how to deploy the MCP server and plugin.

How are users affected by the change? (e.g. DB upgrade required?)

Installing Airflow's MCP will be optional, at least in its initial development stage. Therefore, unless users explicitly install the plugin, it should not break existing deployments.

What is the level of migration effort (manual and automated) needed for the users to adapt to the breaking changes? (especially in context of Airflow 3)

N/A


Other considerations?

WORK IN PROGRESS


What defines this AIP as "done"?

WORK IN PROGRESS

  • A new repository is created under the Apache GitHub organization, containing the MCP server and the MCP plugin.
    • Basic CI - linting, unit tests, release management, and if possible integration tests
  • Implementation of MCP Server
    • Implementation should solve at least one use case (for example, debugging task failures).
    • Deployment using the official Helm chart.
    • Deployment using Breeze.
  • Implementation of MCP Plugin.