Status

State: Completed
Discussion Thread:
Vote Thread: https://lists.apache.org/thread/xdj33nr47ggw9rn0gwmj6xw03bqv202b
Vote Result Thread: https://lists.apache.org/thread/2wqwvp6d2pyjbsgvzw6t68mh9f0x2bw9
Progress Tracking (PR/GitHub Project/Issue Label): https://github.com/orgs/apache/projects/276
Date Created:
Version Released:
Authors: vincbeck

Motivation

Today, user management is part of core Airflow. Users, roles and permissions are stored in the Airflow metastore and managed through Flask-AppBuilder (FAB). Any additional feature in the user management part of Airflow means modifying core Airflow and, more importantly, verifying that it fits everyone's needs, from individuals to teams within enterprises.

For context, this was brought up in a discussion regarding multi-tenancy. It was suggested that, instead of adding new features to the user management part of Airflow (such as tenants), this part be extracted from core Airflow and moved to a new component in Airflow: the auth manager. The target is, as with executors, to have a generic interface defining the common API that all auth managers need to implement. This way, Airflow would offer a pluggable and extensible way to define and use the auth manager that suits user needs. Unlike the generic interface, the auth manager implementations are not part of core Airflow. Depending on the underlying service, they reside in the corresponding provider (e.g. AWS, Google) if it exists, or in a new provider if it does not.

Proposal

The proposal is to extract the whole user management part of Airflow out of core Airflow and introduce the auth manager. The goal of the auth manager is to manage all features and resources related to users, roles and permissions. This way users could simply choose between a very minimalist auth manager and a more advanced one with a notion of groups/tenants. Everything under the FAB security manager as it exists today is extracted from core Airflow and handled by the auth manager.

The auth manager interface (or base auth manager) is an interface each auth manager needs to inherit from. This interface defines the common API of an auth manager and is the only integration point with core Airflow. In other words, any action related to user management is done through classes inheriting from this interface.
Since it is impossible to forecast what feature/view each auth manager is going to offer, the “Security” tab in the nav bar will be configured by each auth manager.
Auth managers are “pluggable”, meaning you can swap them based on your installation needs. Airflow can only have one auth manager configured at a time; this is set by the auth_manager option in the [core] section of the configuration file.
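As a rough illustration of what "pluggable" means here, the sketch below reads the auth_manager option from the [core] section and resolves it to a class. It assumes the option holds a dotted class path; the module name my_company_auth, the class SimpleAuthManager and the helper load_auth_manager_class are hypothetical and exist only for this example.

```python
import configparser
import importlib
import sys
import types


class SimpleAuthManager:
    """Hypothetical auth manager used only for this demo."""


# Register a throwaway module so the dotted path below can be imported.
_demo_module = types.ModuleType("my_company_auth")
_demo_module.SimpleAuthManager = SimpleAuthManager
sys.modules["my_company_auth"] = _demo_module

# Assumed shape of the configuration: a dotted path to the auth manager class.
CONFIG_TEXT = """
[core]
auth_manager = my_company_auth.SimpleAuthManager
"""


def load_auth_manager_class(config_text: str) -> type:
    """Resolve the class named by the [core] auth_manager option."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    dotted_path = parser.get("core", "auth_manager")
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


auth_manager_cls = load_auth_manager_class(CONFIG_TEXT)
# Only one auth manager is configured at a time, so a single instance is created.
auth_manager = auth_manager_cls()
```

Swapping the auth manager then amounts to changing one configuration value, without touching core code.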

Implementations

To explain in more detail how auth managers work, I decided to take two different implementations of auth manager as examples:

  • FAB auth manager. This auth manager offers the exact same features and experience as the current user management in Airflow. The implementation of this auth manager is part of this AIP.
  • KeyCloak auth manager. This auth manager leverages KeyCloak to manage users and roles. The implementation of this auth manager is not part of this AIP. I still decided to include diagrams and explanations about it to make the range of potential auth manager implementations clearer.

Minimalist FAB auth manager (backward compatible)

The target of the FAB auth manager is to offer a backward compatible experience to users. Put simply, it moves the FAB security manager out of core Airflow to a new provider: the FAB provider. All the different pages are still served through the web server. The “Security” tab is configured to be as it is today. End users should see no difference between before and after the introduction of auth managers.

KeyCloak auth manager

The target of the KeyCloak auth manager is to delegate the whole user management part to KeyCloak. Admins have to configure roles and permissions in KeyCloak directly. A new KeyCloak provider needs to be created, containing only the KeyCloak auth manager. This auth manager will not be part of this AIP.

Authentication flow

The authentication flow allows a user to log in to Airflow. The flow follows the OAuth 2.0 protocol.

When the authentication succeeds, a Flask session is created as it is today. This Flask session stores information about the connected user. The kind of information stored can vary depending on the auth manager. As an example, if the service used by the auth manager follows OpenID Connect (OIDC), the OIDC access token will be stored in the Flask session so it can be used at any time throughout the user session.
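To make the session handling more concrete, here is a minimal sketch of a post-login callback for an OIDC-based auth manager. It uses a plain dict as a stand-in for the Flask session, and the key names ("user", "oidc_access_token") and the shape of token_response are assumptions made for this example, not part of the proposal.

```python
from typing import Any, Dict, MutableMapping

# Stand-in for the Flask session, which behaves like a mutable mapping.
Session = MutableMapping[str, Any]


def login_callback(session: Session, token_response: Dict[str, Any]) -> None:
    """Store the connected user and the OIDC access token after a successful login."""
    session["user"] = {"username": token_response["username"]}
    # Keep the token so it can be reused throughout the user session.
    session["oidc_access_token"] = token_response["access_token"]


def is_logged_in(session: Session) -> bool:
    """A user is considered logged in once the callback has populated the session."""
    return "user" in session


session: Session = {}
logged_in_before = is_logged_in(session)
login_callback(session, {"username": "alice", "access_token": "example-token"})
logged_in_after = is_logged_in(session)
```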

To simplify the example diagrams below, we assume the user is not logged in and the authentication on the backend side succeeds.

FAB auth manager

FAB auth manager is different from the other auth managers. Instead of delegating the login experience to an external service, it includes and defines the login page within the manager. The page is still served through the web server. The goal is to have the login page as it is today.

KeyCloak auth manager

Authorization flow

The is_authorized API is the API each auth manager needs to implement to check whether the current user has permission to perform a specific action. This API also provides some context about the action being performed and the resource being accessed, so that the auth manager can base its authorization decision on multiple parameters. This context needs to be extensible so that it is easy to add new information to it. You can find the schema of this context below. It is represented as JSON for readability purposes but, since is_authorized is a regular Python method, the fields will be regular Python parameters.

{
	"action": "POST|GET|PUT|DELETE",
	"resource-type": "<resource-type>",
	"resource-details": {
		"id": "<resource-id>",
		"tags": [<resource-tag-1>, <resource-tag-2>],
		<other-resource-specific-information>,
	},
}
  • action. Create (POST), Read (GET), Update (PUT) or Delete (DELETE) operation. 
  • resource-type. The type of resource being accessed. The different values of this parameter are the list of resources already defined in Airflow today (you can see this list in Security → Resources in Airflow UI).
  • resource-details (optional). This is an optional extensible object to provide more metadata about the resource being accessed.
    • id (optional). The resource ID. If the resource type is "DAG", this parameter is the DAG ID; if the resource type is "Variable", it is the Variable ID, etc.
    • tags (optional). List of tags associated with the resource. Only available for resources you can tag, e.g. DAGs.
    • This object is extensible, so other parameters can be added to it to provide more information about the resource. For example, "dag-folder", if the resource type is "DAG", specifies the DAG folder where the DAG is defined.

Examples:

Is the user authorized to create a new variable?
{
	"action": "POST",
	"resource-type": "Variable"
}
Is the user authorized to read/list variables?
{
	"action": "GET",
	"resource-type": "Variable"
}
Is the user authorized to read the variable "my-var-id"?
{
	"action": "GET",
	"resource-type": "Variable",
	"resource-details": {
		"id": "my-var-id"
	}
}
Is the user authorized to delete the specific DAG "my-dag-id"?
{
	"action": "DELETE",
	"resource-type": "DAG",
	"resource-details": {
		"id": "my-dag-id",
		"tags": ["example1", "example2"],
		"dag-folder": "/dags/marketing"
	}
}
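
Since the context is passed as regular Python parameters rather than JSON, the examples above could translate into calls like the ones in the sketch below. This is only an illustration under stated assumptions: the ResourceDetails dataclass, its extra field and the ReadOnlyAuthManager class are hypothetical names, and the exact method signature is not defined by this document.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ResourceDetails:
    """Optional, extensible metadata about the resource being accessed."""

    id: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    # Any other resource-specific information, e.g. {"dag-folder": "/dags/marketing"}.
    extra: Dict[str, Any] = field(default_factory=dict)


class BaseAuthManager(ABC):
    @abstractmethod
    def is_authorized(
        self,
        action: str,
        resource_type: str,
        resource_details: Optional[ResourceDetails] = None,
    ) -> bool:
        """Return True if the current user may perform the action on the resource."""


class ReadOnlyAuthManager(BaseAuthManager):
    """Toy implementation: every read is allowed, everything else is refused."""

    def is_authorized(self, action, resource_type, resource_details=None):
        return action == "GET"


manager = ReadOnlyAuthManager()

# Is the user authorized to read/list variables?
can_list_variables = manager.is_authorized("GET", "Variable")

# Is the user authorized to delete the specific DAG "my-dag-id"?
can_delete_dag = manager.is_authorized(
    "DELETE",
    "DAG",
    ResourceDetails(
        id="my-dag-id",
        tags=["example1", "example2"],
        extra={"dag-folder": "/dags/marketing"},
    ),
)
```

Keeping resource-specific information in one optional object means new fields can be added later without changing the method signature.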

In order to understand how this API is implemented in different auth managers, let’s take the use case of “User clicks on Variables in the Admin menu”.

FAB auth manager

The is_authorized API in the FAB auth manager checks if the current user has the specified permissions. The implementation is very close to check_authorization in the security manager.
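
A simplified picture of such a permission check, for the "User clicks on Variables in the Admin menu" use case, is sketched below. It is not the FAB implementation: the role names, the ROLE_PERMISSIONS mapping and the function signature are assumptions chosen to show the idea of matching an (action, resource type) pair against the permissions attached to the user's roles.

```python
from typing import Dict, Iterable, Set, Tuple

# A permission is modelled here as an (action, resource type) pair.
Permission = Tuple[str, str]

# Hypothetical role-to-permission mapping, standing in for what is stored in the metastore.
ROLE_PERMISSIONS: Dict[str, Set[Permission]] = {
    "Viewer": {("GET", "DAG")},
    "Admin": {
        ("GET", "DAG"),
        ("GET", "Variable"),
        ("POST", "Variable"),
        ("PUT", "Variable"),
        ("DELETE", "Variable"),
    },
}


def is_authorized(user_roles: Iterable[str], action: str, resource_type: str) -> bool:
    """Return True if any of the user's roles grants the requested permission."""
    requested = (action, resource_type)
    return any(requested in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


# Listing variables is a GET on the "Variable" resource type.
admin_can_list = is_authorized(["Admin"], "GET", "Variable")
viewer_can_list = is_authorized(["Viewer"], "GET", "Variable")
```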

KeyCloak auth manager

Airflow Rest API

As part of the Rest API, some resources are no longer managed by core Airflow but by auth managers: roles and users. Therefore, these APIs will be removed:

However, some auth managers might need to define additional Rest APIs for their own needs. The FAB auth manager is an example: to remain backward compatible, the APIs listed above that are removed from core Airflow need to be moved to the FAB auth manager. By default, no additional Rest API is defined in the base auth manager.

Airflow CLI

Among the sub-commands exposed by the Airflow CLI, roles and users need to be removed from core Airflow, as with the Rest API. Likewise, some auth managers might need to define additional CLI commands (e.g. the FAB auth manager).

UI

The different UI pages used to manage users and roles are no longer part of core Airflow and are moved to auth managers. Depending on the auth manager and the service/tool it uses underneath, two options are possible:

  • Use the UI provided by the service/tool directly to manage users and roles. This is the preferred option.
  • Create UI pages in the auth manager to manage users and roles. This is the option chosen for the FAB auth manager.

Even though the preferred option is to delegate user management entirely to auth managers, the second option is necessary to implement the FAB auth manager.

Auth manager API

All auth managers have a common API defined in the auth manager interface. The table below lists the common API required from all auth managers. The categories are only for documentation and grouping purposes and might not be reflected in the architecture/code.

Category | Name | Description
UI | get_url_user_profile() | Returns the URL to access the user profile
UI | get_user_name() | Returns the user name
Core | get_url_login() | Returns the URL to sign in
Core | get_url_logout() | Returns the URL to sign out
Core | is_logged_in() | Returns true if the current user is logged in
Core | login_callback() | Callback called after login. It might be needed depending on the auth manager used, e.g. to store the OIDC access token
Core | is_authorized() | Is the user authorized to perform an action on a given resource? See section "Authorization flow" for more details
Core | get_security_manager_override_class() | Specifies an override for the security manager. Depending on the auth manager, you might need to override the security manager to add custom logic (e.g. register specific views)
Additional resources | rest_apis() | Defines additional Rest APIs
Additional resources | cli_commands() | Defines additional CLI commands
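
The table above could map onto an abstract base class along the lines of the sketch below. Only the method names come from the table; the parameters, return types and the choice of which methods are abstract versus optional with a default are assumptions, and SingleUserAuthManager is a hypothetical implementation used to show what a provider would have to fill in.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class BaseAuthManager(ABC):
    """Sketch of the auth manager interface (signatures are assumptions)."""

    # --- UI ---
    @abstractmethod
    def get_url_user_profile(self) -> Optional[str]:
        """Return the URL to access the user profile."""

    @abstractmethod
    def get_user_name(self) -> str:
        """Return the user name."""

    # --- Core ---
    @abstractmethod
    def get_url_login(self) -> str:
        """Return the URL to sign in."""

    @abstractmethod
    def get_url_logout(self) -> str:
        """Return the URL to sign out."""

    @abstractmethod
    def is_logged_in(self) -> bool:
        """Return True if the current user is logged in."""

    @abstractmethod
    def is_authorized(self, action: str, resource_type: str, resource_details: Any = None) -> bool:
        """Return True if the user may perform the action on the resource."""

    def login_callback(self) -> None:
        """Optional hook called after login; a no-op unless overridden."""

    def get_security_manager_override_class(self) -> Optional[type]:
        """Optional security manager override; none by default."""
        return None

    # --- Additional resources ---
    def rest_apis(self) -> List[Any]:
        """No additional Rest API is defined by default."""
        return []

    def cli_commands(self) -> List[Any]:
        """No additional CLI command is defined by default."""
        return []


class SingleUserAuthManager(BaseAuthManager):
    """Hypothetical minimal manager: one always-logged-in user with read-only access."""

    def get_url_user_profile(self) -> Optional[str]:
        return None

    def get_user_name(self) -> str:
        return "airflow"

    def get_url_login(self) -> str:
        return "/login"

    def get_url_logout(self) -> str:
        return "/logout"

    def is_logged_in(self) -> bool:
        return True

    def is_authorized(self, action: str, resource_type: str, resource_details: Any = None) -> bool:
        return action == "GET"


manager = SingleUserAuthManager()
```

Giving the optional hooks a default keeps a minimal auth manager small, while a richer one (such as the FAB auth manager) can override rest_apis() and cli_commands() to expose its own endpoints and commands.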

Future work

Here are some examples of tasks that are not part of the AIP but can be done as follow-up once the AIP is completed.

  • Create KeyCloak provider and KeyCloak auth manager within it
  • Additional providers (e.g. AWS auth manager, Google auth manager)

Considerations

What problem does it solve?

It makes the user management component of Airflow pluggable and extensible by introducing an auth manager interface in core Airflow that can be extended by any provider package that wants to support user management natively. An extensible and pluggable user management would open up the potential for more advanced user management features than exist today in Airflow, such as groups of users (or tenants).

Why is it needed?

Having a user management which fits everyone's needs (from individuals to teams within enterprises) is impossible. Users need an extensible and pluggable way to define and use the user management they want.

Native user management support in cloud providers means that roles can be mapped directly to the identity provider. Currently, Airflow operators have to work around this and are unable to provide seamless RBAC in Airflow when running it on different cloud platforms.

Which users are affected by the change?

All users are impacted by the change. However, by default Airflow would use the FAB auth manager, which is backward compatible, so users should not see any difference. Of course, if an admin decides to switch to another auth manager, then the whole user management experience of the environment would change.

What defines this AIP as "done"?

  • The auth manager interface defined
  • New provider FAB provider created
  • FAB auth manager inheriting from the auth manager interface defined. This FAB auth manager is part of the new provider: FAB provider. By default Airflow uses this auth manager

17 Comments

  1. Hello Vincent BECK Great work writing this AIP, it was very clear to me (smile)

    Two comments:

    • Why name it "resource" instead of "resource-type", if it is indeed a resource type? Then resource-details could be renamed to resource for instance

      {
          "action": "POST|GET|PUT|DELETE",
          "resource": "<resource-type>",
          "resource-details": {
              "id": "<resource-id>",
              "tags": [<resource-tag-1>, <resource-tag-2>],
              <other-resource-specific-information>,
          },
      }

    • Since the FAB Security manager will move to its own provider, should we have wrapper methods for all Airflow Metastore DB calls, in preparation for AIP-44, to make the changes simpler when it is released?
    1. Thanks Philippe Lanoe

      • Funny, I first named it "resource-type" then changed my mind to "resource". I am honestly fine with either of them. But I think you are right, if this is the resource type, we should call it "resource-type". About "resource-details" though, I would still rather name it "resource-details" instead of "resource" because, to me, it makes it clearer that this is an optional object to provide more details about the resource.
      • Very good question! Because we are moving files from core Airflow to providers we should no longer "trust" these components and treat them as non-trusted components? That's a very good callout. I think you're right. However, I would do this work as part of AIP-56 and not AIP-44. We'll definitely use the decorator created as part of AIP-44 to access the database via the internal API but, I think, migrating the methods should be done as part of AIP-56
        • Ok then let's rename it to resource-type (smile)
          Well naming "resource-details" because it is "clear" that it is optional is subjective (smile) Ideally whether or not a field is optional should be defined by API Specs/docs not relying simply on naming. I am not fundamentally against resource-details, it is not clear yet what will be part of it: real resource information or more like metadata? When I see fields like "id" it made me think that it should be mapped to a resource concept, but as I mentioned, it is my perception and I do not have any objection against your proposed naming
        • My point was not to add work to AIP-44 but to prepare AIP-56 in view of AIP-44. So instead of directly querying the DB, have a wrapper method for the desired operation, then implement it with DB access for now and switch easily to the API logic when it is in place, without touching the "core" AIP-56 code logic (because all changes would be located in these wrapper methods). But maybe it is over-engineering, only a suggestion.
          • Done (smile)
          • I am not sure I understand what you mean by "having a wrapper method for the desired operation". In AIP-44, there is a decorator "@internal_api_call" used to decorate methods that call the DB. Depending on the component calling this method (scheduler, worker, ...), it either calls the DB directly (trusted components) or queries the internal DB API as a proxy to access the DB (untrusted components). Is this decorator the wrapper you are referring to?
          1. Philippe Lanoe - this PR (https://github.com/apache/airflow/pull/27892) might help you understand the role of "@internal_api_call" decorator that Vincent referenced. 

            1. Thanks Shubham Mehta 

              Vincent BECK let me summarize our private conversation:

              As I mentioned above, we should rely on abstract wrapper method to access DB information, like "get_user", "update_role". It goes along with SOLID principles and will have the major benefit that 1. we can leverage AIP-44 to make AIP-56 as part of untrusted components and 2. we can change the wrapper method implementation as we wish without impacting the code logic (we can use the @internal_api_call decorator or use the logic that Ash Berlin-Taylor pointed out in AIP-44). These wrapper methods could actually lie in the internal API client, so that it is completely abstracted from the feature perspective and makes the code lighter there

              One question remains for me:

              • Why doesn't AIP-56 ensure that DB should not be accessed from untrusted components when DB_ISOLATION is enabled ? if a user can mess with the DB, doesn't it make a new solid auth mechanism pointless? I understand that it is already the case as of today, but probably it should be a pre-requisite to AIP-56 ? It looks strange to me that a company would invest in serious corporate identity management system integration if it can be easily worked around in the end.


              1. I think we should not mix up both topics:

                • AIP-44's purpose is to moderate DB access for untrusted components. Untrusted components need to have access to the database. As an example, workers are untrusted components but they need to access the DB to fetch variables, XComs, connections, ... which are needed to execute DAGs. On the other hand, we don't want to give them full access so that users can do whatever they want in their DAGs. That would be a security breach. That's why we introduced this new internal API, which is basically a proxy on top of the DB to moderate DB calls from untrusted components
                • AIP-56's purpose is to move user management (authz included) out of core Airflow to providers. These auth managers will be responsible for authz and as such will be able to tell, among other things, if the user has permission to access a given DAG. Both problems are different and we need to address both. AIP-56 will tell you if a user has access to a DAG and AIP-44 (or actually most likely a follow-up PR of AIP-44) will prevent the user's DAG from having full access to the DB, allowing only limited access


              2. I agree on the abstract wrapper, we can easily create an interface/facade between the auth manager and the actual methods which fetch the data from the DB. That would indeed make things cleaner and easier if we decide to change the way we fetch data. IMO, it is not worth mentioning in the AIP since it is more of an implementation detail to me

              3. I am also not sure we should consider auth managers as untrusted. The code will not be part of core Airflow but will still be in Airflow codebase. This will not be user code. 

                > Why doesn't AIP-56 ensure that DB should not be accessed from untrusted components when DB_ISOLATION is enabled ? if a user can mess with the DB, doesn't it make a new solid auth mechanism pointless? I understand that it is already the case as of today, but probably it should be a pre-requisite to AIP-56 ? It looks strange to me that a company would invest in serious corporate identity management system integration if it can be easily worked around in the end.

                Actually AIP-56 does mean "no DB access". I am not sure what gave you the impression Philippe Lanoe but when DB_ISOLATION is enabled, Airflow untrusted components should not need access to the DB at all (and they should get away without having access at all). And I think the scope of AIP-56 is pretty different from AIP-44: while they will both eventually become part of the "multi-tenant" Airflow, they are rather unrelated. The internal API will not be authenticated using the mechanisms described here. Those mechanisms are for "user" access only. AIP-44 will use the mechanism that we already use, for example, to retrieve logs by the webserver, where the internal API call will authenticate using a short-lived token generated for the untrusted component (details still to be worked out).

  2. Vincent BECK I did not mean that untrusted components should not access the DB, I just said that they should not access the DB directly at any point in time. If your last sentence is already implemented ("AIP-56 will tell you if a user has a access to a DAG and AIP-44 (or actually most likely a follow-up PR of AIP-44) will prevent the user's DAG to have full access to the DB but only a limited access"), then it is what I am saying and we are on the same page. However I still understood somehow that it is not fully the case as of today and some untrusted parts might still be able to directly access the database.

    Jarek Potiuk The difference between AIP-44 and 56 is clear to me. AIP-56 is about user access management for untrusted components as you mention. The reason why I say it is linked is that if there is still a (simple) backdoor allowing users to do whatever they want in the DB (through DAGs for instance), then it defeats the purpose of having a rock solid auth management feature. This should be sanitized first in my opinion, I just mentioned that it should be part of the AIP-56 effort (but does not have to be part of AIP-56 directly, but then it should be a dependency of AIP-56). For instance it would probably make little sense to enable any custom auth manager (other than the default FAB one for backwards compatibility) in AIP-56 if DB_ISOLATION is not enabled (and/or the DB can still be directly accessed by users).

  3. I do not agree it makes no sense. Those two efforts are ongoing, and they are correlated, but I do not agree AIP-56 depends on AIP-44. You likely have a bit of a misconception about the current and future security models of Airflow, so let me explain where we are and where we are going.

    This is the current model of Airflow (explained here: https://github.com/apache/airflow/blob/main/.github/SECURITY.rst#security-model). The security model of Airflow assumes two types of users:

    • UI users (mostly people who do operations)
    • DAG Authors

    AIP-56 is exclusively about UI users. 

    DAG Authors (currently and for the foreseeable future, until AIP-44 gets implemented) are super-admins. They can do everything and even drop the whole database of Airflow. We need to fully trust that they are not evil. This is the current model of Airflow and AIP-56 does not change it. It is a non-goal for AIP-56 to change the security model of Airflow. The goal is to move management of the UI user authentication and authorization outside of Airflow core / FAB. AIP-56 is only about the UI users and it has exactly "0" impact on DAG Authors. This is absolutely intended and it should be like that. Adding DAG Authors' capabilities to the mix of AIP-56 makes no sense, because those are completely different users and their access to write DAGs is controlled by a completely different mechanism.

    For DAG authors we are (it's a paradox in a way) in a much better situation (and we do not need an AIP-56 equivalent) because Airflow already delegates access for those users to the deployment. Namely, DAG authors' access is controlled by whatever access mechanism is provided by the mechanism of distributing DAGs, and it is already not a concern for the Airflow product. For example this can be controlled by git access rights, or S3 access rights or NFS access rights, and it is fully in the hands of those who deploy Airflow to choose the right mechanism. Airflow is completely unaware of that. As deployed software components of Airflow, we do not care how the access is managed. And this is what we plan to do with AIP-56 for the UI users.

    AIP-56 goal is about doing the same for UI users - delegating access for UI users to those who manage deployment. We decided that we do not want to keep it in Airflow. We think managing access to the Airflow UI is not something the Airflow "product" should do, so with AIP-56 we delegate that part to whoever can do it better: IAM in AWS or Google, open-source Keycloak (with whatever OIDC/SAML/LDAP it provides). Those who are deploying Airflow will decide which users have access to which DAGs, for example. This can be done based on LDAP group, or IAM project or whatever the deployment people choose. Airflow will be (again) completely unaware of it.

    On the other hand, the goal of AIP-44 IS to change the security model of Airflow when DB isolation is enabled. It will decrease the "DAG Author" capabilities by limiting what actions they are allowed to do in the Airflow DB. They still have full access in the containers/pods/workers they are running, but direct DB access in DB isolation mode will not be possible. For DB access they will be limited to only the actions that are specifically listed as allowed and, in the next step, further limited to only the entities that the specific task should be able to access, providing a Data Access layer for the DAG authors. And only for them, not for UI users. DAG author access and UI access are two completely unrelated mechanisms that we have no immediate plans to connect.

    And the thing is that it opens up full multi-tenancy that will be truly "agnostic" of whatever access mechanisms those who manage the Airflow deployment can set up. For example, if the AWS team chooses the same IAM groups to control DAG write access as DAG UI access, and has a way to control where the DAGs are executed in the same way (the AIP on how to delegate that to outside entities is yet to be written), this will achieve true multi-tenancy.


    1. First of all thanks a lot for your detailed explanation, it was really useful.

      Three comments:

      • Since we delegate the DAG Authoring access to upstream systems (NFS, Git etc.), it is assumed that the Admins are meant to be in control of these DAG sharing framework and are the decision makers on which DAG can go in (Git PR, NFS copy). Fine by me, just wanted to make sure I understood it correctly.
      • Why do we limit this to UI users and not to the Airflow API (so to any clients, including Airflow CLI)? If I understood, the UI relies on Airflow API? It would be strange that a user can do an operation from the CLI but would be forbidden on the UI. If it is only limited to the UI it could have some implication like DAGs can execute Airflow CLI bash command through a BashOperator on the worker node - But I assume it would be OK because "Admins/DAG Authors" would have approved it since they would have uploaded the DAG.
      • I will follow up on AIP-44 regarding DAG Authoring access to keep this thread focused on AIP-56
      1. > Since we delegate the DAG Authoring access to upstream systems (NFS, Git etc.), it is assumed that the Admins are meant to be in control of these DAG sharing framework and are the decision makers on which DAG can go in (Git PR, NFS copy). Fine by me, just wanted to make sure I understood it correctly.

        I am glad.

        > Why do we limit this to UI users and not to the Airflow API (so to any clients, including Airflow CLI)? If I understood, the UI relies on Airflow API? It would be strange that a user can do an operation from the CLI but would be forbidden on the UI. If it is only limited to the UI it could have some implication like DAGs can execute Airflow CLI bash command through a BashOperator on the worker node - But I assume it would be OK because "Admins/DAG Authors" would have approved it since they would have uploaded the DAG.

        I think we have made some wrong assumptions on how the Airflow CLI works. I suggest you review it first.

        First of all, API users = UI users. Those are the same. And when I wrote UI users, it also applies to any API calls.

        But the Airflow CLI is NOT a client. It does NOT use the API. The Airflow CLI is just a set of management scripts that allows you to run certain commands, provided that you are doing it where an Airflow component is installed, and it has no users and no authorisation whatsoever. It uses the same access credentials as the component it is run next to. An Airflow CLI command executed through a bash operator will not succeed if that worker has no DB access. The fact that you have code around that you can use, but not the credentials you need to connect to the database, does not make it possible for you to do those operations.

        1. Apologies for the miss on the CLI.

          > AIP-56 goal is about doing the same for UI users - delegating access for UI users to those who manage deployment.

          I was confused by the wording "UI user", because from what I read in the code base, authorization is handled at the API level, which is part of the webserver component. The UI (graphical interface) is only one of the "clients" calling the API. I could call it with curl or any other client code, which is not "UI". But we have discussed this and we are on the same page that it is the same concept, all good here (smile)

          I will do more investigation on the CLI and will re-post if I find a concrete security concern regarding this AIP.

  4. And just to correct one other misconception: "AIP-56 is about user access management for untrusted components as you mention." - this is wrong. AIP-56 is not about controlling access for untrusted components. It is about managing access for users of the UI component (which is a trusted component) ONLY. AIP-56 is not aware of and not concerned with untrusted components at all.

    1. Sorry for this statement, it was incorrect. It is clear now