Title: High Level Design of Role Based Access Controller in SQOOP 2

JIRA: SQOOP-1834 and its sub-tickets; SQOOP-2048 and its sub-tickets



Sqoop 2 needs a pluggable role-based access controller (RBAC) that is responsible for authorizing access to Sqoop 2 resources such as the server, connectors, links and jobs.

Basic Idea


  • The access controller is pluggable.
  • The controller class is set in sqoop.properties.
  • The default implementation in Sqoop 2 could be a FAKE controller that always returns true.
  • The access controller class could also be implemented on top of another authorization framework, such as Sentry.

Resources, actions and rules

The Server resource has three children: Connector, Link and Job.

  • The model is hierarchical. If a user has the privilege {server, all}, then he/she also has {connector, all}, {link, all} and {job, all}.
  • If a user has the privilege {job, all}, then he/she has both {job, read} and {job, write}.
  • To create a link, a user needs the privilege {server, create}.
Resource     Privileges (global namespace)
Server       All, Read, Write
Connector    All, Read
Link         All, Read, Write
Job          All, Read, Write
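The two hierarchy rules above ({server, all} covers every child; {x, all} covers read and write on x) can be expressed as a small implication check. This is a minimal sketch, assuming privileges are tracked per resource type only (instance IDs ignored); `ResourceType`, `Action`, `Grant`, `implies` and `isAuthorized` are hypothetical names, not part of Sqoop.

```java
import java.util.List;

final class PrivilegeHierarchy {
  // Resource types from the model above; SERVER is the parent of the other three.
  enum ResourceType { SERVER, CONNECTOR, LINK, JOB }

  // CREATE is included because creating a link needs {server, create}.
  enum Action { ALL, CREATE, READ, WRITE }

  // A privilege on a resource type (instance IDs are ignored in this sketch).
  record Grant(ResourceType resource, Action action) {}

  // Returns true if the granted privilege covers the requested one.
  static boolean implies(Grant granted, Grant requested) {
    // Rule 1: {server, all} covers every action on the server and on all of its children.
    if (granted.resource() == ResourceType.SERVER && granted.action() == Action.ALL) {
      return true;
    }
    // Privileges never flow sideways or upward (e.g. {job, all} says nothing about links).
    if (granted.resource() != requested.resource()) {
      return false;
    }
    // Rule 2: {x, all} covers every action on x, e.g. {job, read} and {job, write}.
    return granted.action() == Action.ALL || granted.action() == requested.action();
  }

  // A user is authorized if any one of their grants covers the request.
  static boolean isAuthorized(List<Grant> grants, Grant requested) {
    return grants.stream().anyMatch(g -> implies(g, requested));
  }
}
```

Only ALL cascades from the server here, since that is the only cascading case the rules above state.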
Action             Privilege needed
show connector     connector read
show link          link read
create link        server create, connector read
update link        link write, connector read
delete link        link write
enable link        link write
disable link       link write
show job           job read
create job         link read (on both the FROM and TO links)
update job         job write, link read (on both the FROM and TO links)
delete job         job write
enable job         job write
disable job        job write
start job          job write
stop job           job write
show submission    job read
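The action table can be encoded as a lookup from each action to the privileges that must all be held. A minimal sketch, assuming a `"resource:action"` string encoding and exact matching (the hierarchy rules are not applied); `REQUIRED` and `canPerform` are hypothetical names. The "both links" requirement is collapsed to a single `link:read` entry, whereas a real check would test the job's FROM and TO links individually.

```java
import java.util.Map;
import java.util.Set;

final class ActionPrivileges {
  // Action -> privileges ("resource:action") that must ALL be held, per the table above.
  static final Map<String, Set<String>> REQUIRED = Map.ofEntries(
      Map.entry("show connector", Set.of("connector:read")),
      Map.entry("show link", Set.of("link:read")),
      Map.entry("create link", Set.of("server:create", "connector:read")),
      Map.entry("update link", Set.of("link:write", "connector:read")),
      Map.entry("delete link", Set.of("link:write")),
      Map.entry("enable link", Set.of("link:write")),
      Map.entry("disable link", Set.of("link:write")),
      Map.entry("show job", Set.of("job:read")),
      Map.entry("create job", Set.of("link:read")),
      Map.entry("update job", Set.of("job:write", "link:read")),
      Map.entry("delete job", Set.of("job:write")),
      Map.entry("enable job", Set.of("job:write")),
      Map.entry("disable job", Set.of("job:write")),
      Map.entry("start job", Set.of("job:write")),
      Map.entry("stop job", Set.of("job:write")),
      Map.entry("show submission", Set.of("job:read")));

  // Returns true only if the user holds every privilege the action requires.
  static boolean canPerform(String action, Set<String> held) {
    Set<String> needed = REQUIRED.get(action);
    if (needed == null) {
      throw new IllegalArgumentException("Unknown action: " + action);
    }
    return held.containsAll(needed);
  }
}
```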


Authorization framework


  • The authorization handler is configured in sqoop.properties.

  • Four metadata classes:
    • Role
    • Principal
      • Defines a user, group or role.
      • Type: user, group, role.
      • A principal can be granted a role. For example, to grant the admin role to user hadoop, call grantRole(principal(name=hadoop, type=user), role(name=admin)).
    • Resource
      • Defines the four resources in Sqoop 2.
      • Type: server, connector, link, job.
    • Privilege
      • Action: all, read, write.
      • with_grant_option: boolean; defines whether the role can grant this privilege to other roles.
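A minimal Java sketch of the four metadata classes as records. Field names follow the description above; the enum names, `RoleGrant` and the in-memory `RoleStore` are illustrative assumptions, not Sqoop's model classes.

```java
import java.util.ArrayList;
import java.util.List;

final class AuthorizationModel {
  // A named role such as "admin" or "developer".
  record Role(String name) {}

  enum PrincipalType { USER, GROUP, ROLE }

  // A user, group or role that roles and privileges can be granted to.
  record Principal(String name, PrincipalType type) {}

  enum ResourceType { SERVER, CONNECTOR, LINK, JOB }

  // One of the Sqoop 2 resources, identified by name and type.
  record Resource(String name, ResourceType type) {}

  enum Action { ALL, READ, WRITE }

  // An action on a resource; withGrantOption says whether the holder may grant it onward.
  record Privilege(Resource resource, Action action, boolean withGrantOption) {}

  // Records that a principal holds a role.
  record RoleGrant(Principal principal, Role role) {}

  // Hypothetical in-memory store, only to illustrate grantRole(principal, role).
  static final class RoleStore {
    private final List<RoleGrant> grants = new ArrayList<>();

    void grantRole(Principal principal, Role role) {
      grants.add(new RoleGrant(principal, role));
    }

    List<RoleGrant> grants() {
      return List.copyOf(grants);
    }
  }
}
```

With these types, the admin example above becomes `store.grantRole(new Principal("hadoop", PrincipalType.USER), new Role("admin"))`.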

  • Five classes will be added to Sqoop-core in the org.apache.sqoop.security package:
    • AuthorizationManager
      • Like the other Sqoop managers (e.g. ConnectorManager, RepositoryManager), AuthorizationManager holds two singleton instances: AuthorizationManager itself and an AuthorizationHandler.
      • Its initialize function runs when the Sqoop server starts.
      • initialize creates the AuthorizationHandler named in the configuration file (sqoop.properties), e.g. DefaultAuthorizationHandler or SentryAuthorizationHandler.
    • AuthorizationHandlerFactory
      • Follows the factory design pattern.
      • Its getAuthorizationHandler function uses ClassUtils.loadClass to instantiate the configured AuthorizationHandler by reflection.
    • AuthorizationHandler
      • An abstract class.
      • A default implementation (DefaultAuthorizationHandler) lives in the Sqoop-security component.
      • Holds two singleton instances, AccessController and AuthorizationValidator.
      • Delegates every function to these two instances: AccessController handles grantRole, revokeRole, grantPrivilege and revokePrivilege; AuthorizationValidator handles checkPrivilege.
    • AccessController
      • An abstract class.
      • A default implementation (DefaultAccessController) lives in the Sqoop-security component.
      • Responsible for managing roles and privileges.
    • AuthorizationValidator
      • An abstract class.
      • A default implementation (DefaultAuthorizationValidator) lives in the Sqoop-security component.
      • Responsible for checking privileges.
  • Three classes will be added to Sqoop-security in the org.apache.sqoop.security package:
    • DefaultAuthorizationHandler
      • Extends the abstract AuthorizationHandler.
      • Holds two singleton instances, DefaultAccessController and DefaultAuthorizationValidator.
    • DefaultAccessController
      • Extends the abstract AccessController.
    • DefaultAuthorizationValidator
      • Extends the abstract AuthorizationValidator.
      • As a default/simple implementation, it always returns true without actually checking the privilege.
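The delegation and factory described above can be sketched in plain Java. This is an illustration under stated assumptions, not Sqoop's code: the property key `authorization.handler`, the method signatures and the string-based principals and privileges are hypothetical, and `Class.forName` stands in for Sqoop's `ClassUtils.loadClass`.

```java
import java.util.List;
import java.util.Properties;

final class AuthorizationSketch {
  // Manages roles (grantPrivilege / revokePrivilege omitted for brevity).
  abstract static class AccessController {
    abstract void grantRole(String principal, String role);
    abstract void revokeRole(String principal, String role);
  }

  // Checks whether a principal holds the required privileges.
  abstract static class AuthorizationValidator {
    abstract boolean checkPrivileges(String principal, List<String> privileges);
  }

  // Holds one AccessController and one AuthorizationValidator and delegates every call.
  abstract static class AuthorizationHandler {
    private final AccessController accessController;
    private final AuthorizationValidator validator;

    AuthorizationHandler(AccessController accessController, AuthorizationValidator validator) {
      this.accessController = accessController;
      this.validator = validator;
    }

    void grantRole(String principal, String role) {
      accessController.grantRole(principal, role);
    }

    void revokeRole(String principal, String role) {
      accessController.revokeRole(principal, role);
    }

    boolean checkPrivileges(String principal, List<String> privileges) {
      return validator.checkPrivileges(principal, privileges);
    }
  }

  // Default controller: a no-op in this sketch.
  static final class DefaultAccessController extends AccessController {
    @Override void grantRole(String principal, String role) {}
    @Override void revokeRole(String principal, String role) {}
  }

  // Default validator: always returns true, as described above.
  static final class DefaultAuthorizationValidator extends AuthorizationValidator {
    @Override boolean checkPrivileges(String principal, List<String> privileges) {
      return true;
    }
  }

  static final class DefaultAuthorizationHandler extends AuthorizationHandler {
    // Public no-arg constructor so the factory can instantiate it by reflection.
    public DefaultAuthorizationHandler() {
      super(new DefaultAccessController(), new DefaultAuthorizationValidator());
    }
  }

  // Factory: instantiate the handler class named in the configuration (hypothetical key).
  static AuthorizationHandler getAuthorizationHandler(Properties conf) {
    String className = conf.getProperty("authorization.handler",
        DefaultAuthorizationHandler.class.getName());
    try {
      return (AuthorizationHandler) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot load authorization handler " + className, e);
    }
  }
}
```

Swapping in another implementation (e.g. a Sentry-backed handler) would then only mean changing the configured class name.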

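  • A privilege validation check will be added to every function in RequestHandler, which handles all requests (e.g. create link):
  /**
   * Create or update a link in the repository.
   * @param ctx Context object
   * @param create true to create a new link, false to update an existing one
   * @return Validation bean object
   */
  private JsonBean createUpdateLink(RequestContext ctx, boolean create) {
    // ... privilege validation check, then the existing create/update logic ...
  }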
  • Privilege check requests will be analyzed by AuthorizationEngine:
public void createLinkPrivilege() throws SqoopAccessControlException {
    List<Privilege> privileges = new ArrayList<Privilege>();
    privileges.add(new Privilege(new Resource("Link", "1"), "Create", null));
    privileges.add(new Privilege(new Resource("Connector", "1"), "Read", null));
    // ... the list is then passed to the AuthorizationHandler for checking ...
}
  • The privilege check will be passed from AuthorizationHandler to the real AuthorizationValidator:
public void checkPrivileges(List<Principal> principals, List<Privilege> privileges) throws SqoopAccessControlException {
    // ... delegated to the configured implementation ...
}
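To show how these pieces fit together, here is a self-contained sketch of the create-link check: the request side builds the list of required privileges and a validator compares it with what the principal holds. It follows the action table (server create plus connector read). The `Resource`/`Privilege` records, the stand-in `SqoopAccessControlException`, the placeholder server name `"sqoop"` and the exact-match, set-based validator are all assumptions for illustration; hierarchy rules are not applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CreateLinkCheck {
  record Resource(String type, String name) {}

  record Privilege(Resource resource, String action) {}

  // Stand-in for the SqoopAccessControlException named above; not Sqoop's actual class.
  static final class SqoopAccessControlException extends Exception {
    SqoopAccessControlException(String message) {
      super(message);
    }
  }

  // Privileges required to create a link with the given connector, per the action table.
  // "sqoop" as the server name is a hypothetical placeholder.
  static List<Privilege> createLinkPrivileges(String connectorName) {
    List<Privilege> privileges = new ArrayList<>();
    privileges.add(new Privilege(new Resource("server", "sqoop"), "create"));
    privileges.add(new Privilege(new Resource("connector", connectorName), "read"));
    return privileges;
  }

  // Validator step: every required privilege must be among those granted (exact match only).
  static void checkPrivileges(Set<Privilege> granted, List<Privilege> required)
      throws SqoopAccessControlException {
    for (Privilege p : required) {
      if (!granted.contains(p)) {
        throw new SqoopAccessControlException("Missing privilege: " + p);
      }
    }
  }
}
```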

Command line tool


  • Grant/revoke commands are run from the command line in the Sqoop client.
  • The commands are shown below.

Create/Drop Role
CREATE ROLE role_name

DROP ROLE role_name

Grant/Revoke Roles

GRANT ROLE role_name [, role_name] ... TO principal_specification [, principal_specification] ...

REVOKE ROLE role_name [, role_name] ... FROM principal_specification [, principal_specification] ...

principal_specification:
    USER user_name | GROUP group_name | ROLE role_name

Viewing Granted Roles

SHOW ROLE GRANT principal_specification

principal_specification:
    USER user_name | GROUP group_name | ROLE role_name

Grant/Revoke Privileges

GRANT privilege_action_type [, privilege_action_type] ... ON resource [, resource] ... TO principal_specification [, principal_specification] ... [WITH GRANT OPTION]

REVOKE [GRANT OPTION FOR] privilege_action_type [, privilege_action_type] ... ON resource [, resource] ... FROM principal_specification [, principal_specification] ...

REVOKE ALL PRIVILEGES FROM principal_specification [, principal_specification] ...

resource:
    SERVER server_name | CONNECTOR connector_name | LINK link_name | JOB job_name
principal_specification:
    USER user_name | GROUP group_name | ROLE role_name

Viewing Granted Privileges

SHOW GRANT principal_specification [ON resource]

principal_specification:
    USER user_name | GROUP group_name | ROLE role_name
resource:
    SERVER server_name | CONNECTOR connector_name | LINK link_name | JOB job_name
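The principal_specification grammar is small enough to parse by hand. A minimal sketch covering only `GRANT ROLE`, assuming comma-separated lists, single-word names and case-insensitive keywords; `PrincipalSpecParser` and its methods are hypothetical and not the Sqoop shell's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrincipalSpecParser {
  enum PrincipalType { USER, GROUP, ROLE }

  record Principal(PrincipalType type, String name) {}

  record GrantRole(List<String> roles, List<Principal> principals) {}

  // GRANT ROLE role_name [, role_name] ... TO principal_specification [, ...]
  private static final Pattern GRANT_ROLE = Pattern.compile(
      "\\s*GRANT\\s+ROLE\\s+(.+?)\\s+TO\\s+(.+?)\\s*", Pattern.CASE_INSENSITIVE);

  // Parses a comma-separated list such as "USER alice, GROUP etl".
  static List<Principal> parsePrincipals(String specs) {
    List<Principal> result = new ArrayList<>();
    for (String spec : specs.split(",")) {
      String[] parts = spec.trim().split("\\s+");
      if (parts.length != 2) {
        throw new IllegalArgumentException("Expected 'USER|GROUP|ROLE name', got: " + spec.trim());
      }
      // valueOf rejects anything other than USER, GROUP or ROLE.
      PrincipalType type = PrincipalType.valueOf(parts[0].toUpperCase(Locale.ROOT));
      result.add(new Principal(type, parts[1]));
    }
    return result;
  }

  static GrantRole parseGrantRole(String statement) {
    Matcher m = GRANT_ROLE.matcher(statement);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a GRANT ROLE statement: " + statement);
    }
    List<String> roles = new ArrayList<>();
    for (String role : m.group(1).split(",")) {
      roles.add(role.trim());
    }
    return new GrantRole(roles, parsePrincipals(m.group(2)));
  }
}
```

The same `parsePrincipals` helper would serve REVOKE ROLE, SHOW ROLE GRANT and the privilege statements, since they share the principal_specification rule.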


  • The RESTful API is handled by org.apache.sqoop.handler.AuthorizationEngine.java in sqoop-server:
    • POST /authorization/roles/create
      • Create a new role with {name}
    • DELETE /authorization/role/{role-name}
      • Delete the role {role-name}
    • GET /authorization/roles
      • Show all roles
    • GET /authorization/principals?role_name={name}
      • Show all principals that have been granted the role {name}
    • GET /authorization/roles?principal_type={type}&principal_name={name}
      • Show all roles granted to the principal {name, type}
    • PUT /authorization/roles/grant
      • Grant a role to a user/group/role
      • PUT data: JsonObject role(name) and principal(name, type)
    • PUT /authorization/roles/revoke
      • Revoke a role from a user/group/role
      • PUT data: JsonObject role(name) and principal(name, type)
    • PUT /authorization/privileges/grant
      • Grant a privilege to a principal
      • PUT data: JsonObject principal(name, type) and privilege(resource-name, resource-type, action, with-grant-option)
    • PUT /authorization/privileges/revoke
      • Revoke a privilege from a principal
      • PUT data: JsonObject principal(name, type) and privilege(resource-name, resource-type, action, with-grant-option)
      • If privilege is null, revoke all privileges of principal(name, type)
    • GET /authorization/privileges?principal_type={type}&principal_name={name}&resource_type={type}&resource_name={name}
      • Show all privileges of the principal {name, type} on the resource {resource-name, resource-type}
      • If the resource is omitted, show all privileges of the principal {name, type}
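As an illustration of the grant endpoint, the sketch below builds (but does not send) `PUT /authorization/roles/grant` with the JDK's `java.net.http` client. The base URL (port 12000, context `/sqoop`, no version prefix) and the JSON field names for "role(name) and principal(name, type)" are assumptions, not a confirmed wire format.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class GrantRoleRequest {
  // Assumed base URL: Sqoop 2's default port 12000 and context "/sqoop"; adjust as needed.
  static final String BASE_URL = "http://localhost:12000/sqoop";

  // Minimal JSON escaping of backslashes and quotes (control characters are not handled).
  private static String esc(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // Hypothetical JSON shape for "role(name) and principal(name, type)".
  static String grantRoleBody(String roleName, String principalName, String principalType) {
    return "{\"role\":{\"name\":\"" + esc(roleName) + "\"},"
        + "\"principal\":{\"name\":\"" + esc(principalName) + "\",\"type\":\"" + esc(principalType) + "\"}}";
  }

  // Builds, but does not send, PUT /authorization/roles/grant.
  static HttpRequest grantRole(String roleName, String principalName, String principalType) {
    return HttpRequest.newBuilder(URI.create(BASE_URL + "/authorization/roles/grant"))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(grantRoleBody(roleName, principalName, principalType)))
        .build();
  }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the other PUT endpoints would differ only in path and body.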

Sentry implementation

  • Sentry could be used as an alternative access controller.
  • It is configured in sqoop.properties.
  • Sentry is used to check access privileges.
  • Access privileges can optionally be set using Hue.

Database design

  • Role table
    • Id
    • Name
    • Comment
      • Role name could be admin, developer, user, etc.
  • Role_User_Group table
    • Id
    • Role_id
    • User_name
    • Group_name
    • Comment
      • User and group information comes from Linux, LDAP, etc.
      • Only one of user name and group name is set. If the user name is set and the group name is empty, this user has this role. If the group name is set and the user name is empty, all users in this group have this role.
      • One user/group can have one or multiple roles.
  • Privilege table
    • Id
    • Role_id
    • Resource_id
    • Resource_type
    • Action_type
    • Comment
      • Resource type refers to an existing resource table, such as connector, link, job, etc.
      • More resource types, e.g. config, could be added in the future.
      • If resource_id is 0, it means all resources of this type, e.g. resource_id=0 and resource_type=link means all links.
      • A resource is identified by its resource id and resource type, e.g. resource_id=1 and resource_type=link means the resource of “select * from link where id =1”.
      • Action type could be read, create, update, delete, use, etc.
  • Accordingly, MRole, MRoleUserGroup and MPrivilege classes are added to the org.apache.sqoop.model package.
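A sketch of how these tables combine for a single check: first find the roles a user holds directly or through a group (Role_User_Group), then look for a matching Privilege row, treating resource_id 0 as a wildcard. In-memory rows stand in for repository queries; the record and method names are hypothetical, and the Role table and Comment fields are omitted.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class PrivilegeTables {
  // Row of Role_User_Group; exactly one of userName / groupName is non-null.
  record RoleUserGroupRow(long id, long roleId, String userName, String groupName) {}

  // Row of Privilege; resourceId == 0 means "all resources of resourceType".
  record PrivilegeRow(long id, long roleId, long resourceId, String resourceType, String actionType) {}

  // Role ids held by the user directly or through any of the user's groups.
  static Set<Long> rolesOf(List<RoleUserGroupRow> rows, String user, Set<String> groups) {
    return rows.stream()
        .filter(r -> user.equals(r.userName())
            || (r.groupName() != null && groups.contains(r.groupName())))
        .map(RoleUserGroupRow::roleId)
        .collect(Collectors.toSet());
  }

  // True if any of the roles grants actionType on the resource, directly or via resource_id = 0.
  static boolean hasPrivilege(List<PrivilegeRow> privileges, Set<Long> roleIds,
      String resourceType, long resourceId, String actionType) {
    return privileges.stream().anyMatch(p ->
        roleIds.contains(p.roleId())
            && p.resourceType().equals(resourceType)
            && (p.resourceId() == 0 || p.resourceId() == resourceId)
            && p.actionType().equals(actionType));
  }
}
```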





  1. Not sure about every design feature yet, but overall I like the structure of this doc!

  2. About the "Resource and actions" section: at the Link level, Create and Update are separated, while at the Job level Create/Start are grouped and Update/Stop are grouped. Can you revise this part?

  3. Very nicely written proposal, Richard! I do have a few questions about the privilege model:

    • Are we planning to make the privileges hierarchical to some extent? For example, does "use" also mean "view"? (e.g. if you can use an object, you obviously have the privilege to see it)
    • I'm wondering whether we need separate enable/disable privileges, or whether one privilege for both actions would be fine?
    • I do not recall "delete submission" being exposed as an action on the REST interface (other than "stop", which is covered separately).
    • I'm wondering whether we indeed need the global privileges when they are used only for the "create" actions, and always in addition to a sufficient "use" privilege? (Based on the table with all operations and the necessary privileges)
  4. Jarek Jarcec Cecho, thanks for your comment. Here are my answers:

    • There is no privilege hierarchy for now, due to the tight time frame. But it is a good suggestion; we could add it as a future improvement.
    • I have updated the document to combine the enable and disable privileges into one.
    • As there is no "delete submission", I have removed the DELETE action on submissions. However, there should probably be a "delete submission" in the REST interface to clean up the history. We could create a separate JIRA to handle this later.
    • I think global privileges are needed. Say the admin has the global privilege to read all jobs: then, whenever a new job is created, there is no need to grant an instance privilege to the admin. Without global privileges, the logic would be more complex.
  5. Some more notes:

    • Should an admin role be created initially, with the users listed in the config added to it? Or should there be super users in the config? These seem "cross compatible" in the sense that they achieve the same end goal. Super users are potentially easier to remove, since that would be code managed by the Sqoop community rather than by some RBAC system.
    • Who should have access to the RBAC APIs? Perhaps more resources to control.
  6. Abraham Elmahrek, the admin role, which is granted all privileges, is set in sqoop.properties and will be created initially when the Sqoop server starts. There is also a with_grant_option, which indicates whether a user can grant his/her privileges to other users. All users have access to the RBAC APIs, but they may not have the privilege to grant roles/privileges. The RBAC will check the principal before running the command.

  7. Thank you for answering all the questions, Richard! I have a few more notes for the discussion:

    I wasn't opposed to the global privileges originally, but as I follow the discussion I'm starting to think that having them will unnecessarily complicate the system. It seems that we want to have an admin user in the configuration (a similar concept already exists in Sentry and other systems, so it makes sense to me). This user can change any privileges and already has the "global privileges", which I think further diminishes the value (e.g. the "usability") of global privileges.

    An additional, smaller concern is that global privileges will also complicate the admin's life a bit, as there are now two privileges for the same thing: e.g. I can grant a user access to all links globally and also specifically to one link object. I would expect a revoke on the link object to prevent the user from accessing it, but since the user also has the global privilege, it would still be accessible.

  8. I have a few more thoughts on the privilege model:

    1. Why not push "submission" privileges into "job" as "history", "start", and "stop"? The "submission" resource is an implicit resource that is controlled by Sqoop. Users aren't actually acting on them.
    2. I'm +1 on getting rid of "global" privileges and just providing a "sqoop" namespace that all instances are a part of. This way administrators can have access to everything implicitly. Also, any user that can use a connector should be able to create a link with it. Likewise, any user that can use a pair of links should be able to create a job with those links. The global privileges are really not necessary; they can be added later if we see fit.
    3. Copying should probably be restricted by "edit" privilege?
    4. I don't believe object hierarchy is necessary for this project largely because we don't have a hierarchical object model with single relationships. If we choose to add hierarchy, then it would likely only be for connector -> link. Since jobs can have multiple links, I don't think this makes sense. I'd say leave this out and see what users want and let that dictate our actions on hierarchy.
    1. Great points Abraham Elmahrek!

      1. Very good point; I agree that modeling submission privileges seems to be a bit of an overkill. We can always add those privileges later if users require them.
      2. Agreed on removing global privileges. I think that we should add a top level "server" entity then, so that we can still model "ALL privileges to ALL".
      3. Do you mean the "clone" command in the shell? If so, the "clone" functionality is driven entirely on the shell side; we don't have a "clone" action on the server, and hence we don't need a special privilege for it.
      4. I've also chatted with Abraham Elmahrek offline and I do agree that we do not have a good hierarchy for our objects (Connector, Link, Job, Submission). I do however feel that we should still include the top level entity as suggested in 2).
  9. Thanks Abraham Elmahrek and Jarek Jarcec Cecho

    I have modified the page: removed the "submission" type and added the actions "start_stop" and "status" to the "job" type.

    For action hierarchy, I have added a hierarchy map. Please help to review. Thanks.

    • The ALL privilege isn't needed in Connector, Link, and Job. Having WRITE permission should implicitly give you READ permission.
    • I don't think we need a CREATE privilege at all. Let's assume anyone can create a link or job for now.
  10. Abraham Elmahrek, thanks for your comment. Here is my thought.

    • For the READ and WRITE privileges, I would prefer to keep them separate, similar to the Hive privilege model, instead of having an implicit relationship, because not every WRITE action has a READ action as a prerequisite. E.g. for the START job action, an end user could start a job without knowing any details of it: he/she may only have the privilege to START the job and no right to view its details.
    • For the CREATE privilege, I would prefer to keep it, since it is a basic privilege, and to align it with the Hive privilege model: if the end user has the CREATE privilege on SERVER, then he/she can create jobs or links.

    So, I suggest that

    • Four privilege types: ALL, CREATE, READ, WRITE
      • No implicit relationship; ALL means CREATE+READ+WRITE
    • Four resource types: SERVER, CONNECTOR, LINK, JOB
      • SERVER is the parent and has three children (CONNECTOR, LINK and JOB).
      • Children have ALL, READ and WRITE privileges (CONNECTOR has READ only), and SERVER has ALL, CREATE, READ and WRITE privileges.
      • If the end user has the CREATE privilege on SERVER, then he/she can create jobs or links.
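The proposal above can be pinned down as a table of grantable actions per resource type plus an expansion rule for ALL. A hedged sketch: it reads "CONNECTOR has READ only" literally, lets ALL expand only to actions the resource supports (so ALL on a link means READ + WRITE), and leaves out SERVER-to-children inheritance; all names are illustrative.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class ProposedModel {
  enum Action { ALL, CREATE, READ, WRITE }

  enum ResourceType { SERVER, CONNECTOR, LINK, JOB }

  // Actions that may be granted on each resource type under the proposal.
  private static final Map<ResourceType, Set<Action>> ALLOWED = Map.of(
      ResourceType.SERVER, EnumSet.allOf(Action.class),
      ResourceType.CONNECTOR, EnumSet.of(Action.READ),
      ResourceType.LINK, EnumSet.of(Action.ALL, Action.READ, Action.WRITE),
      ResourceType.JOB, EnumSet.of(Action.ALL, Action.READ, Action.WRITE));

  // Concrete actions a grant confers. ALL expands to CREATE + READ + WRITE, limited to what
  // the resource supports; no other action implies another (WRITE does not imply READ).
  static Set<Action> effective(ResourceType type, Action granted) {
    if (!ALLOWED.get(type).contains(granted)) {
      throw new IllegalArgumentException(granted + " cannot be granted on " + type);
    }
    if (granted != Action.ALL) {
      return EnumSet.of(granted);
    }
    Set<Action> result = EnumSet.of(Action.CREATE, Action.READ, Action.WRITE);
    result.retainAll(ALLOWED.get(type));
    return result;
  }
}
```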
    1. Thank you for the nice summary, Richard. I think I see your reasoning, and I agree with all your points, with the small exception of the CREATE privilege.

      Hive does have several layers of objects (Server -> Database -> Table), where each object can have a CREATE privilege. In practice this is used to divide and conquer: global administrators give the CREATE privilege on a given database to a certain subset of users, so that those users can do anything they need (they are full admins, but only within this database). Also, in order to create an object, one doesn't need the READ privilege on any other object; the semantics of CREATE is that you can create any children of the current node.

      In our case the situation is different: CREATE makes sense only on the Server instance, as we have only two levels of hierarchy. And as the server is a singleton (there is only one server), you can't use this privilege to divide and conquer. You either have it for everything or for nothing, which pretty much overlaps with the admin role that can do everything. Also, we are effectively using the READ privilege to decide whether a user can create certain objects, so CREATE does not seem to add much value.

      1. Thanks Jarek Jarcec Cecho for the clarification. I want to confirm one thing; maybe I misunderstood.

        Does the server only have the ALL privilege?

        From my perspective, there are four types of action privileges at the SERVER level: READ, WRITE, CREATE and ALL. This means that if a user has the READ privilege on SERVER, then he/she has the READ privilege on CONNECTOR, LINK and JOB. In this case, SERVER privileges could still be used to divide and conquer.

        If SERVER has only the ALL privilege, why? Why not give SERVER READ, WRITE and CREATE as well?