1. Use Cases and Motivations

 

1.

...

The SQL “DROP TABLE/DATABASE” command should cause all privileges held directly on the dropped table/database to be deleted automatically from Ranger.

 

Similarly, the SQL “RENAME TABLE” command should cause all privileges held directly on the table to be moved to the renamed table.

 

A further use case is the SQL drop/rename column commands, which should cause the privileges held directly on the affected column(s) to be adjusted accordingly.

 

1.2  Hive CLI users access the metastore directly or through the Hive Metastore Server

Hive supports metastore authorization for this use case (HIVE-3705), but it does not work through the existing Ranger Hive Plugin, which currently works only for HiveServer2.

 

1.3   Hadoop Users of M/R, Pig, Hive CLI want to access data sets created by HiveServer2

Specifically, HiveServer2 users want to use HiveServer2 as a SQL data source with SQL-flavored access control on finer-grained objects such as columns, among other advantages of a SQL server.  Currently HiveServer2 supports two modes of authorization: “storage based authorization” and “SQL Standard based authorization”.

 

The first mode is the default and is intended for sharing data between Hive and other Hadoop applications. The downside is that Hive SQL access privileges have to be used in combination with the underlying HDFS privileges, which is neither convenient nor natural for SQL users.

 

The second mode is enabled by setting the “impersonate” flag to false and is intended to provide the same access controls that a SQL user would expect.  It is realized through a superuser named “hive” who has full access to the Hive tables. The downside is that there is virtually no data sharing with other Hadoop applications.

 

A seamless way is therefore needed to control access to, and support sharing of, Hadoop data between Hive and other Hadoop applications.

1.4 JIRA

1  Ranger Hive Policy Updates as result of SQL DDL

a)    The “Create Table/Database” commands of Hive support the Hive “automatic grants” to the newly created tables/databases;

b)   The “Alter Table … Rename to …” command supports the change of the table name as a Ranger policy resource;

c)    The “Drop Table/Database” commands support the removal of the Ranger policies on the table/database to be dropped.

1.2  Hive CLI Access Control through Ranger 

a)   Hive CLI operations are subject to access controls by Ranger in the same way as those from a Hive beeline session to a HiveServer2 server;

b)   Hive CLI grant/revoke commands are supported against Ranger in the same way as those from a Hive beeline session to a HiveServer2 server.

 

1.3 JIRA

RANGER-768 (https://issues.apache.org/jira/browse/RANGER-768)

 

2. Functionalities

2.

...

The “hive” user must be a Ranger admin user so that it can manipulate HDFS privileges (see Section 2.2.3). Otherwise, system authentication and authorization are unchanged.

 

2.2 Hive

2.2.1 Meta Store Plugin and Listeners

This is a new Ranger plugin. It uses the same Hive service name as the existing Hive plugin to communicate with the Ranger Admin server, and is co-enabled with the existing Ranger Hive Plugin through the same enabling script, enable-hive-plugin.sh.

 

The new metastore plugin will be used as a static instance by two Hive metastore listener classes to communicate with the Hive service in the Ranger Admin. During the “enabling” process, the two listeners will be added to hive-site.xml to be instantiated by Hive.  And the two listeners can optionally enable logging.

 

The first Hive metastore listener class extends Hive’s MetaStorePreEventListener abstract class and serves two purposes. 1) Ranger-based authorization on the Hive metastore.  Specifically, all DML requests, and query requests on databases and tables, are authorized this way. Query requests at finer granularity, such as columns or partitions, are not checked here; they are checked by the normal RangerHiveAuthorizer, which uses the existing RangerHivePlugin for authorization against the Ranger Admin. 2) Syncing the proper privileges to the HDFS files underlying a Hive table where needed. Details are in Section 2.2.3.  An object of this class will listen on all Hive metastore events.

 

 The second Hive metastore listener class extends the Hive’s MetaStoreEventListener abstract class to handle the adjustments of Ranger Hive privileges as result of DDL operations. Details are in Section 2.2.2.

 

The new plugin will extend from RangerBasePlugin, handling the authorization requests as therein.  It will also send the new requests for the HDFS privilege sync to the Ranger Admin.

2.2.2 Ranger Hive Privilege Adjustments as Result of Hive DDL Operations

Hive SQL DDL operations that add/remove/change an HDFS resource name will cause the Ranger policy on the exactly matched resource to be added/removed/changed accordingly.  A failed adjustment will not cause the operation to fail; it will only be logged as a warning.  An example of such a failure is a “rename” operation that finds an existing policy already on the new resource name. This is possible because a Ranger policy can exist on a nonexistent object, while SQL does not allow such a scenario.
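The behavior described above can be sketched as follows. This is a minimal illustration only: `PolicyStore`, `Policy` and `on_rename_table` are hypothetical names standing in for the Ranger Admin policy store and the listener callback, not actual Ranger classes.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("ddl_policy_sync")


@dataclass
class Policy:
    name: str
    resource: str  # exact resource name such as "db1.orders", no wildcards


@dataclass
class PolicyStore:
    """Hypothetical in-memory stand-in for the Ranger Admin policy store."""
    by_resource: dict = field(default_factory=dict)

    def add(self, policy):
        self.by_resource[policy.resource] = policy

    def rename(self, old, new):
        """Move the policy on exactly `old` to `new`; return True on success."""
        if old not in self.by_resource:
            return True  # no exactly matched policy, nothing to adjust
        if new in self.by_resource:
            # A Ranger policy may already exist on a not-yet-existing object.
            log.warning("policy already exists on %s; %s left unchanged", new, old)
            return False
        policy = self.by_resource.pop(old)
        policy.resource = new
        self.by_resource[new] = policy
        return True


def on_rename_table(store, old, new):
    """Called after Hive has renamed the table; never raises to the caller."""
    try:
        store.rename(old, new)
    except Exception:
        log.warning("policy adjustment failed for %s -> %s", old, new, exc_info=True)
```

Because the callback only logs, the DDL operation itself is unaffected when the policy adjustment cannot be applied.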

2.2.3     Ranger HDFS Privilege Changes as Result of Hive Metadata Changes

A new String member named “resourceService” will be introduced in the “configs” list of Hive’s servicedef JSON file. It specifies the HDFS service whose entries under a Hive table will have access policies added/deleted according to the existence of the Hive table’s data objects.  The default value of null disables the HDFS privilege sync driven by Hive metadata changes. The member can be set through the GUI and the RESTful API.

 

To enable the sync of the HDFS privilege due to Hive metadata changes, the proper setting of this new member plus the listener class configuration as described in section 2.2.1 are both required.

 

There are four parts of the functionality.

 

The first part handles HDFS policy changes as a result of Hive DDL operations. This includes any HDFS location creation/deletion caused by SQL operations that create, alter or delete tables/partitions. If the login user is different from the current user, the policy will be for the login user, applied recursively on the HDFS directories at the object’s storage location. The handling is by the new implementation of MetaStorePreEventListener.

 

The second part adjusts the corresponding HDFS policies to reflect privilege changes made by SQL GRANT/REVOKE calls, provided that Hive is not impersonated and that the policy is not already present (for GRANT) or is present (for REVOKE). The handling is through enhancements to the grant/revokePrivileges methods of the existing RangerHiveAuthorizer class. The name of a Hive-synced HDFS policy will be of the form hive-grant-<timestamp>.

The GRANT will add a policy of recursive access to the HDFS path underlying the Hive object in the GRANT. The REVOKE will remove a policy of the exactly matched resource and on a corresponding privilege.
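A rough sketch of the GRANT/REVOKE handling just described is given below. The `HdfsPolicy` type, the function names and the use of epoch milliseconds for `<timestamp>` are all assumptions made for illustration; the impersonation check is omitted.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class HdfsPolicy:
    name: str
    path: str
    user: str
    access: str       # e.g. "read" or "write"
    recursive: bool


def sync_grant(policies, path, user, access, now=None):
    """Add a recursive HDFS policy unless an equivalent one is already present."""
    for p in policies:
        if (p.path, p.user, p.access) == (path, user, access):
            return None  # already present, nothing to add
    # Assumption: <timestamp> is rendered as epoch milliseconds.
    ts = int((now if now is not None else time.time()) * 1000)
    policy = HdfsPolicy(f"hive-grant-{ts}", path, user, access, recursive=True)
    policies.append(policy)
    return policy


def sync_revoke(policies, path, user, access):
    """Remove only policies whose resource and privilege match exactly."""
    keep = [p for p in policies if (p.path, p.user, p.access) != (path, user, access)]
    removed = len(policies) - len(keep)
    policies[:] = keep
    return removed
```

Note that a REVOKE on a parent directory leaves a policy on a child path untouched, since only the exactly matched resource is considered.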

 

The third part adjusts Ranger Hive policies as a result of SQL GRANT/REVOKE calls. Currently the Ranger Hive Plugin is enabled only for HiveServer2, so GRANT/REVOKE calls issued from the Hive CLI do not adjust the corresponding Ranger policies. An installation change is required to enable the plugin for the Hive CLI as well as HiveServer2. See 2.4.

 

The names of the new policies created from the sync of the Hive metadata objects will be of the form of hive-grant-<timestamp>.

 

The fourth part adjusts the corresponding HDFS policies to reflect changes to Ranger Hive policies. The corresponding HDFS policies will be named hive-grant-<hive policy name>, and will map the resources, resource patterns, privileges and taggings from the Hive policies.

 

Note that only SQL objects that have direct backing storage can trigger HDFS policy changes. These objects include tables; they do not include views, locks or databases, since those have no direct backing store.

 

A “prohibitive” approach will be adopted when privileges are managed at a finer granularity than files, the finest ACL unit of the backing storage. On the one hand, if a user is allowed to access only some, but not all, of the columns of a Hive table, then the table’s files are not accessible to the user.  For example, suppose a Hive user is allowed to view the “age” and “address” fields but not the “SSN” field of a “customer” table. The “prohibitive” approach will not give the user access to the HDFS files backing the “customer” table. If the user has access to all of the columns of the table, the user will be allowed to access the backing files on HDFS.
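The column rule reduces to a subset test, sketched here with a hypothetical helper name; it is an illustration of the rule, not the plugin's actual code.

```python
def may_access_backing_files(table_columns, granted_columns):
    """Return True only if every column of the table is granted to the user."""
    return set(table_columns) <= set(granted_columns)


customer = ["age", "address", "SSN"]
partial = may_access_backing_files(customer, ["age", "address"])
full = may_access_backing_files(customer, ["SSN", "age", "address"])
```

With the “customer” example, `partial` is False because “SSN” is not granted, while `full` is True.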

 

On the other hand, Hive privileges will be mapped to HDFS privileges in a “prohibitive” manner. For instance, both of SQL’s CREATE and DROP must be allowed for a backing store’s HDFS “write” to be allowed. Conceivably the full mapping could be complex and could be made ever more comprehensive in a phased approach.
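One way to picture the privilege mapping is a table that lists, for each HDFS access type, the Hive privileges that must all be granted. In the sketch below only the “write” entry comes from the text; the “read” entry and all names are assumptions added for illustration.

```python
# Each HDFS access type lists the Hive privileges that must ALL be granted.
REQUIRED_HIVE_PRIVILEGES = {
    "write": {"CREATE", "DROP"},  # stated in the text above
    "read": {"SELECT"},           # assumption, for illustration only
}


def hdfs_access_for(hive_privileges):
    """Return the HDFS access types allowed by the given Hive privileges."""
    granted = {p.upper() for p in hive_privileges}
    return {
        access
        for access, needed in REQUIRED_HIVE_PRIVILEGES.items()
        if needed <= granted
    }
```

A user holding only CREATE gets no HDFS access at all under this mapping, which is the “prohibitive” behavior: a missing Hive privilege always withholds the HDFS one.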

2.2.4  Sequence Diagram of HDFS Policy Sync from Hive Privilege Changes

...

2.3 Ranger Admin

The RangerServiceREST’s grant/revokeAccess methods will handle the policy adjustments as is now, even though the requests could come from both the existing Hive plugin and the new Hive metastore plugin.

 

In addition, the RangerServiceREST’s grant/revokeAccess methods, once determined that there is a non-null value of the service’s configured key of “resourceService”, will locate a HDFS service with the name and adjust policies accordingly therein.

 

A new method of RangerServiceREST, “alterResource”, will be added to handle the resource renaming requests as result of the SQL’s “ALTER … RENAME …” operations. 

 

2.4  The “ServicePolicies” Class

A new “Map<String, String> serviceConfigs” field will be added to this class to hold service-specific configurations. For now, if the corresponding serviceDef has a non-null “resourceService” field, a map entry of “resourceService=>true” will be used; once fetched by the refresher (see 2.5) of a plugin, it will trigger the Hive plugin to send the table storage information to the Admin.
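The flow of the “serviceConfigs” entry from the Admin to a plugin might look like the sketch below. The Python class only mirrors the shape of the proposed field; `build_service_policies` and `should_send_storage_info` are hypothetical helpers, not Ranger APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServicePolicies:
    service_name: str
    service_configs: Dict[str, str] = field(default_factory=dict)


def build_service_policies(service_name: str, resource_service: Optional[str]):
    """Admin side: expose only whether a storage service is configured."""
    configs = {}
    if resource_service is not None:
        configs["resourceService"] = "true"
    return ServicePolicies(service_name, configs)


def should_send_storage_info(policies: ServicePolicies) -> bool:
    """Plugin side: evaluated after the refresher fetches the policies."""
    return policies.service_configs.get("resourceService") == "true"
```

The plugin never learns the HDFS service name in this sketch, only that the sync is switched on.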

 

2.5 Refresher

The refresher will be enhanced to fetch the “serviceConfigs” of the “ServicePolicies” objects from the Admin.

 

2.6 Hive Plugins

If the “resourceService” flag fetched from the Admin is true (see 2.4), the Hive plugins will send the table storage information to the Admin on DDL commands.

 

2.7 Installation

The Hive configuration needs to enable Hive Metastore Security. Specifically, the hive.metastore.pre.event.listeners and hive.metastore.event.listeners need to be configured to use Ranger implementations.
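As a quick sanity check of an installation, a script along these lines could report which of the two listener properties are missing from a hive-site.xml. It only checks that the properties are set, not that they name Ranger classes; the class name in the sample is a placeholder, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

REQUIRED = (
    "hive.metastore.pre.event.listeners",
    "hive.metastore.event.listeners",
)


def read_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_listener_settings(xml_text):
    """Return the required listener keys that are absent or empty."""
    props = read_properties(xml_text)
    return [key for key in REQUIRED if not props.get(key)]


# Placeholder class name, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.event.listeners</name>
    <value>com.example.HypotheticalRangerListener</value>
  </property>
</configuration>"""

missing = missing_listener_settings(SAMPLE)
```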

 

In addition, to support Ranger Hive policy changes as a result of Hive GRANT/REVOKE calls from the Hive CLI, the Ranger Hive Plugin is to be enabled in hive-site.xml instead of hiveserver2-site.xml.

 

Essentially, through these configuration settings both Hive Security and Hive Metastore Security are enabled simultaneously through Ranger. Enabling just one of the two, as Hive itself allows, is not supported.

 

2.8 Ranger DB Store

The new “resourceService” configuration field of the servicedef will be added to the persistent data store. Backward compatibility should be retained through addition to the x_service_config_map table.

 

2.9 GUI

A new entry named “Storage Service” will be added to the “Config Properties” list of Hive’s “Create Service” page. It defaults to empty; otherwise it holds the name of the HDFS service that will receive the synced policies driven by privilege changes on Hive tables. If an HDFS service of that name does not already exist, an error will be returned and the creation of the Hive service will fail.

 

3 Appendix

3.1  Hook Invocations by Hive

The invocations by Hive of the two hooks, MetaStorePreEventListener and HiveAuthorizer, were examined under different configurations and runtimes. The results are shown in the tables below for future reference, in case questions arise as to which hooks are or should be invoked. MetaStoreEventListener invocations were not examined and could be added in the future if necessary. The experiments were performed using MySQL as the backing store for the metastore; no other backing stores, embedded stores in particular, were tested.

In the tables, “Listener” denotes “MetaStorePreEventListener”; “Authorizer” denotes “HiveAuthorizer”; “x” means no invocation at all; “*” means “seemingly always denied before possibly proceeding further”.

The conclusions are: 1) Hive metastore security needs to be enabled to provide access controls for the Hive CLI; 2) when metastore security is enabled, some checks may be performed redundantly by both hooks, which may be inefficient. When this occurs, the metastore checks seem to be performed before those of the authorizer, which suggests preferring the former over the latter for the sake of performance. However, the authorizer is capable of finer-grained checks such as column-level access checks. It remains to be seen how to invoke just one hook or the other depending on the target of the access control. This, however, might require changes on the Hive side.

 

3.1.1 Hive CLI, HiveAuthorizer specified in hive-site.xml

Metastore Security | SELECT | DDL/DML | GRANT/REVOKE
None (hive.metastore.pre.event.listeners not set) | x | x | Authorizer
Storage-Based | Listener | Listener | Listener+Authorizer
Default | Listener* | Listener* | Listener*

 

3.1.2 Hive CLI, HiveAuthorizer specified in hiveserver2-site.xml

Metastore Security | SELECT | DDL/DML | GRANT/REVOKE
None (hive.metastore.pre.event.listeners not set) | x | x | x
Storage-Based | Listener | Listener | Listener
Default | Listener* | Listener* | Listener*

 

3.1.3 Hive Server2, HiveAuthorizer specified in hiveserver2-site.xml

Metastore Security | SELECT | DDL/DML | GRANT/REVOKE
None (hive.metastore.pre.event.listeners not set) | Authorizer | Authorizer | Authorizer
Storage-Based | Listener+Authorizer | Listener+Authorizer | Listener+Authorizer
Default | Listener* | Listener* | Listener+Authorizer

 

3.1.4 Hive Server2, HiveAuthorizer specified in hive-site.xml

Metastore Security | SELECT | DDL/DML | GRANT/REVOKE
None (hive.metastore.pre.event.listeners not set) | Authorizer | Authorizer | Authorizer
Storage-Based | Listener+Authorizer | Listener+Authorizer | Listener+Authorizer
Default | Listener* | Listener* | Listener+Authorizer
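For later reference, the four tables of Section 3.1 can be transcribed into a small lookup structure. The keys ("cli", "hs2", "none" and so on) and the function name are shorthand invented for this sketch; the cell values are copied from the tables.

```python
OPERATIONS = ("select", "ddl/dml", "grant/revoke")

# (client, file holding the HiveAuthorizer setting) -> metastore security -> row
HOOKS = {
    ("cli", "hive-site.xml"): {
        "none": ("x", "x", "Authorizer"),
        "storage-based": ("Listener", "Listener", "Listener+Authorizer"),
        "default": ("Listener*",) * 3,
    },
    ("cli", "hiveserver2-site.xml"): {
        "none": ("x",) * 3,
        "storage-based": ("Listener",) * 3,
        "default": ("Listener*",) * 3,
    },
    ("hs2", "hiveserver2-site.xml"): {
        "none": ("Authorizer",) * 3,
        "storage-based": ("Listener+Authorizer",) * 3,
        "default": ("Listener*", "Listener*", "Listener+Authorizer"),
    },
    ("hs2", "hive-site.xml"): {
        "none": ("Authorizer",) * 3,
        "storage-based": ("Listener+Authorizer",) * 3,
        "default": ("Listener*", "Listener*", "Listener+Authorizer"),
    },
}


def hooks_invoked(client, authorizer_file, metastore_security, operation):
    """Look up which hook(s) were observed for the given combination."""
    row = HOOKS[(client, authorizer_file)][metastore_security]
    return row[OPERATIONS.index(operation)]
```

For example, `hooks_invoked("cli", "hive-site.xml", "none", "select")` yields "x", which is the observation behind conclusion 1) above.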

 

3.2 Future Extensions

It is conceivable that the same sync mechanism as described in Section 2.2.3 can be applied to other Hadoop applications. In particular, the new “resourceService” field can serve as a link between an application and its underlying storage. It could even be extended to form a “sync chain” of more than two levels, for instance Hive on HBase on HDFS.

1  Configurables

 

2.1.1 New Ranger Configurables

hive.metastore.event.listeners = org.apache.ranger.authorization.hive.authorizer.RangerHiveMetastorePrivilegeHandle enables DDL-triggered Ranger policy updates.

 

2.1.2 Existing configurables with new, expanded scope

a) hive.security.authorization.manager = org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory enables the Ranger-based authorization for Hive Server2 and Hive CLI.

b) xasecure.hive.update.xapolicies.on.grant.revoke, which controls Hive Ranger policy update from SQL Grant/Revoke commands, will affect Hive CLI grant/revoke operations to Ranger too.

c) policy.grantrevoke.auth.users, which specifies non-admin users who are allowed to do SQL grant/revoke against Ranger, will also control the grant/revoke operations from Hive CLI.

2.2  Automatic policy updates as result of Hive DDL

 

2.2.1  The automatic privilege grants for newly created tables and databases as configured in hive.security.authorization.createtable.user.grants, hive.security.authorization.createtable.group.grants and hive.security.authorization.createtable.role.grants;

 

2.2.2  The DDL command ALTER TABLE … RENAME TO … will cause the corresponding table name change in the “exactly” matched, as opposed to pattern-matched, Ranger policies;

 

2.2.3  The “Drop Table …” command will cause the “exactly” matched Ranger policy to be dropped.

 

This new feature will be available for both Hive Server2 and Hive CLI.

Failures of the policy updates will not cause the whole operation to fail: the changes on the database object still succeed; the related Ranger policies simply will not receive the corresponding changes, and a warning will be logged to that effect.

2.3 Hive CLI

a)    The privilege checks are the same as now by the current HiveServer2 Ranger Plugin;

b)   The Grant/Revoke operations are subject to the same privilege checks, and are dependent on the same configuration parameter, xasecure.hive.update.xapolicies.on.grant.revoke, as by the current HiveServer2 Ranger Plugin.

 

3. Installation and Uninstallation

The two configuration parameters hive.security.authorization.manager and hive.metastore.event.listeners, when set in hive-site.xml, enable Ranger authorization and Ranger policy updates from DDL respectively, for both Hive CLI and HiveServer2; the same two parameters set only in hiveserver2-site.xml enable the two functionalities for HiveServer2 only.

The installation script, enable-hive-plugin.sh, adds the two configuration parameters in hive-site.xml, in addition to the existing behavior of adding only the hive.security.authorization.manager configuration parameter to hiveserver2-site.xml, enabling both functionalities for both Hive CLI and HiveServer2.

To enable just the old Ranger Hive Plugin functionalities in HiveServer2, remove the two configuration parameters from hive-site.xml after running the installation script, enable-hive-plugin.sh.

To enable the two functionalities just for HiveServer2, run the installation script, enable-hive-plugin.sh, then remove the two configuration parameters from hive-site.xml and add the hive.metastore.event.listeners configuration parameter to hiveserver2-site.xml.
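The scoping rules in this section can be summarized in a few lines of code. This is a sketch of the rules as stated here, with a hypothetical function name; each file is modeled simply as the set of parameter names it contains.

```python
AUTH = "hive.security.authorization.manager"
DDL_SYNC = "hive.metastore.event.listeners"


def enabled_features(hive_site, hiveserver2_site):
    """Map each client to the set of Ranger parameters in effect for it."""
    # Settings in hive-site.xml apply to both Hive CLI and HiveServer2.
    cli = {key for key in (AUTH, DDL_SYNC) if key in hive_site}
    # Settings in hiveserver2-site.xml apply to HiveServer2 only.
    hs2 = cli | {key for key in (AUTH, DDL_SYNC) if key in hiveserver2_site}
    return {"Hive CLI": cli, "HiveServer2": hs2}


# State right after enable-hive-plugin.sh, as described above.
after_install = enabled_features({AUTH, DDL_SYNC}, {AUTH})
# State after the manual steps that limit both features to HiveServer2.
hs2_only = enabled_features(set(), {AUTH, DDL_SYNC})
```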

Running the uninstallation script, disable-hive-plugin.sh, will disable all Ranger hooks from Hive CLI and HiveServer2.

4. Limitations

 

4.1 No  Ranger policy updates from the DDL command of “ALTER TABLE … ADD/REPLACE/CHANGE column_name …”

Column name changes will not trigger adjustments of any Ranger Hive policies. Support might be added in a future release.

4.2   Automatic owner grants

At table creation the automatic owner grants, as specified by the Hive configuration of hive.security.authorization.createtable.owner.grants, are folded into the automatic user grants. Therefore they are not assigned to the unbound Ranger table “{OWNER}” field, and will not be subject to any owner change of the table after creation. That is, the owner grants are fixed as given to the owner of the table at creation.

Note that this is the same behavior as from the native Hive authorizer.
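The folding of owner grants into user grants can be illustrated as follows. The function and its arguments are hypothetical, and the grants are passed in already parsed rather than in the Hive configuration string format.

```python
def creation_grants(owner, user_grants, owner_grants):
    """Fold the owner grants into the per-user grants at table creation."""
    result = {user: set(privs) for user, privs in user_grants.items()}
    # The owner is treated as an ordinary user: the grants are bound to the
    # creating user's name, not to a symbolic {OWNER} entry.
    result.setdefault(owner, set()).update(owner_grants)
    return result


grants = creation_grants("alice", {"bob": {"select"}}, {"select", "drop"})
```

Since the result names “alice” explicitly, a later change of table owner leaves these grants where they are.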

5.  Sequence Diagram of Supported ACL Features in Ranger Hive Plugin

[Gliffy diagram: SeqDiagram]

...