
This document describes two authorization models and explains how to configure and use storage-based authorization.

Default Authorization Model of Hive

The default authorization model of Hive supports a traditional RDBMS style of authorization: permissions to perform operations on a database or table are granted to users, groups, and roles. It is described in more detail in https://cwiki.apache.org/Hive/languagemanual-auth.html.

...

  1. You grant permissions to a user, but the user can’t access the database or file system because they don’t have file system permissions.
  2. You remove permissions for a user, but the user can still access the data directly through the file system, because they have file system permissions.

Storage-System Based Authorization Model

The Hive community realizes that there might not be a one-size-fits-all authorization model, so it has support for alternative authorization models to be plugged in.

...

The following table (work in progress) shows the minimum permissions required for Hive operations under this authorization model:

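| Operation              | Database Read Access | Database Write Access | Table Read Access | Table Write Access |
|------------------------|----------------------|-----------------------|-------------------|--------------------|
| LOAD                   |                      |                       |                   | X                  |
| EXPORT                 |                      |                       | X                 |                    |
| IMPORT                 |                      |                       |                   | X                  |
| CREATE TABLE           |                      | X                     |                   |                    |
| CREATE TABLE AS SELECT |                      | X                     | X (source table)  |                    |
| DROP TABLE             |                      | X                     |                   |                    |
| SELECT                 |                      |                       | X                 |                    |
| ALTER TABLE            |                      |                       |                   | X                  |
| SHOW TABLES            | X                    |                       |                   |                    |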
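To consult the table above from a script, the mapping can be sketched as a small shell lookup. The function name required_access and its output labels are hypothetical and are not part of Hive or HCatalog; the mapping only restates the table, which is itself marked as work in progress.

```shell
# Hypothetical helper: print the minimum access an operation needs,
# as listed in the permissions table. Not part of Hive or HCatalog.
required_access() {
  case "$1" in
    "LOAD"|"IMPORT"|"ALTER TABLE")  echo "table:write" ;;
    "EXPORT"|"SELECT")              echo "table:read" ;;
    "CREATE TABLE"|"DROP TABLE")    echo "database:write" ;;
    "CREATE TABLE AS SELECT")       echo "database:write source-table:read" ;;
    "SHOW TABLES")                  echo "database:read" ;;
    *)                              echo "unknown"; return 1 ;;
  esac
}

required_access "SELECT"
required_access "CREATE TABLE AS SELECT"
```

Here "read" and "write" refer to file system permissions on the directory that backs the database or table.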

Caution: Because of the way authorization is currently implemented in Hive, this authorization model does not stop malicious users from circumventing the authorization checks. See the Known Issues section below.

Configuring File-System Based Authorization

The implementation of the file-system-based authorization model is available in the HCatalog package. (Support for this is likely to be added to the Hive package in the future.) Using this implementation therefore requires installing the HCatalog package along with Hive.

...

  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
    <description>enable or disable the hive client authorization</description>
  </property>

  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hcatalog.security.HdfsAuthorizationProvider</value>
    <description>the hive client authorization manager class name.
    The user defined authorization class should implement interface 
    org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider.
    </description>
  </property>

To disable authorization, set hive.security.authorization.enabled to false. To use the default authorization model of Hive, don’t set the hive.security.authorization.manager property.

Creating New Tables or Databases

To create new tables or databases with appropriate permissions, either use the Hive command line to create the table or database and then change its permissions with a file system operation, or use the HCatalog command line (hcat) to create it.
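The first approach might look like the following sketch. The table name, columns, group, and mode are hypothetical, and the directory path assumes the table lives in the default database under the default warehouse location (/user/hive/warehouse); adjust all of them to your cluster. The script only prints the commands unless RUN=1 is set.

```shell
# Hypothetical names; TABLE_DIR assumes the default warehouse location.
TABLE="page_views"
TABLE_DIR="/user/hive/warehouse/$TABLE"
GROUP="analysts"

# Dry run by default: print each command, execute it only when RUN=1.
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

# 1. Create the table through the Hive command line.
run hive -e "CREATE TABLE $TABLE (user_id STRING, url STRING)"

# 2. Adjust group and permissions on the table directory:
#    750 = owner read/write/execute, group read/execute, others none.
run hadoop fs -chgrp -R "$GROUP" "$TABLE_DIR"
run hadoop fs -chmod -R 750 "$TABLE_DIR"
```

With mode 750, members of the group can read the table directory but cannot write to it.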

The HCatalog command line tool uses the same syntax as Hive. It creates the table or database with a corresponding directory that is owned by the user who creates it, with the group given by the “-g” argument and the permissions given by the “-p” argument.
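As an illustration, a table could be created with hcat roughly as follows. The group name, permission string, and table definition are hypothetical. The script only prints the command unless RUN=1 is set.

```shell
GROUP="analysts"     # hypothetical group, passed to -g
PERMS="rwxr-x---"    # owner: full access, group: read only, others: none

# Dry run by default: print each command, execute it only when RUN=1.
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

# Create the table; its directory gets the group and permissions given here.
run hcat -g "$GROUP" -p "$PERMS" \
  -e "CREATE TABLE page_views (user_id STRING, url STRING)"
```

This avoids the separate file system step, because the group and permissions are applied when the directory is created.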

Known Issues

  1. Some metadata operations (mostly read operations) do not check for authorization. See https://issues.apache.org/jira/browse/HIVE-3009.
  2. The current implementation of Hive performs the authorization checks in the client. This means that malicious users can circumvent these checks.
  3. To also work with HBase tables, a different authorization provider (StorageDelegationAuthorizationProvider) must be used, but it is not well tested.
  4. Partition files and directories added by a Hive query don’t inherit permissions from the table. This means that even if you grant a group permission to access a table, new partitions can be readable only by the owner, depending on the default umask configured for the cluster. See https://issues.apache.org/jira/browse/HIVE-3094. A separate HDFS chmod command (for example, hadoop fs -chmod) is needed to modify the permissions.
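
For the last issue, the manual permission fix might be sketched as follows. The table path, the partition column ds, and the mode are hypothetical, and the path assumes the default warehouse location. The script only prints the commands unless RUN=1 is set.

```shell
# Hypothetical paths; adjust to the actual table and partition.
TABLE_DIR="/user/hive/warehouse/page_views"
PARTITION_DIR="$TABLE_DIR/ds=2012-06-01"

# Dry run by default: print each command, execute it only when RUN=1.
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

# Inspect the permissions of the partition directories under the table.
run hadoop fs -ls "$TABLE_DIR"

# Give the group read access to the newly added partition.
run hadoop fs -chmod -R 750 "$PARTITION_DIR"
```

This would have to be repeated for every partition a query adds, for as long as new partitions do not inherit the table's permissions.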