This document describes two authorization models and explains how to configure and use storage-based authorization.

Default Authorization Model of Hive

The default authorization model of Hive supports a traditional RDBMS style of authorization based on users, groups, and roles, which are granted permissions to perform operations on a database or table. It is described in more detail in https://cwiki.apache.org/Hive/languagemanual-auth.html.

This RDBMS style of authorization is not very suitable for the typical use cases in Hadoop because of the following differences in implementation:

  1. Unlike a traditional RDBMS, Hive is not in complete control of all data underneath it. The data is stored in a number of files, and the file system has an independent authorization system.
  2. A traditional RDBMS doesn’t allow other programs to access the data directly. With Hive, people tend to use other applications that read or write directly into the files or directories that Hive uses.

This creates problem scenarios like:

  1. You grant permissions to a user, but the user can’t access the database or file system because they don’t have file system permissions.
  2. You remove permissions for a user, but the user can still access the data directly through the file system, because they have file system permissions.
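
The two scenarios above can be made concrete with a small sketch. It is an illustrative model only, not Hive code: the function names and boolean inputs are hypothetical, and it assumes that access through Hive needs both a Hive grant and file system permission, while direct access depends on file system permission alone.

```python
def can_access_via_hive(hive_granted: bool, fs_allowed: bool) -> bool:
    # Hypothetical model: Hive checks its own grants, and the file system
    # still enforces its permissions when the data is read or written.
    return hive_granted and fs_allowed


def can_access_directly(fs_allowed: bool) -> bool:
    # Direct file access bypasses Hive, so Hive grants play no role.
    return fs_allowed


# Scenario 1: granted in Hive, but no file system permission.
print(can_access_via_hive(hive_granted=True, fs_allowed=False))

# Scenario 2: revoked in Hive, but file system permission remains.
print(can_access_directly(fs_allowed=True))
```

In this model the first call yields no access despite the grant, and the second yields access despite the revocation, which is the mismatch the storage-based model is meant to remove.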

Storage-System Based Authorization Model

The Hive community realizes that there might not be a one-size-fits-all authorization model, so it has support for alternative authorization models to be plugged in.

In the HCatalog package, we have introduced an implementation of an authorization interface that uses the permissions of the underlying file system (or, in general, the storage backend) as the basis of permissions on each database, table, or partition.

In Hive, when a file system is used for storage, there is a directory corresponding to each database or table. With this authorization model, the read/write permissions a user or group has on this directory determine the permissions that user has on the database or table. For other storage systems such as HBase, the storage system’s own authorization mechanism is applied to the equivalent entities to determine the permissions in Hive.

For example, an alter table operation would check if the user has permissions on the table directory before allowing the operation, even if it might not change anything on the file system.

A user would need write access to the corresponding entity on the storage system to do any type of action that can modify the state of the database or table. The user needs read access to be able to do any non-modifying action on the database or table.
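
The rule in the preceding paragraph can be sketched as a simple mapping from the kind of action to the storage access it needs. This is a minimal illustration, not the HCatalog implementation: the function names are hypothetical, and whether a given Hive operation counts as state-modifying is supplied by the caller rather than decided here.

```python
def required_access(modifies_state: bool) -> str:
    # Modifying actions need write access on the storage entity;
    # non-modifying actions need read access.
    return "write" if modifies_state else "read"


def is_allowed(modifies_state: bool, can_read: bool, can_write: bool) -> bool:
    # can_read / can_write describe the user's access to the directory
    # (or equivalent storage entity) that backs the database or table.
    return can_write if modifies_state else can_read


# An alter table operation, treated here as state-modifying,
# attempted by a user with read-only access to the table directory.
print(required_access(modifies_state=True))
print(is_allowed(modifies_state=True, can_read=True, can_write=False))
```

Under this sketch a user with read-only access to the table directory can run non-modifying actions but is refused modifying ones.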

When the database or table is backed by a file system that has a Unix/POSIX-style permissions model (like HDFS), there are read (r) and write (w) permissions you can set for the owner user, the group, and ‘other’. Hive uses the file system’s own logic to determine whether a user has permission on the directory or file.
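
As a rough illustration of that logic, the sketch below evaluates a POSIX-style mode for a user: the owner bits apply if the user is the owner, otherwise the group bits apply if the user belongs to the directory’s group, otherwise the ‘other’ bits apply. It is a simplified model under stated assumptions, not Hive or HDFS code: the function name, user names, and group names are hypothetical, and superuser handling is left out.

```python
import stat


def has_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` has `want` ('r' or 'w') access under `mode`."""
    if user == owner:
        # Only the owner bits are tested for the owner.
        bit = stat.S_IRUSR if want == "r" else stat.S_IWUSR
    elif group in user_groups:
        # Only the group bits are tested for group members.
        bit = stat.S_IRGRP if want == "r" else stat.S_IWGRP
    else:
        # Everyone else is tested against the 'other' bits.
        bit = stat.S_IROTH if want == "r" else stat.S_IWOTH
    return bool(mode & bit)


# Hypothetical table directory: mode rwxr-x---, owner alice, group analysts.
mode = 0o750
print(has_access(mode, "alice", "analysts", "alice", {"analysts"}, "w"))
print(has_access(mode, "alice", "analysts", "bob", {"analysts"}, "r"))
print(has_access(mode, "alice", "analysts", "bob", {"analysts"}, "w"))
print(has_access(mode, "alice", "analysts", "carol", set(), "r"))
```

With these example values, the owner can write, a member of the group can read but not write, and a user outside the group has no access.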

Details of HDFS permissions are given in the Permissions Guide: http://hadoop.apache.org/common/docs/r1.0.2/hdfs_permissions_guide.html.

The following table shows the minimum permissions required for Hive operations under this authorization model:
