1       Introduction

Apache Ranger provides centralized security for the Enterprise Hadoop ecosystem, including fine-grained access control and centralized auditing. As of version 0.5, Apache Ranger policies authorize access to a specific set of resources, such as HDFS files/directories, Hive databases/tables/columns, HBase tables/column-families/columns, and Kafka topics. Features such as centralized policy management for many Hadoop components, the ability to specify resource sets using wildcards, and a delegated administration model make security administration of Hadoop resources much simpler.

The ability to authorize access based on tags associated with resources, instead of the resources themselves, offers many advantages. One important advantage is the separation of resource classification from access authorization. For example, resources (HDFS file/directory, Hive database/table/column, Kafka topic, etc.) containing sensitive data such as social security numbers, credit card numbers, or sensitive health-care data can be tagged with PII/PCI/PHI, either as the resource enters the Hadoop ecosystem or at any later time. Once a resource is tagged, the authorization for the tag is enforced automatically, which eliminates the need to create or update policies for the resource. Also, a single authorization policy for a tag can authorize access to resources across various Hadoop components, which eliminates the need to create separate policies in each component.

The goal of this document is to provide an overview of the implementation of tag-based policies in Apache Ranger.

2       Tag-based policy

Apache Ranger introduces a new service-type called ‘tag’ to work with tag-based policies. The ‘tag’ service-type is similar to the existing service-types: HDFS, Hive, HBase, Kafka, YARN, Storm, etc. With this approach, users can use the existing, familiar resource-based policy UI for tag-based policies as well. It also enables reuse of the existing infrastructure that deals with Ranger policies, such as REST APIs, persistence, custom conditions, and the policy engine.

2.1      Ranger Admin UI

Apache Ranger provides a new UI page, named ‘Tag Based Policies’, to work with tag-based policies. The workflow to create/update tag-based policies is essentially the same as for the existing ‘Resource Based Policies’.

Start by adding a tag service instance, in which tag-based policies can be created. Multiple tag service instances can be created – like tag-dev/tag-test/tag-prod, to group tag-based policies for different clusters.

The policy UI for tag-based policies looks very similar to the one for existing resource-based policies. The name of the tag is specified in the top half of the page; the bottom half of the page provides the UI to specify permissions for users and groups. The following are a few differences from the resource-based policy UI:

  • The permissions UI lists the permissions available in all the service-types. This allows policy authors to restrict the types of access users/groups can perform on tagged resources.
  • Wildcards are not allowed in tag names. Also, only one tag can be entered per policy.
  • Delegated Admin is not available for tag-based policies. Currently, only an administrator can work with tag-based policies.
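The constraints listed above can be illustrated with a small validation sketch. This is not Ranger code: the dictionary layout, the field names (tags, permissions, delegate_admin), and the access names such as hdfs:read are hypothetical, and treating * and ? as the wildcard characters is an assumption.

```python
def validate_tag_policy(policy):
    """Return a list of problems found in a hypothetical tag-based policy dict."""
    problems = []
    tags = policy.get("tags", [])
    # Only one tag can be entered per policy.
    if len(tags) != 1:
        problems.append("exactly one tag is required per policy")
    # Wildcards are not allowed in tag names (assumed wildcard characters: * and ?).
    for tag in tags:
        if "*" in tag or "?" in tag:
            problems.append(f"wildcards are not allowed in tag name: {tag}")
    # Delegated Admin is not available for tag-based policies.
    if policy.get("delegate_admin"):
        problems.append("delegated admin is not available for tag-based policies")
    return problems


# Illustrative policy: one tag, permissions drawn from several service-types.
example_policy = {
    "tags": ["PII"],
    "permissions": [
        {"groups": ["analysts"], "accesses": ["hdfs:read", "hive:select"]},
    ],
    "delegate_admin": False,
}
print(validate_tag_policy(example_policy))
```

A policy naming two tags, or a tag such as PCI*, would be reported as a problem by this sketch.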

2.2      Update component services for tag-based policies

Apache Ranger plugins enforce the authorization policies defined in the component service – like hive-dev/hive-test/hive-prod. For the plugins to also enforce tag-based policies, the component service must be updated to refer to a specific tag service instance (like tag-dev/tag-test/tag-prod). Follow the steps below:

-       Go to the ‘Resource Based Policies’ page

-       Click on the Edit button of the component service that needs to be updated

-       Select the appropriate tag service name from the list of services shown in ‘Select Tag Service’

3       Tag Store

Details of tags associated with resources are stored in a tag store. Apache Ranger plugins retrieve the tag details from the tag store for use during policy evaluation. To minimize the performance impact of finding tags for resources during policy evaluation, Apache Ranger plugins cache the tags and periodically poll the tag store for changes. On detecting a change, the plugins update the cache. In addition, the plugins store the tag details in a local cache file, just as the policies are stored in a local cache file. On component restart, the plugins use the tag data from the local cache file if the tag store is not reachable.
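The caching behaviour described above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the class TagCache, the fetch_tags callable, the version field used to detect changes, and the JSON file layout are all assumptions made for the example.

```python
import json
import os


class TagCache:
    """Hypothetical in-memory tag cache backed by a local cache file."""

    def __init__(self, fetch_tags, cache_path):
        # fetch_tags(last_version) returns a snapshot dict, or None if unchanged;
        # it is assumed to raise ConnectionError when the tag store is unreachable.
        self.fetch_tags = fetch_tags
        self.cache_path = cache_path
        self.version = -1
        self.tags = {}

    def refresh(self):
        """One polling cycle: update memory and the cache file only on change."""
        snapshot = self.fetch_tags(self.version)
        if snapshot is None:
            return False
        self.version = snapshot["version"]
        self.tags = snapshot["tags"]
        with open(self.cache_path, "w") as f:
            json.dump(snapshot, f)
        return True

    def load(self):
        """At startup: try the tag store, fall back to the local cache file."""
        try:
            self.refresh()
        except ConnectionError:
            if os.path.exists(self.cache_path):
                with open(self.cache_path) as f:
                    snapshot = json.load(f)
                self.version = snapshot["version"]
                self.tags = snapshot["tags"]
```

In a real plugin, refresh would run on a timer; here it is left to the caller so the sketch stays free of threading.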

In the current release, Apache Ranger plugins download the tag details from the store managed by Ranger Admin. Ranger Admin persists the tag details in its policy store and provides a REST interface for the plugins to download the tag details.

4       Tag Sync

Apache Ranger introduces a new module, ranger-tagsync, to populate the tag store from the tag details available in an external system like Apache Atlas. Tag sync is a daemon process similar to the ranger-usersync process.

In the current release, ranger-tagsync supports receiving tag details from Apache Atlas via change notifications. As tags are added to, updated on, or deleted from resources in Apache Atlas, ranger-tagsync receives notifications and updates the tag store.
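The effect of add/update/delete notifications on the tag store can be sketched as below. The event layout (op, resource, tag, attributes) and the in-memory dictionary standing in for the tag store are hypothetical; this does not reflect the actual Atlas notification format or the ranger-tagsync code.

```python
def apply_notification(tag_store, event):
    """Apply one hypothetical change notification to an in-memory tag store.

    tag_store maps a resource identifier to {tag name: attribute dict}.
    """
    op = event["op"]
    if op not in ("add", "update", "delete"):
        raise ValueError(f"unknown operation: {op}")
    resource_tags = tag_store.setdefault(event["resource"], {})
    if op == "delete":
        resource_tags.pop(event["tag"], None)
    else:
        # Add and update both overwrite the tag's attributes.
        resource_tags[event["tag"]] = dict(event.get("attributes", {}))
    # Drop resources that no longer carry any tag.
    if not resource_tags:
        del tag_store[event["resource"]]
    return tag_store


store = {}
apply_notification(store, {"op": "add", "resource": "hive:sales.customers",
                           "tag": "PII", "attributes": {"level": "high"}})
apply_notification(store, {"op": "delete", "resource": "hive:sales.customers",
                           "tag": "PII"})
```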

5       Tags

Tags in Apache Ranger can have attributes. Tag attribute values can be used in Ranger tag-based policies to influence the authorization decision.

For example, to deny access to a resource after a specific date:

  • add the EXPIRES_ON tag to the resource
  • add a tag attribute, named expiry_date, with its value set to the expiry date
  • create a Ranger policy for the EXPIRES_ON tag
  • add a condition in this policy to deny access when the current date is later than the date specified in the expiry_date tag attribute

In fact, the EXPIRES_ON tag policy described above is created as the default policy in tag service instances.
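The check performed for the EXPIRES_ON tag (deny once the request date is later than expiry_date) can be sketched as follows. The function name, the dictionary layout of a tag, and the ISO date format (YYYY-MM-DD) for expiry_date are assumptions made for this illustration; the format Ranger actually expects may differ.

```python
from datetime import date


def expires_on_denies(tag, request_date):
    """Return True if a hypothetical EXPIRES_ON tag should cause a deny."""
    if tag.get("name") != "EXPIRES_ON":
        return False
    expiry_text = tag.get("attributes", {}).get("expiry_date")
    if expiry_text is None:
        return False
    # Deny only when the request date is strictly later than the expiry date.
    return request_date > date.fromisoformat(expiry_text)


tag = {"name": "EXPIRES_ON", "attributes": {"expiry_date": "2015-12-31"}}
print(expires_on_denies(tag, date(2016, 1, 1)))    # request after expiry
print(expires_on_denies(tag, date(2015, 12, 31)))  # request on the expiry date
```

Whether access on the expiry date itself is still permitted is a choice made in this sketch; the text only states that access is denied after the date.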

6       Tags in policy evaluation

While authorizing an access request, an Apache Ranger plugin evaluates the applicable Ranger policies for the resource being accessed. This section describes how tags are found and used during policy evaluation.

6.1      Finding tags

The Apache Ranger stack model, introduced in Ranger 0.5, allows a service to register context enrichers, which add context data to the access request.

The tag service, introduced with the tag-based policies feature, adds a context enricher named RangerTagEnricher. While processing an access request, this enricher finds the tags applicable to the requested resource and adds the tag details to the request context. It keeps a cache of the available tags and keeps that cache up to date by periodically polling Ranger Admin for changes.
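The enrichment step can be sketched as below. This is not the RangerTagEnricher implementation: the class name, the dictionary-based request, the TAGS context key, and the exact-match lookup by resource identifier are all simplifying assumptions.

```python
class TagEnricherSketch:
    """Hypothetical context enricher that attaches tags to an access request."""

    def __init__(self, tags_by_resource):
        # Cached mapping of resource identifier -> list of tag dicts.
        self.tags_by_resource = tags_by_resource

    def enrich(self, request):
        """Find the tags for the requested resource and add them to the context."""
        tags = self.tags_by_resource.get(request["resource"], [])
        request.setdefault("context", {})["TAGS"] = list(tags)
        return request


enricher = TagEnricherSketch({
    "/data/customers": [{"name": "PII", "attributes": {}}],
})
request = enricher.enrich({"user": "alice", "resource": "/data/customers"})
print(request["context"]["TAGS"])
```

Conditions and policy evaluators that run later can then read the tags from the request context instead of looking them up again.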

6.2      Evaluating tag-based policies

Once the list of tags for the requested resource is found, the Apache Ranger policy engine evaluates the tag-based policies applicable to those tags. If a policy for any one of these tags results in deny, access is denied. If the policies allow access for all of the tags, access is allowed. If no tag yields a result, or if the resource has no tags, the policy engine evaluates the resource-based policies to make the authorization decision.
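The evaluation order described above can be sketched as a small function. The function and parameter names are hypothetical, and one case is an assumption: when some tags are allowed and others yield no result, this sketch falls back to the resource-based policies, which the text does not spell out.

```python
def authorize(tags, tag_policy_result, resource_policy_result):
    """Sketch of the evaluation order for tag-based and resource-based policies.

    tag_policy_result(tag) returns "allow", "deny", or None (no result).
    resource_policy_result() returns the resource-based decision.
    """
    results = [tag_policy_result(tag) for tag in tags]
    # A deny for any tag denies the access.
    if "deny" in results:
        return "deny"
    # Access is allowed when the policies allow every tag on the resource.
    if results and all(result == "allow" for result in results):
        return "allow"
    # No tags, or no result from the tag policies: use resource-based policies.
    return resource_policy_result()


tag_results = {"PII": "deny", "PUBLIC": "allow"}
print(authorize(["PII", "PUBLIC"], tag_results.get, lambda: "allow"))
print(authorize([], tag_results.get, lambda: "allow"))
```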

6.3      Using tags in conditions

The Apache Ranger stack model allows the use of custom conditions while evaluating policies for authorization. The Apache Ranger policy engine makes various request details, such as user, groups, resource, and context, available to the conditions. Tags in the request context, which are added by the enricher, are available to the conditions and can be used to influence the authorization decision.

The default policy in tag service instances, for the EXPIRES_ON tag, uses such a condition to check whether the request date is later than the value specified in the tag attribute expiry_date.
