
Introduction

The audit framework underwent major enhancements between Apache Ranger 0.4 and 0.5.  The major changes added in 0.5 are:

  1. Audit to Solr:  This is now the preferred and recommended audit store.  Ranger Admin can now show audits stored in Solr.

    1. While audit to DB continues to be supported, its use has been deprecated.  Support for it may be withdrawn in future releases.

  2. Audit aggregation: Audit messages logged within a configurable time interval can be aggregated and logged as a single audit event along with the count.  This can be particularly useful for plugins with a large number of audit events, e.g. Kafka, HBase, Solr, etc.

Scope of this document

As a result of these changes, the audit configuration in 0.5 differs from that in 0.4.  This document provides those configuration details.  For historical reasons these are also known as v3 style configurations.  The name v3 is a nod to the prior configurations, which were named v2 style configurations.

Configuration properties naming convention

Audit configuration properties follow this naming convention: xasecure.audit.destination.<sink-type>.<cfg-name-element1>.<cfg-name-element2> ....

Where:

  1. sink-type denotes the type of audit sink, e.g. hdfs, solr, db, etc.

  2. cfg-name-element1, cfg-name-element2, etc. denote the parts of the configuration item for that specific sink-type, e.g. for sink-type of db a couple of audit configuration properties are: xasecure.audit.destination.db.jdbc.driver and xasecure.audit.destination.db.jdbc.url, etc.

For a concrete example, please refer to the details of one of the audit sinks below.

Audit to Solr

SolrCloud is the preferred audit store.  Audit messages stored in Solr can be viewed via Ranger Admin web app. Solr can be configured to purge audits older than, say, a month or so, with HDFS sink used for long term storage.

  1. All properties for solr listed below start with the following prefix: “xasecure.audit.destination.solr.”.  For example, full name of the first property below would be: xasecure.audit.destination.solr.urls

  2. To enable audit to solr set the property xasecure.audit.destination.solr to true.

  3. Following are the configuration details to configure Ranger audit to Solr.

 

Property name

Details

zookeepers

  1. This denotes a typical zookeeper connect string for that solr instance.  For example: zkhost1.aCompany.com:2181,zkhost2.aCompany.com:2181.

  2. Please note that you can specify multiple host/port pairs for your zookeeper cluster; they would be used to make the solr connection.

  3. If not using zookeeper but direct url instead then leave it empty or set it to NONE.

  4. Either urls or zookeepers must be specified.

  5. If using zookeepers then collection can be used to configure the solr collection used to store ranger audits.

  6. If zookeepers is specified then urls is ignored.

collection

  1. If not using zookeepers but direct url instead then leave it empty or set it to NONE.

  2. If unspecified it defaults to ranger_audits.

urls

Example value:

  1. http://solrHost1.aCompany.com:6083/solr/ranger_audits,http://solrHost2.aCompany.com:6083/solr/ranger_audits

  2. Note that you can specify multiple host urls separated by comma.

  3. Either urls or zookeepers must be specified.

  4. If not using direct url specification and using zookeeper configuration instead, then leave this property empty or set it to NONE.

  5. If zookeepers is specified then urls is ignored.

  6. Use of this property to configure audit to solr is not recommended in production.
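
As an illustration of the properties above, a SolrCloud setup reached through zookeeper might be configured roughly as follows.  This is a sketch in the same key=value shorthand used by the other examples on this page; the host names are placeholders rather than values from a real cluster.

```properties
# Enable the Solr audit destination
xasecure.audit.destination.solr=true
# Zookeeper connect string (placeholder hosts); when this is set, urls is ignored
xasecure.audit.destination.solr.zookeepers=zkhost1.aCompany.com:2181,zkhost2.aCompany.com:2181
# Collection that stores Ranger audits; ranger_audits is also the documented default
xasecure.audit.destination.solr.collection=ranger_audits
```

Leaving urls unset here follows from the notes above: only one of urls or zookeepers needs to be specified, and zookeepers takes precedence.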

 

Audit to Db

Solr is the preferred and recommended audit store.  Use of a database to store Ranger audits is deprecated.  Users are strongly encouraged to move to Solr to store their audit messages.  The new DB Audit Provider exists only to ease the move of Apache Ranger 0.4 users of audit to DB onto the Ranger 0.5 audit framework.  The DB Audit Provider might be removed in future releases.

  1. All properties for db listed below start with the following prefix: “xasecure.audit.destination.db.”. For example, full name of the first property below would be: xasecure.audit.destination.db.jdbc.driver.

  2. To enable audit to db set the property xasecure.audit.destination.db to true.

  3. Following are the configuration details to configure Ranger audit to db.

 

Property name

Details

jdbc.driver

  1. Example value for MySQL database would be: com.mysql.jdbc.Driver.

  2. Change the value to suit the target database type.

  3. The specified driver should be available in the classpath of the host service.

jdbc.url

  1. Format of the JDBC URL is dictated by the jdbc driver implementation.  For example, for the standard MySQL driver this value could be set to jdbc:mysql://dbhost1.aCompany.com/ranger_audit .

  2. Refer to the documentation of your jdbc driver for details.

  3. This would be passed as-is by the persistence framework to the driver class instance.

user

For example, database user for database where ranger audit data is to be stored: rangerlogger

password

Password to be used to connect to the target database.  This property is ignored if a password can be found in the credentials file.

password.alias

  1. The alias under which the password is stored in the credentials file.

  2. If unspecified this property defaults to auditDBCred.

  3. Please refer to the section below about details for specifying the location of credential file.

  4. If credentials file contains the password then password property is ignored.
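
Putting the db properties together, a MySQL-backed configuration might look like the sketch below.  The driver, URL, user and alias values are the example and default values from the notes above; the credentials file path is purely hypothetical and must be replaced with the location used in your installation.

```properties
# Enable the (deprecated) DB audit destination
xasecure.audit.destination.db=true
xasecure.audit.destination.db.jdbc.driver=com.mysql.jdbc.Driver
xasecure.audit.destination.db.jdbc.url=jdbc:mysql://dbhost1.aCompany.com/ranger_audit
xasecure.audit.destination.db.user=rangerlogger
# Alias under which the password is stored in the credentials file (documented default)
xasecure.audit.destination.db.password.alias=auditDBCred
# Hypothetical location of the credentials file; adjust to your environment
xasecure.audit.credential.provider.file=/path/to/credentials/file
```

The plain password property is deliberately left out: as noted above, it is ignored whenever the credentials file contains the password.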

 

Audit to HDFS

HDFS is the preferred and recommended long term store for Ranger audit messages, alongside Solr for keeping short term audit messages that might need to be searched.  Audits in Solr would be used to view audit logs in the Ranger Admin UI, whereas audits kept in HDFS can be used for compliance or other off-line uses like threat detection, etc.  Solr can be configured to purge audits older than, say, a month or so.

  1. All properties for hdfs listed below start with the following prefix: “xasecure.audit.destination.hdfs.”. For example, full name of the first property below would be: xasecure.audit.destination.hdfs.dir.

  2. To enable audit to hdfs set the property xasecure.audit.destination.hdfs to true.

  3. Following are the configuration details to configure Ranger audit to hdfs.

 

Property name

Details

dir

  1. This is the HDFS Directory where audit logs should be stored.  For example, it can be set to: hdfs://nnhost1.company.com:8020/ranger/audit.

  2. It is an error if this property isn’t specified.

subdir

  1. The subdirectory under dir where audits for this plugin should be kept.

  2. If unspecified its value defaults to: %app-type%/%time:yyyyMMdd%.

  3. %app-type% and %time:yyyyMMdd% are substitution variables available at the time of directory creation.

  4. Please refer to this blog post for a complete list of substitution variables that can be used to specify the subdir.

  5. The default value is fairly good as it keeps the audits separated by service-type and date.

filename.format

  1. If unspecified its value defaults to: %app-type%_ranger_audit_%hostname%.log.

  2. %app-type% and %hostname% are substitution variables available at the time of file creation.

  3. Please refer to this blog post for a complete list of substitution variables that can be used to specify the filename.format.

  4. This is the name of the audit file created, if any.

  5. If a file with the specified name already exists then the file’s base name is appended with suffixes like 1, 2, etc. to ensure uniqueness across multiple simultaneous plugins trying to write to HDFS, e.g. for hbase. 

file.rollover.sec

Age of the audit log file in seconds after which it would get rolled over to a new file.  Default is set to 86400, i.e. one day (24 * 60 * 60 = 86400 seconds).
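
For illustration, the hdfs properties above could be combined as in the following sketch.  The directory is the example value from the dir notes, and the remaining values simply spell out the documented defaults, so only the first two lines are strictly needed.

```properties
# Enable the HDFS audit destination
xasecure.audit.destination.hdfs=true
# Required: HDFS directory for audit logs (example name node host)
xasecure.audit.destination.hdfs.dir=hdfs://nnhost1.company.com:8020/ranger/audit
# Documented defaults, shown explicitly
xasecure.audit.destination.hdfs.subdir=%app-type%/%time:yyyyMMdd%
xasecure.audit.destination.hdfs.filename.format=%app-type%_ranger_audit_%hostname%.log
xasecure.audit.destination.hdfs.file.rollover.sec=86400
```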


Audit to Log4j

To enable Ranger to send audit logs to a log4j appender, set property xasecure.audit.destination.log4j to true. Also make sure that property logger is specified as mentioned below.

 

Property name

Details

logger

The name of the logger where the audit logs should be sent to, as specified in the component's log4j configuration file.

Ranger writes audit logs at INFO level. Please ensure that the log4j configuration has INFO level enabled for the logger specified above.

Example

Below are the configuration details to enable Ranger Hive plugin to write audit logs to log4j.

Configure a log4j appender for audit logs in component's log4j configuration file (hive-log4j.properties for Hive):

log4j.appender.RANGER_AUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RANGER_AUDIT.File=${hive.log.dir}/ranger-hive-audit.log
log4j.appender.RANGER_AUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RANGER_AUDIT.layout.ConversionPattern=%m%n
log4j.logger.ranger.audit=INFO,RANGER_AUDIT

Configure Ranger plugin to write audit logs to log4j (ranger-hive-audit.xml for Hive):

  xasecure.audit.destination.log4j=true
  xasecure.audit.destination.log4j.logger=ranger.audit

Ambari Examples

If you are using Ambari, then you need to update the properties in the corresponding service config sections and restart the services using Ambari.

If you modify the service log4j properties manually (outside Ambari), then Ambari will overwrite your changes when it restarts.  So, you should always update the properties from the Ambari config sections.


HiveServer2 Configuration

Append this within the section "Advanced hive-log4j"

Advanced hive-log4j
log4j.appender.RANGER_AUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RANGER_AUDIT.File=${hive.log.dir}/ranger-hive-audit.log
log4j.appender.RANGER_AUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RANGER_AUDIT.layout.ConversionPattern=%m%n
log4j.logger.ranger.audit=INFO,RANGER_AUDIT

 

Add the following properties in "Custom ranger-hive-audit" section.

xasecure.audit.destination.log4j=true
xasecure.audit.destination.log4j.logger=ranger.audit

 

Audit Queues

There is a system of queues that handles audit messages before they get written to the final destination.  These queues provide various features.  The following diagram gives an overview and subsequent sections provide details of each one of them.



 

Asynchronous logging to in-memory buffer queue

Audit providers log audit messages to sinks asynchronously so that the host service’s operations are not prevented or slowed down by a slow or unavailable audit sink.  Further, in case of an unavailable or slow audit sink the framework buffers the audit messages in memory to minimize local disk access.  In case of an extended outage of an audit sink it spools the unwritten audit messages to disk files so they can be sent to the audit sink as and when it becomes available.

Various aspects of those queue providers can be configured via the following settings.  All properties below start with the following prefix: “xasecure.audit.destination.async.”.  For example, full name of the first property below would be: xasecure.audit.destination.async.queue.size.

 

Configuration name

Notes

queue.size

  1. This controls the size of the in-memory buffer queue.  By default the queue is sized to 1048576, i.e. it stores 1M (1024 * 1024) messages.

  2. If this queue is full then audit messages would not be accepted and false is returned to the host service to indicate this condition.  Thus it is important to size it adequately.  Enabling file spooling and audit summarization can prevent loss of audit messages due to an unavailable or slow destination.
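
As a small sketch of sizing this queue, the setting below doubles the documented default for a busy service.  The value is only an illustration (2 * 1024 * 1024 = 2097152), not a recommendation; an appropriate size depends on message volume and available heap.

```properties
# In-memory buffer queue sized for 2M messages instead of the default 1M
xasecure.audit.destination.async.queue.size=2097152
```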

 

Summarization

In high volume systems, like kafka, a very large number of audit messages can be generated in a short amount of time.  For compliance and for other practical reasons, like threat detection, it may not be desirable to throttle back the amount or granularity of auditing.

Ranger 0.5 adds the ability to summarize audit messages in such situations while preserving the distinguishing traits of each audit message.  To ensure that no unique/distinguishing information is lost during summarization, audit messages are aggregated if and only if they differ only in their time stamp.  If anything else about an audit is different then it is preserved as a separate audit message.  Further, in the interest of capturing as much information as possible, the time interval on the aggregate audit message denotes the max and min time of the actual audit events that were part of that summary event.

The following properties control the behavior of audit summarization.

 

 

Configuration name

Notes

xasecure.audit.provider.summary.enabled

  1. To enable summarization set this property to true.  This would cause audit messages to be summarized before they are sent to various sinks.

  2. By default it is set to false  i.e. audit summarization is disabled.

xasecure.audit.provider.queue.size

  1. If unspecified this value defaults to 1048576, i.e. the queue is sized to store 1M (1024 * 1024) messages.

  2. Note the difference in property name that controls the size of summary queue.

xasecure.audit.provider.summary.interval.ms

  1. The max time interval at which messages would be summarized.

  2. If unspecified it defaults to 5000, i.e. 5 seconds.

Summarization Batch size

  1. Note that regardless of this time interval while summarizing at most 100k messages at a time are considered for aggregation.  Thus, if more than 100k messages are logged during this interval then similar messages could show up as multiple summarized audit messages even though they are logged within the configured time interval.

  2. Currently, this value of 100k is not user configurable.  It is mentioned here for better understanding of Summarization logic.
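
A minimal sketch of turning summarization on, using only the property names listed above.  The queue size and interval shown are the documented defaults, written out for clarity.

```properties
# Summarize audit messages before they are sent to the sinks (default: false)
xasecure.audit.provider.summary.enabled=true
# Size of the summary queue; note the "provider" prefix rather than "destination.async"
xasecure.audit.provider.queue.size=1048576
# Aggregate messages that differ only in time stamp over at most 5 seconds
xasecure.audit.provider.summary.interval.ms=5000
```

With these settings, messages logged within one 5 second interval that differ only in their time stamp would be written as a single audit event carrying the count, provided they fall within the same batch of at most 100k messages.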

 

Batching and bulk write of audit messages

It can be faster to write several messages to solr in a batch rather than write them one at a time.  Similarly, when writing audit messages to a database it is much faster to batch the writes of several messages into a single transaction.  The Ranger Audit framework provides this via the use of buffer queues.


Following example assumes that:

  • You are configuring queue provider for solr.

  • You are using the standard queue provider, i.e. batch.

Thus each configuration property name below should be prefixed by: xasecure.audit.destination.solr.batch.  Change the values of audit sink type and queue name to suit your configuration.

 

Configuration name

Notes

batch.size

By default up to 1000  messages are given to these Audit Destination providers at a time to write.  This value can be used to tune that count.

batch.interval.ms

  1. If unspecified, this property defaults to 3000, i.e. 3 seconds.  This controls the max amount of time for which messages are buffered before they are sent off to the final destination, even if the number of messages is less than the configured batch size.

  2. Thus, the actual batch size is controlled by both batch size and batch interval.
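
The following sketch shows how these two settings might be written for the solr sink.  It assumes the prefix given above is applied literally to the property names in the table, which yields the doubled "batch.batch" segment; treat the full names as an assumption and verify them against your Ranger version before relying on them.  The values are the documented defaults.

```properties
# Assumed full names: prefix xasecure.audit.destination.solr.batch. + name from the table
# Hand at most 1000 messages at a time to the Solr destination
xasecure.audit.destination.solr.batch.batch.size=1000
# Flush buffered messages at least every 3 seconds, even if the batch is not full
xasecure.audit.destination.solr.batch.batch.interval.ms=3000
```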

 

Configuration related to File spooling

If audit framework detects that an audit destination is down then it buffers the audit messages in memory.  Once memory buffer fills up then it can be configured to spool the unsent messages to disk files to prevent or minimize the loss of audit messages.  Following configuration settings help one to control the behavior around disk spooling of audit messages:

Following example assumes that:

  • You are configuring queue provider for solr.

  • You are using the standard queue provider, i.e. batch.

Accordingly each configuration property name is prefixed by: xasecure.audit.destination.solr.batch.filespool.  Change the values of audit sink type and queue name to suit your configuration.

 

 

Configuration name

Default value

Notes

enabled

false

Controls if audit messages would be spooled to local disk files if in-memory buffer queue gets filled up.

dir

N/A

Local disk directory where spool files would be kept.  This value must be specified.

filename.format

spool_%app-type%_%time:yyyyMMdd-HHmm.ss%.log

  1. %app-type% and %time:yyyyMMdd-HHmm.ss% are substitution variables available at the time of spool file creation.

  2. Please refer to this blog post for a complete list of substitution variables that can be used to specify. the filename.format.

  3. This is the name of the spool file created, if any.

archive.dir

archive subdirectory of the spool file dir.

For example, if spool file for solr sink is configured to be /var/log/hadoop/hdfs/audit/solr/spool then by default the spool files would get archived to /var/log/hadoop/hdfs/audit/solr/spool/archive directory.

archive.max.files

100

Max number of files to archive.  If the number of files in the archive directory exceeds this number then the oldest file(s) would get deleted.


file.rollover.sec

86400

Age of the spool file in seconds after which it would get rolled over to a new file.  Default is set to a day (24 * 60 * 60 = 86400 seconds).

destination.retry.ms

30000

How often, in milliseconds, the spooler should try to reconnect to a destination that was down the last time.  The default is 30s (30 * 1000 = 30000).

drain.threshold.percent

80

Don’t start spooling to disk unless the in-memory queue is this much percent full.  As long as the audit destination is able to keep up and the in-memory queue is adequately sized, a high enough value ensures that messages are never flushed to local disk.

drain.full.wait.ms

300000

Once a destination comes back up, this is the amount of time to let new audit messages get buffered in memory before spooling them.  By default this is set to 5 minutes.  If the spooler is given enough time to send on-disk messages to the final destination and the in-memory queue is properly sized then disk spooling of new messages can be avoided and the system can revert back to in-memory buffering with no disk access.
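
As an illustration, file spooling for the solr sink with the standard batch queue might be enabled as sketched below.  The spool directory is the example path used in the archive.dir notes above; the other values are the documented defaults and could be omitted.

```properties
# Spool unsent audit messages to local disk once the in-memory queue fills up
xasecure.audit.destination.solr.batch.filespool.enabled=true
# Required when spooling is enabled: local directory for spool files
xasecure.audit.destination.solr.batch.filespool.dir=/var/log/hadoop/hdfs/audit/solr/spool
# Documented defaults, shown explicitly
xasecure.audit.destination.solr.batch.filespool.archive.max.files=100
xasecure.audit.destination.solr.batch.filespool.file.rollover.sec=86400
xasecure.audit.destination.solr.batch.filespool.destination.retry.ms=30000
xasecure.audit.destination.solr.batch.filespool.drain.threshold.percent=80
xasecure.audit.destination.solr.batch.filespool.drain.full.wait.ms=300000
```

Since archive.dir is not set, spool files would be archived to the archive subdirectory of the spool directory, as described above.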

 

Suppressing the Spooling of Audit messages

If you wish to suppress the automatic spooling of audit messages then set the following property settings.  Please note that doing so has consequences since one can lose audit messages.

 

Configuration name

Notes

xasecure.audit.destination.<sink-type>.queue

  1. To suppress spooling to disk file for a particular sink type set its queue name to NONE.  For example, if you do not wish to spool to local disk audit messages written to HDFS then set the property xasecure.audit.destination.hdfs.queue to NONE.

  2. Note that suppressing spooling increases the chances that audit messages would be silently dropped if the destination is down or slow.

 

Common configuration Properties

Below are a few properties that are common to the audit framework as a whole and/or apply to all audit providers.

 

Configuration name

Default value

Notes

xasecure.audit.log.failure.report.min.interval.ms

60000

In event of a failure to send audit events to an audit sink, say, due to a connectivity issue, this is the interval at which WARN messages would be logged to log4j.

xasecure.audit.credential.provider.file

N/A

  • If a password used to connect to an audit provider is encrypted then this property allows one to indicate the location of credentials file.

  • One file would contain credentials for all of the providers.

  • Currently only DB Audit Provider needs and uses it, if required.

 

Using Custom Audit Providers and Queue Providers

The audit framework allows a user to plug in custom implementations of not only the Audit Destination Providers (e.g. a custom Solr or HDFS Provider) but also the Queue providers used by the framework for buffering audit messages on their way to the final audit sink.

The standard Audit providers and Queue providers are quite robust and feature rich.  You can ignore this section if you don’t need custom implementations.

 

Configuration name

Notes

xasecure.audit.destination.<sink-type>

If you wanted to use a new audit sink, say, JMS to store audit messages then you could define a new property to signal that by setting xasecure.audit.destination.jms to true.

xasecure.audit.destination.<sink-type>.classname

Since there isn’t a standard Audit Provider for JMS one needs to let the framework know about the class which implements it.  Set the property xasecure.audit.destination.jms.classname to the fully qualified class name of the implementation, e.g. com.company.JmsAuditDestination.

xasecure.audit.destination.<sink-type>.<props>

  • During initialization the custom Audit provider would have access to all properties from the ranger audit configuration.  This can be used to dynamically configure your custom audit provider.

  • It is recommended that such properties be prefixed to indicate the provider type.

  • For example, if your custom JMS provider needs properties prop1 and prop2 then name them in the ranger config file as xasecure.audit.destination.jms.prop1 and xasecure.audit.destination.jms.prop2 respectively.

xasecure.audit.destination.<sink-type>.queue

If you also want to use a custom Queue provider then use this property to identify that Queue provider type.  To use the default queue provider either leave this property unspecified or set it to batch.

xasecure.audit.destination.<sink-type>.<queue-name>.classname

This property provides the full name of the class which implements the custom Queue provider.  For example, to use a Queue provider that uses a ring buffer with your JMS Audit Provider:

  • set xasecure.audit.destination.jms.queue to ringbuffer and

  • set xasecure.audit.destination.jms.queue.ringbuffer.classname to the full name of the Queue provider implementation class, e.g. com.company.RingBufferQueueProvider.

xasecure.audit.destination.<sink-type>.<queue-name>.prop1

  • During initialization the custom Queue provider would have access to all properties from the ranger audit configuration.  This can be used to dynamically configure your custom queue provider.

  • It is recommended that such properties be prefixed to indicate the queue provider type.

  • For example, if your custom queue provider needs properties prop1 and prop2 then name them in the ranger config file as xasecure.audit.destination.jms.queue.ringbuffer.prop1 and xasecure.audit.destination.jms.queue.ringbuffer.prop2 respectively.
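
Taken together, the JMS and ring buffer examples in this section could be written as in the sketch below.  The class names are the illustrative ones used above, and prop1, prop2 and their values are placeholders; none of these exist in Ranger itself.

```properties
# Signal a new, custom audit sink named jms and name its implementation class
xasecure.audit.destination.jms=true
xasecure.audit.destination.jms.classname=com.company.JmsAuditDestination
# Provider-specific settings, prefixed with the sink type (placeholder values)
xasecure.audit.destination.jms.prop1=value1
xasecure.audit.destination.jms.prop2=value2
# Use a custom queue provider named ringbuffer instead of the default batch
xasecure.audit.destination.jms.queue=ringbuffer
xasecure.audit.destination.jms.queue.ringbuffer.classname=com.company.RingBufferQueueProvider
# Queue-specific settings, prefixed with the queue provider type (placeholder values)
xasecure.audit.destination.jms.queue.ringbuffer.prop1=value1
xasecure.audit.destination.jms.queue.ringbuffer.prop2=value2
```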

 

Passing Custom config properties to standard Audit Providers

For any audit sink, the framework would also load any custom property named as follows: xasecure.audit.destination.<sink-type>.config.<custom1.elem1>.<custom.elem2>....

Where:

  1. sink-type denotes the type of audit sink, e.g. hdfs, solr, db, etc.

  2. config: is the configuration name element which signals to the framework that following properties should be made available to the audit provider as a custom property for its use.

  3. custom1.elem1, custom.elem2, etc. denote the parts of the custom configuration item that is made available to the audit provider for that specific sink-type.

Use of standard HDFS Audit provider to Audit to Azure Blob Storage is an example of how this provision for custom properties is used by standard audit providers to extend their functionality merely via configuration.
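
As a hedged sketch of that Azure Blob Storage case, the standard hdfs provider might be given a storage account key through a custom property.  The property fs.azure.account.key.<account>.blob.core.windows.net is the Hadoop Azure setting for an account key; the account name, container name and key value below are placeholders, and the assumption that the text after ".config." is handed to the provider as the Hadoop property name should be checked against your Ranger version.

```properties
# Write audits with the standard HDFS provider, pointed at a blob container (placeholder names)
xasecure.audit.destination.hdfs=true
xasecure.audit.destination.hdfs.dir=wasb://mycontainer@myaccount.blob.core.windows.net/ranger/audit
# Custom property passed through to the provider via the ".config." name element
xasecure.audit.destination.hdfs.config.fs.azure.account.key.myaccount.blob.core.windows.net=<storage-account-key>
```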

Backward compatibility

A brief note about backward compatibility.  Old v2 style configurations are still supported, of course, and will work as is.  Old configurations trigger the use of older implementations of Audit Providers.  Please refer to this Blog posting for a refresher on those details.

In addition, the following should be noted about continued use of v2 style configurations.

  1. Future enhancements to audit framework would be made only to the v3 (new Ranger 0.5 Audit) Providers.  Hence, users are encouraged to move to new v3 style configurations.

  2. Further, it is not possible to mix v2 and v3 style configurations.  Presence of any v3 style configuration would suppress any v2 style Audit Providers.
