...

Apache Ranger uses Apache Solr to store audit logs and provides a UI for searching through them. Solr must be installed and configured before you install RangerAdmin or any of the Ranger component plugins. There are three options for Solr installation:

  1. Solr - Standalone
    A single instance of Solr is easy to set up and has no dependency on ZooKeeper. It is recommended to use this option only for testing Ranger and in non-production environments.
  2. SolrCloud
    This is the preferred setup for Ranger. SolrCloud is a scalable architecture that can run as a single-node or multi-node cluster. It has additional features like replication and sharding, which are useful for high availability (HA) and scalability. You need to plan your deployment based on your cluster size.
  3. Self Install
    If you have your own Solr installed, you can use the schema.xml and managed-schema provided by Ranger to create the collection in your setup.

Useful Links

Configuring Apache Solr for a high-scale environment can be challenging. Please review the following links if you are expecting a very high volume of audit logs.

Note: Apache Ambari

If you are using Apache Ambari, note that Ambari maintains its own solr-config template. Make sure to update that template as well; otherwise, your manual changes might be overwritten.

Prerequisite

  1. JDK 1.7 or above. Apache Solr 5.2 or above.
  2. Solr is both memory and CPU intensive. If your production system has a high volume of access requests, make sure the server running Solr has adequate memory, CPU, and disk.
  3. Since audit records can grow dramatically, plan to have at least 1 TB of free space on the volume where Solr will store the index data.
  4. Solr works well with 32 GB of RAM. Plan to provide as much memory as possible to the Solr process.
  5. SolrCloud supports replication and sharding. It is highly recommended to use SolrCloud with at least two Solr nodes running on different servers with replication enabled.
  6. If you are using SolrCloud, you also need ZooKeeper installed and configured.
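
The disk-space recommendation above can be checked before installation. The following is a minimal sketch and not part of Ranger: the function names are hypothetical, and reading "1 TB" as 10^12 bytes is an assumption made for illustration.

```python
import shutil

ONE_TB = 10**12  # assumption: 1 TB interpreted as 10^12 bytes


def has_required_free_space(free_bytes, required_bytes=ONE_TB):
    """Return True if the free space meets the recommended minimum."""
    return free_bytes >= required_bytes


def free_bytes_on_volume(path):
    """Return the free bytes on the volume that contains `path` (must exist)."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # "." is a placeholder; point this at the volume that will hold the Solr index.
    free = free_bytes_on_volume(".")
    print("free bytes:", free)
    print("meets 1 TB recommendation:", has_required_free_space(free))
```

Run it against the folder you plan to use for index data, not the root volume, since the two may sit on different disks.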

...

| Property Name | Sample values | Description |
|---|---|---|
| SOLR_INSTALL | true | If this is set to true, setup.sh will download the Solr package and install it. |
| SOLR_DOWNLOAD_URL | http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz | It is recommended to use one of the Apache mirrors to download the Solr package. Pick a mirror site from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html |
| SOLR_INSTALL_FOLDER | /opt/solr | The location where you want to install Solr. |

Configuration Options

You can configure Solr to run as standalone or as SolrCloud. If you want setup.sh to configure standalone mode, follow the section Standalone Configuration. If you are configuring SolrCloud, follow the section SolrCloud Configuration. If you want to configure your own Solr, refer to the section Self Configuration.

...

| Property Name | Sample values | Description |
|---|---|---|
| JAVA_HOME | | The path where the JDK is installed. On a Hadoop node, you can check /etc/hadoop/conf/hadoop-env.sh for the value of JAVA_HOME. Note that Solr only supports JDK 1.7 and above. |
| SOLR_USER | solr | The Linux user used to run Solr. |
| SOLR_INSTALL_FOLDER | /opt/solr | The location where Solr is installed. This is the same property used if you want setup.sh to install Solr. |
| SOLR_RANGER_HOME | /opt/solr/ranger_audit_server | The location where Ranger-related configuration and schema files will be copied. |
| SOLR_RANGER_PORT | 6083 | The port you want Solr to listen on. |
| SOLR_DEPLOYMENT | standalone | The value standalone configures Solr to run as standalone. |
| SOLR_RANGER_DATA_FOLDER | /opt/solr/ranger_audit_server/data | The folder where the index data will be stored. The volume for this folder must have enough disk space; at least 1 TB of free space is recommended for index data. Back up this folder regularly. |
| SOLR_LOG_FOLDER | /var/log/solr/ranger_audits | The folder where you want Solr logs to go. Make sure the volume for this folder has enough disk space, and delete old log files on a regular basis. |
| SOLR_MAX_MEM | 2g | The memory assigned to Solr. Make sure you provide adequate memory to the Solr process. |
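
Before running setup.sh, it can help to confirm that every standalone property has a value. The sketch below is illustrative only: it assumes the properties are kept as plain `KEY=value` lines (the file format is not stated on this page), and the helper names `parse_properties` and `missing_or_empty` are hypothetical. The sample values are copied from the table above.

```python
# Property names taken from the standalone table above.
REQUIRED_STANDALONE_KEYS = [
    "JAVA_HOME", "SOLR_USER", "SOLR_INSTALL_FOLDER", "SOLR_RANGER_HOME",
    "SOLR_RANGER_PORT", "SOLR_DEPLOYMENT", "SOLR_RANGER_DATA_FOLDER",
    "SOLR_LOG_FOLDER", "SOLR_MAX_MEM",
]


def parse_properties(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_or_empty(props, required=REQUIRED_STANDALONE_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


# Assumption: KEY=value layout. JAVA_HOME is left empty on purpose,
# because the table gives no sample value for it.
SAMPLE = """\
JAVA_HOME=
SOLR_USER=solr
SOLR_INSTALL_FOLDER=/opt/solr
SOLR_RANGER_HOME=/opt/solr/ranger_audit_server
SOLR_RANGER_PORT=6083
SOLR_DEPLOYMENT=standalone
SOLR_RANGER_DATA_FOLDER=/opt/solr/ranger_audit_server/data
SOLR_LOG_FOLDER=/var/log/solr/ranger_audits
SOLR_MAX_MEM=2g
"""

if __name__ == "__main__":
    print("missing or empty:", missing_or_empty(parse_properties(SAMPLE)))
```

With the sample above, only JAVA_HOME is reported, which matches the one property that has no default in the table.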

...

After Solr for RangerAudit is started, it listens on ${SOLR_PORT}. For example, check Solr by opening http://${SOLR_HOST}:6083 in your browser.
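
The browser check can also be scripted. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and treating any HTTP response (including an error status) as "Solr is listening" is an assumption of this example, not a Ranger or Solr convention.

```python
import urllib.error
import urllib.request


def solr_base_url(host, port=6083):
    """Build the URL from the text above: http://${SOLR_HOST}:6083."""
    return f"http://{host}:{port}"


def solr_is_listening(host, port=6083, timeout=5.0):
    """Return True if an HTTP server answers at the Solr URL."""
    try:
        with urllib.request.urlopen(solr_base_url(host, port), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so something is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder for ${SOLR_HOST}.
    print("listening:", solr_is_listening("localhost"))
```

A True result only shows that something answers HTTP on that port; confirm in the browser that it is the Solr admin page.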

...

| Property Name | Sample values | Description |
|---|---|---|
| JAVA_HOME | | The path where the JDK is installed. On a Hadoop node, you can check /etc/hadoop/conf/hadoop-env.sh for the value of JAVA_HOME. Note that Solr only supports JDK 1.7 and above. |
| SOLR_USER | solr | The Linux user used to run the Solr process. |
| SOLR_INSTALL_FOLDER | /opt/solr | The location where Solr is installed. This is the same property used if you want setup.sh to install Solr. |
| SOLR_RANGER_HOME | /opt/solr/ranger_audit_server | The location where the scripts and index data will be stored. In SolrCloud, there is no publicly configurable option for the index data location, so set this value to a folder on a volume that has enough disk space. |
| SOLR_RANGER_PORT | 6083 | The port you want Solr to listen on. |
| SOLR_DEPLOYMENT | solrcloud | The value solrcloud configures Solr to run as SolrCloud. |
| SOLR_ZK | ${zk_host}:2181/ranger_audits | It is recommended to give a sub-folder in which the Ranger Audit related configurations are created. This way, you can use the same ZooKeeper for other installations of Solr. Append the ZooKeeper node path only after the last host. E.g. zk1:2181,zk2:2182,zk3:2181/ranger_audits |
| SOLR_SHARDS | 1 | If you wish to distribute your audit logs, you can use multiple shards. Make sure the number of shards is less than or equal to the number of Solr nodes you will be running. |
| SOLR_REPLICATION | 1 | It is highly recommended to set up at least 2 nodes and replicate the indexes. This gives redundancy for the index data and load balancing of Solr queries. Solr recommends having SOLR_SHARDS * SOLR_REPLICATION Solr instances. E.g. with 3 shards and a replication of 2, you need 6 Solr instances. |
| SOLR_LOG_FOLDER | /var/log/solr/ranger_audits | The folder where you want Solr logs to go. Make sure the volume for this folder has enough disk space, and delete old log files on a regular basis. |
| SOLR_MAX_MEM | 2g | The memory assigned to Solr. Make sure you provide adequate memory to the Solr process. In a Hadoop environment with a very high transaction/request rate, it might be better to assign up to 32 GB of memory to Solr. |
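
Two of the SolrCloud properties follow simple rules that are easy to get wrong by hand: the SOLR_ZK string carries the node path only after the last host, and the recommended instance count is shards times replication. The sketch below illustrates both; the function names are hypothetical and are not part of Ranger or Solr.

```python
def build_solr_zk(zk_hosts, znode="/ranger_audits"):
    """Join host:port entries with commas and append the node path once, at the end."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host:port is required")
    if not znode.startswith("/"):
        znode = "/" + znode
    return ",".join(zk_hosts) + znode


def required_solr_instances(shards, replication):
    """Recommended number of Solr instances: SOLR_SHARDS * SOLR_REPLICATION."""
    if shards < 1 or replication < 1:
        raise ValueError("shards and replication must both be at least 1")
    return shards * replication


if __name__ == "__main__":
    # Host names are placeholders.
    print(build_solr_zk(["zk1:2181", "zk2:2181", "zk3:2181"]))
    print(required_solr_instances(3, 2))
```

For three shards and a replication of two, the second call returns 6, matching the example in the table.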

...

After Solr for RangerAudit is started, it listens on ${SOLR_PORT}. For example, check Solr by opening http://${SOLR_HOST}:6083 in your browser.

...