Taking Solr to Production

This section provides guidance on how to set up Solr to run in production on *nix platforms, such as Ubuntu. Specifically, we'll walk through the process of setting up a single Solr instance on a Linux host and then provide tips on how to support multiple Solr nodes running on the same host.

Service Installation Script

Solr includes a service installation script (bin/install_solr_service.sh) to help you install Solr as a service on Linux. Currently, the script only supports CentOS, Debian, Red Hat, SUSE and Ubuntu Linux distributions. Before running the script, you need to determine a few parameters about your setup. Specifically, you need to decide where to install Solr and which system user should be the owner of the Solr files and process.

Planning your directory structure

We recommend separating your live Solr files, such as logs and index files, from the files included in the Solr distribution bundle, as that makes it easier to upgrade Solr and is considered a good practice to follow as a system administrator.

Solr Installation Directory

By default, the service installation script will extract the distribution archive into /opt. You can change this location using the -i option when running the installation script. The script will also create a symbolic link to the versioned directory of Solr. For instance, if you run the installation script for Solr X.0.0, then the following directory structure will be used:

/opt/solr-X.0.0
/opt/solr -> /opt/solr-X.0.0

Using a symbolic link insulates any scripts from being dependent on the specific Solr version. If, down the road, you need to upgrade to a later version of Solr, you can just update the symbolic link to point to the upgraded version of Solr. We’ll use /opt/solr to refer to the Solr installation directory in the remaining sections of this page.
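
For example, upgrading later might look like the following (a sketch, assuming the default /opt install location and a hypothetical X.1.0 release):

# stop the service, extract the new version, repoint the symlink, restart
sudo service solr stop
sudo tar xzf solr-X.1.0.tgz -C /opt
sudo ln -sfn /opt/solr-X.1.0 /opt/solr
sudo service solr start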

Separate Directory for Writable Files

You should also separate writable Solr files into a different directory; by default, the installation script uses /var/solr, but you can override this location using the -d option. With this approach, the files in /opt/solr will remain untouched and all files that change while Solr is running will live under /var/solr.

Create the Solr user

Running Solr as root is not recommended for security reasons, and the control script start command will refuse to do so. Consequently, you should determine the username of a system user that will own all of the Solr files and the running Solr process. By default, the installation script will create the solr user, but you can override this setting using the -u option. If your organization has specific requirements for creating new user accounts, then you should create the user before running the script. The installation script will make the Solr user the owner of the /opt/solr and /var/solr directories.
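
If you do need to pre-create the account, a minimal sketch for Debian/Ubuntu systems might look like this (the shell and home directory shown are assumptions; adjust to your organization's policy):

# create a system user and group named "solr" with no login password
sudo adduser --system --shell /bin/bash --group --disabled-password --home /var/solr solr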

You are now ready to run the installation script.

Run the Solr Installation Script

To run the script, you'll need to download the latest Solr distribution archive and then do the following (NOTE: replace solr-X.Y.Z with the actual version number):

tar xzf solr-X.Y.Z.tgz solr-X.Y.Z/bin/install_solr_service.sh --strip-components=2

The previous command extracts the install_solr_service.sh script from the archive into the current directory. If installing on Red Hat, please make sure lsof is installed before running the Solr installation script (sudo yum install lsof). The installation script must be run as root:

sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz

By default, the script extracts the distribution archive into /opt, configures Solr to write files into /var/solr, and runs Solr as the solr user. Consequently, the following command produces the same result as the previous command:

sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -i /opt -d /var/solr -u solr -s solr -p 8983

You can customize the service name, installation directories, port, and owner using options passed to the installation script. To see available options, simply do:

sudo bash ./install_solr_service.sh -help

Once the script completes, Solr will be installed as a service and running in the background on your server (on port 8983). To verify, you can do:

sudo service solr status

If you do not want to start the service immediately, pass the -n option. You can then start the service manually later, e.g., after completing the configuration setup.
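
For example (a sketch, using the same archive name as above):

# install Solr as a service but do not start it immediately
sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -n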

We'll cover some additional configuration settings you can make to fine-tune your Solr setup in a moment. Before moving on, let's take a closer look at the steps performed by the installation script. This gives you a better overview and will help you understand important details about your Solr installation when reading other pages in this guide; for example, when a page refers to Solr home, you'll know exactly where that is on your system.

Solr Home Directory

The Solr home directory (not to be confused with the Solr installation directory) is where Solr manages core directories with index files. By default, the installation script uses /var/solr/data. If the -d option is used when running the install script, this will change to the data subdirectory in the location given to the -d option. Take a moment to inspect the contents of the Solr home directory on your system. If you do not store solr.xml in ZooKeeper, the home directory must contain a solr.xml file. When Solr starts up, the Solr Control Script passes the location of the home directory using the -Dsolr.solr.home system property.
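
With the default locations, inspecting the home directory might look like this (a sketch; the exact contents will vary with your configuration):

# the control script starts the JVM with -Dsolr.solr.home=/var/solr/data
$ ls /var/solr/data
solr.xml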

Environment overrides include file

The service installation script creates an environment-specific include file that overrides defaults used by the bin/solr script. The main advantage of using an include file is that it provides a single location where all of your environment-specific overrides are defined. Take a moment to inspect the contents of the /etc/default/solr.in.sh file, which is the default path set up by the installation script. If you used the -s option on the install script to change the name of the service, then the first part of the filename will be different. For a service named solr-demo, the file will be named /etc/default/solr-demo.in.sh. There are many settings that you can override using this file. However, at a minimum, this script needs to define the SOLR_PID_DIR and SOLR_HOME variables, such as:

SOLR_PID_DIR=/var/solr
SOLR_HOME=/var/solr/data
LOG4J_PROPS=/var/solr/log4j.properties
SOLR_LOGS_DIR=/var/solr/logs
SOLR_PORT=8983

The SOLR_PID_DIR variable sets the directory where the control script will write out a file containing the Solr server’s process ID. 
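
The PID file name incorporates the node's port, so with the defaults above you would expect something like the following (a sketch; the other entries will vary):

$ ls /var/solr
data  log4j.properties  logs  solr-8983.pid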

Log settings

Solr uses Apache Log4J for logging. The installation script copies /opt/solr/server/resources/log4j.properties to /var/solr/log4j.properties. Take a moment to verify that the Solr include file is configured to send logs to the correct location by checking the following settings in /etc/default/solr.in.sh:

LOG4J_PROPS=/var/solr/log4j.properties
SOLR_LOGS_DIR=/var/solr/logs

For more information about Log4J configuration, please see: Configuring Logging

init.d script

When running a service like Solr on Linux, it's common to set up an init.d script so that system administrators can control Solr using the service tool, such as: service solr start. The installation script creates a very basic init.d script to help you get started. Take a moment to inspect the /etc/init.d/solr file, which is the default script name set up by the installation script. If you used the -s option on the install script to change the name of the service, then the filename will be different. Notice that the following variables are set up for your environment based on the parameters passed to the installation script:

SOLR_INSTALL_DIR=/opt/solr
SOLR_ENV=/etc/default/solr.in.sh
RUNAS=solr

The SOLR_INSTALL_DIR and SOLR_ENV variables should be self-explanatory. The RUNAS variable sets the owner of the Solr process, such as solr; if you don’t set this value, the script will run Solr as root, which is not recommended for production.  You can use the  /etc/init.d/solr script to start Solr by doing the following as root:

service solr start

The /etc/init.d/solr script also supports the stop, restart, and status commands. Please keep in mind that the init script that ships with Solr is very basic and is intended to show you how to set up Solr as a service. However, it's also common to use more advanced tools like supervisord or upstart to control Solr as a service on Linux. While showing how to integrate Solr with tools like supervisord is beyond the scope of this guide, the init.d/solr script should provide enough guidance to help you get started. Also, the installation script sets the Solr service to start automatically when the host machine initializes.
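
For reference, registering an init.d script to start at boot is typically done with the distribution's own tool; a sketch, assuming the default service name of solr:

# Debian/Ubuntu
sudo update-rc.d solr defaults
# CentOS/Red Hat
sudo chkconfig solr on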

Progress Check

In the next section, we cover some additional environment settings to help you fine-tune your production setup. However, before we move on, let's review what we've achieved thus far. Specifically, you should be able to control Solr using /etc/init.d/solr. Please verify the following commands work with your setup:

sudo service solr restart
sudo service solr status

The status command should give some basic information about the running Solr node that looks similar to:

Found 1 Solr nodes:

Solr process 2750 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"X.Y.Z",
  "startTime":"...",
  "uptime":"...",
  "memory":"..."
}

If the status command is not successful, look for error messages in /var/solr/logs/solr.log.
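
For example, to watch the log while restarting the service (a simple sketch):

sudo tail -f /var/solr/logs/solr.log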

Fine-tune your production setup

Memory and GC Settings

By default, the bin/solr script sets the maximum Java heap size to 512M (-Xmx512m), which is fine for getting started with Solr. For production, you’ll want to increase the maximum heap size based on the memory requirements of your search application; values between 10 and 20 gigabytes are not uncommon for production servers. When you need to change the memory settings for your Solr server, use the SOLR_JAVA_MEM variable in the include file, such as:

SOLR_JAVA_MEM="-Xms10g -Xmx10g"

Also, the Solr Control Script comes with a set of pre-configured Java garbage collection settings that have been shown to work well with Solr for a number of different workloads. However, these settings may not work well for your specific use of Solr. Consequently, you may need to change the GC settings, which should also be done with the GC_TUNE variable in the /etc/default/solr.in.sh include file. For more information about tuning your memory and garbage collection settings, see: JVM Settings.
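
As an illustration only (these flags are not a recommendation; appropriate values depend on your workload and JVM version), an override might look like:

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC"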

Out-of-Memory Shutdown Hook

The bin/solr script registers the bin/oom_solr.sh script to be called by the JVM if an OutOfMemoryError occurs. The oom_solr.sh script will issue a kill -9 to the Solr process that experiences the OutOfMemoryError. This behavior is recommended when running in SolrCloud mode so that ZooKeeper is immediately notified that a node has experienced a non-recoverable error. Take a moment to inspect the contents of the /opt/solr/bin/oom_solr.sh script so that you are familiar with the actions the script will perform if it is invoked by the JVM.
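
Conceptually, the hook boils down to something like the following (a simplified sketch, not the shipped script; the real script also logs the event and receives the port as an argument):

#!/bin/bash
# find the Solr process listening on the given port and kill it hard
SOLR_PORT=$1
SOLR_PID=$(ps auxww | grep start.jar | grep "jetty.port=$SOLR_PORT" | grep -v grep | awk '{print $2}')
kill -9 $SOLR_PID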

SolrCloud

To run Solr in SolrCloud mode, you need to set the ZK_HOST variable in the include file to point to your ZooKeeper ensemble. Running the embedded ZooKeeper is not supported in production environments. For instance, if you have a ZooKeeper ensemble hosted on the following three hosts on the default client port 2181 (zk1, zk2, and zk3), then you would set:

ZK_HOST=zk1,zk2,zk3

When the ZK_HOST variable is set, Solr will launch in "cloud" mode.

ZooKeeper chroot

If you're using a ZooKeeper instance that is shared by other systems, it's recommended to isolate the SolrCloud znode tree using ZooKeeper's chroot support. For instance, to ensure all znodes created by SolrCloud are stored under /solr, you can put /solr on the end of your ZK_HOST connection string, such as:

ZK_HOST=zk1,zk2,zk3/solr

Before using a chroot for the first time, you need to create the root path (znode) in ZooKeeper by using the Solr Control Script. We can use the mkroot command for that:

bin/solr zk mkroot /solr -z <ZK_node>:<ZK_PORT>

If you also want to bootstrap ZooKeeper with an existing solr_home, you can instead use zkcli.sh / zkcli.bat's bootstrap command, which will also create the chroot path if it does not exist. See Command Line Utilities for more info.
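
A sketch of that alternative (the connection string and solr_home path are assumptions based on the defaults used earlier on this page):

server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181/solr -cmd bootstrap -solrhome /var/solr/data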

Solr Hostname

Use the SOLR_HOST variable in the include file to set the hostname of the Solr server.

SOLR_HOST=solr1.example.com

Setting the hostname of the Solr server is recommended, especially when running in SolrCloud mode, as this determines the address of the node when it registers with ZooKeeper.

Override settings in solrconfig.xml

Solr allows configuration properties to be overridden using Java system properties passed at startup using the -Dproperty=value syntax. For instance, in solrconfig.xml, the default auto soft commit settings are set to:

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

In general, whenever you see a property in a Solr configuration file that uses the ${solr.PROPERTY:DEFAULT_VALUE} syntax, you know it can be overridden using a Java system property. For instance, to set the maxTime for soft-commits to 10 seconds, you can start Solr with -Dsolr.autoSoftCommit.maxTime=10000, such as:

bin/solr start -Dsolr.autoSoftCommit.maxTime=10000

The bin/solr script simply passes options starting with -D on to the JVM during startup. For running in production, we recommend setting these properties in the SOLR_OPTS variable defined in the include file. Keeping with our soft-commit example, in /etc/default/solr.in.sh, you would do:

SOLR_OPTS="$SOLR_OPTS -Dsolr.autoSoftCommit.maxTime=10000"


Running multiple Solr nodes per host

The bin/solr script is capable of running multiple instances on one machine, but for a typical installation, this is not a recommended setup.  Extra CPU and memory resources are required for each additional instance.  A single instance is easily capable of handling multiple indexes.

When to ignore the recommendation

For every recommendation, there are exceptions.  For the recommendation above, that exception is mostly applicable when discussing extreme scalability.  The best reason for running multiple Solr nodes on one host is decreasing the need for extremely large heaps.

When the Java heap gets very large, it can result in extremely long garbage collection pauses, even with the GC tuning that the startup script provides by default.  The exact point at which the heap is considered "very large" will vary depending on how Solr is used.  This means that there is no hard number that can be given as a threshold, but if your heap is reaching the neighborhood of 16 to 32 gigabytes, it might be time to consider splitting nodes.  Ideally this would mean more machines, but budget constraints might make that impossible.

There is another issue once the heap reaches 32GB. Below 32GB, Java is able to use compressed object pointers (compressed oops), but above that point larger pointers are required, which uses more memory and slows down the JVM.

Because of the potential garbage collection issues and the particular issues that happen at 32GB, if a single instance would require a 64GB heap, performance is likely to improve greatly if the machine is set up with two nodes that each have a 31GB heap.
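
In that scenario, each node's include file would cap its heap just below the compressed-oops threshold; a sketch, assuming a second service named solr2 as installed below:

# in /etc/default/solr.in.sh and /etc/default/solr2.in.sh
SOLR_JAVA_MEM="-Xms31g -Xmx31g"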

If your use case requires multiple instances, at a minimum you will need unique Solr home directories for each node you want to run; ideally, each home should be on a different physical disk so that multiple Solr nodes don’t have to compete with each other when accessing files on disk. Having different Solr home directories implies that you’ll need a different include file for each node. Moreover, if using the /etc/init.d/solr script to control Solr as a service, then you’ll need a separate script for each node. The easiest approach is to use the service installation script to add multiple services on the same host, such as:

sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -s solr2 -p 8984

The command shown above will add a service named solr2 running on port 8984 using /var/solr2 for writable (aka "live") files; the second server will still be owned and run by the solr user and will use the Solr distribution files in /opt. After installing the solr2 service, verify it works correctly by doing:

sudo service solr2 restart
sudo service solr2 status


21 Comments

  1. Although it's never explicitly stated, there is an implication here that the location in the SOLR_LOGS_DIR environment variable will be the location of ALL logs.  The logs from Solr itself are handled by the log4j.properties file.  If we were using log4j2, then we would have the option of using the environment variable in the log4j config, but the information I've located says that log4j 1.2 can't do it.

    1. In my cluster, I added a line to <SOLR_INSTALL_ROOT>/bin/solr and replaced a line in the log4j.properties file. This lets me control where the logs go with SOLR_LOGS_DIR.

  2. Hi,

    Thanks for writing such an in-depth guide to getting Solr setup. It's proved really useful for us!

    One thing we are really having trouble with, though, is getting our PHP application on one server to connect to our Solr server, which we have placed on a separate EC2 instance. We have performed the relevant network configuration to allow the two servers to communicate with each other, but the PHP application (using the popular 'Solarium' library) still fails to connect (it returns a 504 - Gateway Timeout error).

    After trawling through the web, the only thing I am left to believe is that Jetty may need to be configured to allow connections from our PHP application server? 

    Unfortunately, I'm coming up short finding the necessary configuration setting/file where I can adjust this.

    Any help would be greatly appreciated (smile)

    Thanks,

    1. The Jetty server included in the Solr download has no restrictions on who or what can access it.  Unless you have changed the configuration on Jetty, it won't be the Solr instance that is preventing communication.

      A 504 HTTP response sounds like your PHP app is hitting a proxy, not the Jetty included with Solr. I know that Amazon provides a load balancer, and virtually every load balancer is implemented as a reverse proxy.

      Most Linux distributions come with a firewall enabled, and Amazon may have additional security sitting in front of that. It is up to you to reconfigure any firewall that might be running on your system.

  3. Hi Shawn,

    Thanks for the response.

    I'm going to sit down and go through our AWS setup tomorrow to see if we need to do change anything on our load balancer. Could be something we've missed in the security group settings.

    Thanks again,
    Rich

  4. The script creates the new (solr) user with /bin/bash as its login shell. Is this required for proper functioning, or can/should it be replaced with /bin/false or /usr/sbin/nologin ?

    (I should have mentioned I'm using the Ubuntu 14.04 LTS server.)

    (I will probably try it and see, but I figured someone might already know)

    Thanks for a very helpful page and script; really ices the solr cake (smile)

    Edit:

    So far, I have tried the following without noticing errors...

    sudo usermod -s /usr/sbin/nologin solr
    sudo -u solr bin/solr create_core -c <name>

    Edit 2:

    Well, unfortunately, the solr service startup script currently does not work. I see this in /var/log/boot.log: This account is currently not available. This message is also what I get when I try to sudo su solr.

    When I revert solr to use the /bin/bash shell and reboot, I do see the usual Solr startup messages. (They're somewhat awkwardly interspersed among the usual " * Starting xxxx ... [ OK ]" messages, making it difficult to read the log, but that's not very important. I guess those processes are detached to let other startup happen.)

    $ sudo cat /etc/shadow | grep solr
    solr:*:16602:0:99999:7:::

     

    Question: Am I correct that because the solr user is created with no password (* in /etc/shadow instead of a hash), this is not something I should worry about from a security / attack surface point of view?

    Thanks,
    Dave

     

  5. Regarding ZooKeeper chroot - why do we recommend using the -cmd bootstrap command to create the chroot directory? Would not -cmd makepath /solr do the trick?

    1. Edited the page to recommend makepath instead, mentioning the more complex bootstrap option in an info box with a link to the tool docs.

  6. The include file is no longer under /var/solr/solr.in.sh file. It is now under /etc/default/solr.in.sh

  7. I have a problem integrating Solr on an Ubuntu server. Before using Solr on the Ubuntu server, I tested it on my Mac and it worked perfectly: it indexed my PDF, .doc, and .docx documents. After installing Solr on the Ubuntu server with the same configuration files and libraries, I've found that Solr doesn't index PDF documents, but I can search over .doc and .docx documents.
    Here are some parts of my solrconfig.xml contents:

    <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
      <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
    
    <requestHandler name="/update/extract"
                      startup="lazy"
                      class="solr.extraction.ExtractingRequestHandler" >
        <lst name="defaults">
          <str name="lowernames">true</str>
          <str name="fmap.meta">ignored_</str>
          <str name="fmap.content">_text_</str>
        </lst>
      </requestHandler>
  8. I realize this guide is for SOLR 6, which isn't out yet.  However, I see this line in the instructions above: 

    Take a moment to inspect the contents of the /var/solr/solr.in.sh file, which is the default path setup by the installation script.

    I do not find that file there after running the script.  Instead I find it in /opt/solr-5.4.0/bin/solr.in.sh.   Is that by intent?  I cannot find a Taking Solr to Production document for SOLR 5.4 or 5.5 and so I wonder if anyone can tell me about the significance (if any) of this difference?

    1. In addition - it seems to me that this file controls nothing.  The same file in /etc/default is the one that matters for getting all the configs correctly on system start, yes?  The instructions above could probably be improved slightly to clarify this...  Unless it's not true in 6.0

      1. You are correct.  This file will be /etc/default/solr.in.sh with no options given to the install script, and will change the filename if you use the -s option to change the service name.  I have fixed the problem.  I thought this change was in the 5.5 version, but maybe it was also in 5.4.

        1. Sweet!  Thanks.  I ran into some confusion because issuing a command like this: 

          sudo /opt/solr/bin/solr restart -c -z 192.168.56.5,192.168.56.6,192.168.56.7/solr5_4

          Appears to pick up the "other" solr.in.sh file – I assume because it's in the same directory.  This caused a bit of confusion about why we could only get started in "Cloud Mode" by issuing that command - never on system boot.  All cleared up now - thanks.

  9. When will the Windows Service install scripts be available?

    Where is the beta version so that we can get to production?

    Thank you!

    1. There is currently an open issue for creating scripts to run as a Windows Service, but no one has gotten far enough along to add a patch: SOLR-7105 (Running Solr as a Windows service), so there is no current ETA for when they will be available in a release.

  10. This tutorial does not cover version 5.1, which means that when I create a SolrCloud with version 6.2.1 it is okay, but with version 5.1 it is not.

    1. The online version of the Solr Reference Guide only covers the most current release (actually the next release), and does not attempt to provide detailed instructions for every single available release on every page. The version that this documentation applies to is shown at the top of every page.

      The "Older Versions of this Guide (PDF)" link at the upper left will point you to the version of the Guide for 5.1. I'll let you know, though, that some of the techniques covered in this page today were not available in Solr 5.1, so unfortunately you will not find the same tutorial there.

      If you have questions about production strategies for Solr 5.1, your best bet is to consult the mailing list for advice. Information on how to join is available from the Solr website: http://lucene.apache.org/solr/resources.html#community.

  11. As of solr 5.5.2, the command:

    $ bin/solr zk mkroot /solr -z <ZK_node>:<ZK_PORT>

    is giving out the following error:

    ERROR: Unrecognized or misplaced argument: mkroot!

    You should follow the instructions described in "Create a new ZooKeeper path" on the Solr Control Script Reference page.

    1. Well, yes, the CWiki is for the current version of Solr. The mkroot command was added in 6.4. Please download the full PDF version for 5.5 from the "Older Versions of this Guide" link in the upper left.