SolrCloud is designed to provide a highly available, fault tolerant environment for distributing your indexed content and query requests across multiple servers. It's a system in which data is organized into multiple pieces, or shards, that can be hosted on multiple machines, with replicas providing redundancy for both scalability and fault tolerance, and a ZooKeeper server that helps manage the overall structure so that both indexing and search requests can be routed properly.
This section explains SolrCloud and its inner workings in detail, but before you dive in, it's best to have an idea of what it is you're trying to accomplish. This page provides a simple tutorial to start Solr in SolrCloud mode, so you can begin to get a sense for how shards interact with each other during indexing and when serving queries. To that end, we'll use simple examples of configuring SolrCloud on a single machine, which is obviously not a real production environment; a production environment would include several servers or virtual machines. In a real production environment, you'll also use real machine names instead of the "localhost" we've used here.
In this section you will learn how to start a SolrCloud cluster using startup scripts and a specific configset.
This tutorial assumes that you're already familiar with the basics of using Solr. If you need a refresher, please see the Getting Started section to get a grounding in Solr concepts. If you load documents as part of that exercise, you should start over with a fresh Solr installation for these SolrCloud tutorials.
The bin/solr script makes it easy to get started with SolrCloud, as it walks you through the process of launching Solr nodes in cloud mode and creating a collection. To get started, simply do:
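```shell
# Launch the interactive SolrCloud example; run this from the
# top-level directory of a standard Solr installation
bin/solr -e cloud
```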
This starts an interactive session to walk you through the steps of setting up a simple SolrCloud cluster with embedded ZooKeeper. The script starts by asking you how many Solr nodes you want to run in your local cluster, with the default being 2.
The script supports starting up to 4 nodes, but we recommend using the default of 2 when starting out. These nodes will each exist on a single machine, but will use different ports to mimic operation on different servers.
Next, the script will prompt you for the port to bind each of the Solr nodes to, such as:
Choose any available port for each node; the default is 8983 for the first node and 7574 for the second. The script will start each node in order and show you the command it uses to start the server, such as:
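```shell
# Example start command for the first node, assuming the default port
# and the example directory layout created by the script
bin/solr start -cloud -s example/cloud/node1/solr -p 8983
```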
The first node will also start an embedded ZooKeeper server bound to port 9983. The Solr home for the first node is example/cloud/node1/solr, as indicated by the -s option in the start command.
After starting up all nodes in the cluster, the script prompts you for the name of the collection to create:
The suggested default is "gettingstarted" but you might want to choose a name more appropriate for your specific search application.
Next, the script prompts you for the number of shards to distribute the collection across. Sharding is covered in more detail later on, so if you're unsure, we suggest using the default of 2 so that you can see how a collection is distributed across multiple nodes in a SolrCloud cluster.
Next, the script will prompt you for the number of replicas to create for each shard. Replication is covered in more detail later in the guide, so if you're unsure, then use the default of 2 so that you can see how replication is handled in SolrCloud.
Lastly, the script will prompt you for the name of a configuration directory for your collection. You can choose basic_configs, data_driven_schema_configs, or sample_techproducts_configs. The configuration directories are pulled from server/solr/configsets/, so you can review them beforehand if you wish. The data_driven_schema_configs configuration (the default) is useful when you're still designing a schema for your documents and need some flexibility as you experiment with Solr.
At this point, you should have a new collection created in your local SolrCloud cluster. To verify this, you can run the status command:
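```shell
# Report the status of each running Solr node, including cloud collection info
bin/solr status
```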
If you encounter any errors during this process, check the Solr log files in example/cloud/node1/logs and example/cloud/node2/logs.
You can see how your collection is deployed across the cluster by visiting the cloud panel in the Solr Admin UI: http://localhost:8983/solr/#/~cloud. Solr also provides a way to perform basic diagnostics for a collection using the healthcheck command:
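```shell
# Run basic diagnostics for the collection; the collection name shown
# here assumes the default "gettingstarted", and -z points at the
# embedded ZooKeeper started by the first node
bin/solr healthcheck -c gettingstarted -z localhost:9983
```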
The healthcheck command gathers basic information about each replica in a collection, such as the number of docs, current status (active, down, etc.), and address (where the replica lives in the cluster).
Documents can now be added to SolrCloud using the Post Tool.
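For example, assuming your collection is named "gettingstarted", you could index the sample documents that ship with Solr:

```shell
# Index Solr's bundled example documents into the "gettingstarted" collection
bin/post -c gettingstarted example/exampledocs/*.xml
```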
To stop Solr in SolrCloud mode, you would use the bin/solr script and issue the stop command, as in:
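```shell
# Stop all Solr nodes running on this machine
bin/solr stop -all
```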
Starting with -noprompt
You can also get SolrCloud started with all the defaults instead of the interactive session using the following command:
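```shell
# Start the cloud example with all defaults: two nodes, two shards,
# two replicas, and a collection named "gettingstarted"
bin/solr -e cloud -noprompt
```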
You can restart your SolrCloud nodes using the bin/solr script. For instance, to restart node1 running on port 8983 (with an embedded ZooKeeper server), you would do:
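```shell
# Restart node1 in cloud mode (-c) on its original port,
# assuming the directory layout created by the cloud example
bin/solr restart -c -p 8983 -s example/cloud/node1/solr
```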
To restart node2 running on port 7574, you can do:
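```shell
# Restart node2, pointing it at the ZooKeeper instance embedded in node1
bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr
```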
Notice that you need to specify the ZooKeeper address (-z localhost:9983) when starting node2 so that it can join the cluster with node1.
Adding a node to a cluster
Adding a node to an existing cluster is a bit advanced and requires a little more understanding of Solr. Once you start up a SolrCloud cluster using the startup scripts, you can add a new node to it by:
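```shell
# General pattern for adding a node; replace each placeholder
# with values appropriate for your setup
mkdir <solr.home for new node>
cp <existing solr.xml path> <new solr.home>
bin/solr start -cloud -s <new solr.home> -p <port num> -z <zk hosts string>
```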
Notice that the above requires you to create a Solr home directory. You either need to copy solr.xml to the solr_home directory, or keep it centrally in ZooKeeper.
Example (with directory structure) that adds a node to an example started with "bin/solr -e cloud":
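```shell
# Create a home directory for a third node, copy the solr.xml shipped
# with Solr into it, and start the node against the existing cluster's
# embedded ZooKeeper
mkdir -p example/cloud/node3/solr
cp server/solr/solr.xml example/cloud/node3/solr
bin/solr start -cloud -s example/cloud/node3/solr -p 8987 -z localhost:9983
```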
The previous command will start another Solr node on port 8987, with its Solr home set to example/cloud/node3/solr. The new node will write its log files to example/cloud/node3/logs.
Once you're comfortable with how the SolrCloud example works, we recommend using the process described in Taking Solr to Production for setting up SolrCloud nodes in production.