Apache Solr Documentation


This Unreleased Guide Will Cover Apache Solr 5.1


Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. Called SolrCloud, these capabilities provide distributed indexing and search capabilities, supporting the following features:

  • Central configuration for the entire cluster
  • Automatic load balancing and fail-over for queries
  • ZooKeeper integration for cluster coordination and configuration

SolrCloud provides flexible distributed search and indexing without a master node to allocate nodes, shards, and replicas. Instead, Solr uses ZooKeeper to track these locations, based on the cluster's configuration files and schemas. Documents can be sent to any server, and Solr uses the cluster state stored in ZooKeeper to route them to the correct shard.
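The routing idea described above can be sketched with a toy hash router. This is only an illustration: SolrCloud's default compositeId router actually hashes the document's unique key with MurmurHash3 over a hash range assigned to each shard, not a simple modulo, and the class and method names here are hypothetical.

```java
// Simplified illustration of hash-based document routing, as used
// conceptually by SolrCloud. NOTE: Solr's real compositeId router uses
// MurmurHash3 over per-shard hash ranges, not String.hashCode() modulo
// the shard count -- this sketch only shows the idea.
public class SimpleRouter {
    private final int numShards;

    public SimpleRouter(int numShards) {
        this.numShards = numShards;
    }

    /** Map a document's unique key to a shard index deterministically. */
    public int shardFor(String docId) {
        // Mask off the sign bit so the modulo result is non-negative.
        return (docId.hashCode() & 0x7fffffff) % numShards;
    }

    public static void main(String[] args) {
        SimpleRouter router = new SimpleRouter(4);
        // The same id always routes to the same shard, which is why a
        // document can be sent to any node and still be forwarded to
        // the right shard leader.
        System.out.println("doc-42 -> shard " + router.shardFor("doc-42"));
        System.out.println("doc-43 -> shard " + router.shardFor("doc-43"));
    }
}
```

Because the mapping is deterministic, every node that knows the shard layout computes the same answer, so no master node is needed to decide placement.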

In this section, we'll cover everything you need to know about using Solr in SolrCloud mode. We've split up the details into the following topics:

You can also find more information on the Solr wiki page on SolrCloud.

  1. Hi Team,

     I have a serious problem while using Solr. I have a job that runs twice daily and processes 112946697 records. Solr is taking 5 to 6 hours to process these records.

     Is there any way to improve Solr's performance when processing/updating this much data?

     Search queries return very quickly, but loading this data into Solr takes much longer.

     Please let me know if you need any other information, such as our Solr settings.

    1. This is a terrible place to ask for tech support.  This should go on the mailing list or the IRC channel.

      I assume this is an indexing job.  This does not seem like a problem to me.  I have an index that spreads 95 million records across six large shards and one small shard, living on two servers.  All shards run the dataimport handler in parallel, importing from a MySQL database.  A full rebuild takes about 4.5 hours ... and this is building six large indexes at the same time, taking advantage of multiple CPU cores.  The database is definitely not the bottleneck here, because I can create a program that indexes all of the records to one Solr index, and if I take out the "send to Solr" step (it still builds the SolrInputDocument object), it runs through all 95 million records in about 20 minutes.

      If you can process over 100 million records within 5-6 hours and you have either a single index or a SolrCloud, you do not have a performance problem, especially if the documents are a few kilobytes or larger.

      If you have additional questions, please ask this again on the mailing list or join the #solr IRC channel.
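The batched, parallel indexing approach described in the reply above can be sketched generically: split the record stream into batches and index them from several threads. This is a minimal illustration, not Solr's API; the `IndexSink` interface is a hypothetical stand-in for a real client call (in practice it would wrap something like a SolrJ add of a document batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sink standing in for a real Solr client call.
interface IndexSink {
    void addBatch(List<String> docs);
}

public class ParallelIndexer {
    /** Split docs into batches and index them from multiple threads. */
    public static void index(List<String> docs, int batchSize, int threads,
                             IndexSink sink) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docs.size(); i += batchSize) {
            List<String> batch =
                docs.subList(i, Math.min(i + batchSize, docs.size()));
            pool.submit(() -> sink.addBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("doc-" + i);
        AtomicInteger indexed = new AtomicInteger();
        // Stub sink that just counts documents instead of talking to Solr.
        index(docs, 100, 4, batch -> indexed.addAndGet(batch.size()));
        System.out.println("indexed " + indexed.get() + " docs");
        // prints "indexed 1000 docs"
    }
}
```

Batching amortizes the per-request overhead, and multiple threads keep several CPU cores (and several shards) busy at once, which is the same effect the parallel dataimport runs described above achieve.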

    2. Followup: You posted this on the SolrCloud page, so I have one more thing to add: if your index is sharded and you can send your updates directly to the correct shard leader, indexing will be faster.  CloudSolrServer (in the Java client) does this automatically as of version 4.5 or 4.6, but if your client isn't Java, you'll have to work out which Solr instances are the correct shard leaders in your own code.