Apache Solr Documentation

6.5 Ref Guide (PDF Download)
Solr Tutorial
Solr Community Wiki

Older Versions of this Guide (PDF)

Ref Guide Topics

Meta-Documentation

*** As of June 2017, the latest Solr Ref Guide is located at https://lucene.apache.org/solr/guide ***

Please note comments on these pages have now been disabled for all users.

Skip to end of metadata
Go to start of metadata

The settings in this section are configured in the <updateHandler> element in solrconfig.xml and may affect the performance of index updates. These settings affect how updates are done internally. <updateHandler> configurations do not affect the higher level configuration of RequestHandlers that process client update requests.

Topics covered in this section:

Commits

Data sent to Solr is not searchable until it has been committed to the index. The reason for this is that in some cases commits can be slow and they should be done in isolation from other possible commit requests to avoid overwriting data. So, it's preferable to provide control over when data is committed. Several options are available to control the timing of commits.

commit and softCommit

In Solr, a commit is an action which asks Solr to "commit" those changes to the Lucene index files.  By default commit actions result in a "hard commit" of all the Lucene index files to stable storage (disk). When a client includes a commit=true parameter with an update request, this ensures that all index segments affected by the adds & deletes on an update are written to disk as soon as index updates are completed.

If an additional flag softCommit=true is specified, then Solr performs a 'soft commit', meaning that Solr will commit your changes to the Lucene data structures quickly but not guarantee that the Lucene index files are written to stable storage. This is an implementation of Near Real Time storage, a feature that boosts document visibility, since you don't have to wait for background merges and storage (to ZooKeeper, if using SolrCloud) to finish before moving on to something else. A full commit means that, if a server crashes, Solr will know exactly where your data was stored; a soft commit means that the data is stored, but the location information isn't yet stored. The tradeoff is that a soft commit gives you faster visibility because it's not waiting for background merges to finish.

For more information about Near Real Time operations, see Near Real Time Searching.

autoCommit

These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.

Setting

Description

maxDocs

The number of updates that have occurred since the last commit.

maxTime

The number of milliseconds since the oldest uncommitted update.

openSearcher

Whether to open a new searcher when performing a commit. If this is false, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible. The default is true.

If either of these maxDocs or maxTime limits are reached, Solr automatically performs a commit operation. If the autoCommit tag is missing, then only explicit commits will update the index. The decision whether to use auto-commit or not depends on the needs of your application.

Determining the best auto-commit settings is a tradeoff between performance and accuracy. Settings that cause frequent updates will improve the accuracy of searches because new content will be searchable more quickly, but performance may suffer because of the frequent updates. Less frequent updates may improve performance but it will take longer for updates to show up in queries.

You can also specify 'soft' autoCommits in the same way that you can specify 'soft' commits, except that instead of using autoCommit you set the autoSoftCommit tag.

commitWithin

The commitWithin settings allow forcing document commits to happen in a defined time period. This is used most frequently with Near Real Time Searching, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:

With this configuration, when you call commitWithin as part of your update message, it will automatically perform a hard commit every time.

Event Listeners

The UpdateHandler section is also where update-related event listeners can be configured. These can be triggered to occur after any commit (event="postCommit") or only after optimize commands (event="postOptimize").

Users can write custom update event listener classes, but a common use case is to run external executables via the RunExecutableListener:

Setting

Description

exe

The name of the executable to run. It should include the path to the file, relative to Solr home.

dir

The directory to use as the working directory. The default is ".".

wait

Forces the calling thread to wait until the executable returns a response. The default is true.

args

Any arguments to pass to the program. The default is none.

env

Any environment variables to set. The default is none.

Transaction Log

As described in the section RealTime Get, a transaction log is required for that feature. It is configured in the updateHandler section of solrconfig.xml.

Realtime Get currently relies on the update log feature, which is enabled by default. It relies on an update log, which is configured in solrconfig.xml, in a section like:

Three additional expert-level configuration settings affect indexing performance and how far a replica can fall behind on updates before it must enter into full recovery - see the section on write side fault tolerance for more information: 

Setting Name

TypeDefault

Description

numRecordsToKeep

int100

The number of update records to keep per log

maxNumLogsToKeep

int10

The maximum number of logs keep

numVersionBuckets

int65536

The number of buckets used to keep track of max version values when checking for re-ordered updates; increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing, this requires (8 bytes (long) * numVersionBuckets) of heap space per Solr core.

An example, to be included under <config><updateHandler> in solrconfig.xml, employing the above advanced settings:

  • No labels

7 Comments

  1. "data as finished" should be "data is finished".

    1. Thanks, Bryce. I think it was supposed to be 'has finished', but I changed it to 'is finished'.

  2. In the "commit and softcommit" section there is a typo. 

    In this line - In Solr, a commit is an action which asks Solr to "commit" those changes to the Lucne index files -  Lucene is misspelled.

    1. Thank you very much! I fixed the typo.

  3. Hi! Is the maxPendingDeletes parameter still used in SOLR 4 / 5?

    1. nope. good catch.  removed.

  4. What is the time units for the commitWithin setting? Is it the same when including it as a parameter when making an update request as the setting in a configured update request handler?