The settings in this section are configured in the <updateHandler> element in solrconfig.xml and may affect the performance of index updates. These settings affect how updates are done internally. <updateHandler> configurations do not affect the higher-level configuration of RequestHandlers that process client update requests.
Data sent to Solr is not searchable until it has been committed to the index. Commits can be slow in some cases, and each one should be done in isolation from other possible commit requests to avoid overwriting data, so it is preferable to control when data is committed. Several options are available to control the timing of commits.
In Solr, a commit is an action that asks Solr to "commit" pending changes to the Lucene index files. By default, commit actions result in a "hard commit" of all the Lucene index files to stable storage (disk). When a client includes a commit=true parameter with an update request, all index segments affected by the adds and deletes in that update are written to disk as soon as the index updates are completed.
If the additional flag softCommit=true is specified, Solr performs a "soft commit": it commits your changes to the Lucene data structures quickly but does not guarantee that the Lucene index files are written to stable storage. This is an implementation of Near Real Time storage, a feature that boosts document visibility, since you don't have to wait for background merges and writes to stable storage to finish before moving on to something else. A full commit means that, if a server crashes, Solr will know exactly where your data was stored; a soft commit means that the data is stored, but the location information isn't yet stored. The tradeoff is that a soft commit gives you faster visibility at the cost of that durability guarantee.
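As a rough illustration, the same two kinds of commit can also be requested explicitly with Solr's XML update message format. This is a sketch only: it assumes each message is posted separately to an update request handler, and the two messages are shown together purely for comparison.

```xml
<!-- Message 1: hard commit, flushing index changes to stable storage -->
<commit/>

<!-- Message 2: soft commit, making changes visible without the flush -->
<commit softCommit="true"/>
```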
For more information about Near Real Time operations, see Near Real Time Searching.
These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
maxDocs: The number of updates that have occurred since the last commit.
maxTime: The number of milliseconds since the oldest uncommitted update.
openSearcher: Whether to open a new searcher when performing a commit. If this is false, the commit will flush recent index changes to stable storage, but will not cause a new searcher to be opened to make those changes visible. The default is true.
If either of these maxDocs or maxTime limits is reached, Solr automatically performs a commit operation. If the autoCommit tag is missing, then only explicit commits will update the index. The decision whether to use auto-commit or not depends on the needs of your application.
Determining the best auto-commit settings is a tradeoff between performance and accuracy. Settings that cause frequent updates will improve the accuracy of searches because new content will be searchable more quickly, but performance may suffer because of the frequent updates. Less frequent updates may improve performance but it will take longer for updates to show up in queries.
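A minimal sketch of an auto-commit configuration, assuming it is placed inside the <updateHandler> element of solrconfig.xml. The numeric values are illustrative assumptions, not recommendations; tune them to your own indexing and query load.

```xml
<autoCommit>
  <!-- Commit once 10000 updates have accumulated since the last commit... -->
  <maxDocs>10000</maxDocs>
  <!-- ...or once the oldest uncommitted update is 15000 ms old -->
  <maxTime>15000</maxTime>
  <!-- Flush to stable storage without opening a new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

With openSearcher set to false, these hard commits protect durability only; the changes become visible when a later commit opens a new searcher.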
You can also specify "soft" autoCommits in the same way that you can specify "soft" commits, except that instead of using autoCommit you set the autoSoftCommit tag.
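For illustration, a soft auto-commit might be configured with an autoSoftCommit element like the sketch below, again inside <updateHandler>. The one-second interval is an assumed value chosen only to show the shape of the setting.

```xml
<autoSoftCommit>
  <!-- Soft commit once the oldest uncommitted update is 1000 ms old -->
  <maxTime>1000</maxTime>
</autoSoftCommit>
```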
The commitWithin settings allow forcing document commits to happen within a defined time period. This is used most frequently with Near Real Time Searching, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:
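The configuration in question might look like the following sketch, assuming it sits inside the <updateHandler> element of solrconfig.xml:

```xml
<commitWithin>
  <!-- Make commitWithin perform a hard commit instead of a soft one -->
  <softCommit>false</softCommit>
</commitWithin>
```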
With this configuration, when you specify commitWithin as part of your update message, it will automatically perform a hard commit every time.
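On the client side, commitWithin can be attached to an individual update. The sketch below uses Solr's XML update message format; the field names (id, title) and the 10-second window are hypothetical and depend on your schema and latency needs.

```xml
<!-- Ask Solr to commit this document within 10000 ms of receiving it -->
<add commitWithin="10000">
  <doc>
    <field name="id">example-1</field>
    <field name="title">An example document</field>
  </doc>
</add>
```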
The UpdateHandler section is also where update-related event listeners can be configured. These can be triggered to occur after any commit (event="postCommit") or only after optimize commands (event="postOptimize"). Users can write custom update event listener classes, but a common use case is to run external executables via the RunExecutableListener, which takes the following settings:
exe: The name of the executable to run. It should include the path to the file, relative to Solr home.
dir: The directory to use as the working directory. The default is ".".
wait: Forces the calling thread to wait until the executable returns a response. The default is true.
args: Any arguments to pass to the program. The default is none.
env: Any environment variables to set. The default is none.
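A sketch of a listener definition using these settings, assuming a Solr version that ships RunExecutableListener and that the block is placed inside <updateHandler>. The script name, directory, argument, and environment variable are hypothetical placeholders.

```xml
<listener event="postCommit" class="solr.RunExecutableListener">
  <!-- Hypothetical script, path relative to Solr home -->
  <str name="exe">bin/after-commit.sh</str>
  <!-- Working directory for the process -->
  <str name="dir">.</str>
  <!-- Do not block the calling thread while the script runs -->
  <bool name="wait">false</bool>
  <!-- Arguments passed to the executable -->
  <arr name="args">
    <str>--verbose</str>
  </arr>
  <!-- Environment variables, in NAME=value form -->
  <arr name="env">
    <str>MY_VAR=value1</str>
  </arr>
</listener>
```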
As described in the section RealTime Get, a transaction log is required for that feature. RealTime Get relies on the update log, which is enabled by default and is configured in the updateHandler section of solrconfig.xml, in a section like:
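A minimal sketch of such a section, modeled on the stock solrconfig.xml; the solr.ulog.dir property falls back to the default location when it is left unset:

```xml
<updateLog>
  <!-- Directory for transaction log files; empty default means the core's data directory -->
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
```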
Three additional expert-level configuration settings affect indexing performance and how far a replica can fall behind on updates before it must enter full recovery (see the section on write side fault tolerance for more information):
numRecordsToKeep: The number of update records to keep per log.
maxNumLogsToKeep: The maximum number of logs to keep.
numVersionBuckets: The number of buckets used to keep track of max version values when checking for re-ordered updates. Increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing; this requires (8 bytes (long) * numVersionBuckets) of heap space per Solr core.
An example, to be included under <updateHandler> in solrconfig.xml, employing the above advanced settings:
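The following sketch shows how these settings might be combined. The numeric values are illustrative assumptions rather than recommended settings.

```xml
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- Update records retained per log -->
  <int name="numRecordsToKeep">500</int>
  <!-- Maximum number of log files retained -->
  <int name="maxNumLogsToKeep">20</int>
  <!-- 65536 buckets * 8 bytes = 512 KiB of heap per core -->
  <int name="numVersionBuckets">65536</int>
</updateLog>
```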