Apache Solr Documentation

6.2 Ref Guide (PDF Download)
Solr Tutorial
Solr Community Wiki

Older Versions of this Guide (PDF)

6.3 Draft Ref Guide Topics


This Unreleased Guide Will Cover Apache Solr 6.3

Skip to end of metadata
Go to start of metadata

Basic Pagination

In most search application usage, the "top" matching results (sorted by score, or some other criteria) are then displayed to some human user.  In many applications the UI for these sorted results are displayed to the user in "pages" containing a fixed number of matching results, and users don't typically look at results past the first few pages worth of results.

In Solr, this basic paginated searching is supported using the start and  rows parameters, and performance of this common behaviour can be tuned by utilizing the  queryResultCache and adjusting the queryResultWindowSize configuration options based on your expected page sizes.

Basic Pagination Examples

The easiest way to think about simple pagination, is to simply multiply the page number you want (treating the "first" page number as "0") by the number of rows per page; such as in the following psuedo-code:

How Basic Pagination is Affected by Index Updates

The start param specified in a request to Solr indicates an absolute "offset" in the complete sorted list of matches that the client wants Solr to use as the beginning of the current "page".  If an index modification (such as adding or removing documents) which affects the sequence of ordered documents matching a query occurs in between two requests from a client for subsequent pages of results, then it is possible that these modifications can result in the same document being returned on multiple pages, or documents being "skipped" as the result set shrinks or grows.

For example: consider an index containing 26 documents like so:




Followed by the following requests & index modifications interleaved:

  • A client requests q=*:*&rows=5&start=0&sort=name asc
    • documents with the ids 1-5 will be returned to the client
  • Document id 3 is deleted
  • The client requests "page #2" using q=*:*&rows=5&start=5&sort=name asc
    • Documents 7-11 will be returned
    • Document 6 has been skipped, since it is now the 5th document in the sorted set of all matching results – it would be returned on a new request for "page #1"
  • 3 new documents are now added with the ids 90, 91, and 92; All three documents have a name of A
  • The client requests "page #3" using q=*:*&rows=5&start=10&sort=name asc
    • Documents 9-13 will be returned
    • Documents 9, 10, and 11 have now been returned on both page #2 and page #3 since they moved farther back in the list of sorted results

In typical situations these impacts from index changes on paginated searching don't significantly affect user experience -- either because they happen extremely infrequently in fairly static collections, or because the users recognize that the collection of data is constantly evolving and expect to see documents shift up and down in the result sets.

Performance Problems with "Deep Paging"

In some situations, the results of a Solr search are not destined for a simple paginated user interface.  When you wish to fetch a very large number of sorted results from Solr to feed into an external system, using very large values for the start or rows parameters can be very inefficient.  Pagination using start and rows not only require Solr to compute (and sort) in memory all of the matching documents that should be fetched for the current page, but also all of the documents that would have appeared on previous pages.  So while a request for start=0&rows=1000000 may be obviously inefficient because it requires Solr to maintain & sort in memory a set of 1 million documents, likewise a request for start=999000&rows=1000 is equally inefficient for the same reasons.  Solr can't compute which matching document is the 999001st result in sorted order, without first determining what the first 999000 matching sorted results are.  If the index is distributed, which is common when running in SolrCloud mode, then 1 million documents are retrieved from each shard.  For a ten shard index, ten million entries must be retrieved and sorted to figure out the 1000 documents that match those query parameters.

Fetching A Large Number of Sorted Results: Cursors

As an alternative to increasing the "start" parameter to request subsequent pages of sorted results, Solr supports using a "Cursor" to scan through results.  Cursors in Solr are a logical concept, that doesn't involve caching any state information on the server.  Instead the sort values of the last document returned to the client are used to compute a "mark" representing a logical point in the ordered space of sort values. That "mark" can be specified in the parameters of subsequent requests to tell Solr where to continue.

Using Cursors

To use a cursor with Solr, specify a  cursorMark parameter with the value of " * ".  You can think of this being analogous to start=0 as a way to tell Solr "start at the beginning of my sorted results" except that it also informs Solr that you want to use a Cursor.  So in addition to returning the top N sorted results (where you can control N using the rows parameter) the Solr response will also include an encoded String named nextCursorMark .  You then take the nextCursorMark String value from the response, and pass it back to Solr as the cursorMark parameter for your next request.  You can repeat this process until you've fetched as many docs as you want, or until the nextCursorMark returned matches the cursorMark you've already specified -- indicating that there are no more results.

Constraints when using Cursors

There are a few important constraints to be aware of when using cursorMark parameter in a Solr request

  1. cursorMark and start are mutually exclusive parameters
    • Your requests must either not include a start parameter, or it must be specified with a value of "0".
  2. sort clauses must include the uniqueKey field (either "asc" or "desc")
    • If id is your uniqueKey field, then sort params like id asc and name asc, id desc would both work fine, but name asc by itself would not

Cursor mark values are computed based on the sort values of each document in the result, which means multiple documents with identical sort values will produce identical Cursor mark values if one of them is the last document on a page of results.  In that situation, the subsequent request using that cursorMark would not know which of the documents with the identical mark values should be skipped. Requiring that the uniqueKey field be used as a clause in the sort criteria guarantees that a deterministic ordering will be returned, and that every cursorMark value will identify a unique point in the sequence of documents.

Cursor Examples

Fetch All Docs

The psuedo-code shown here shows the basic logic involved in fetching all documents matching a query using a cursor:

Using SolrJ, this psuedo-code would be:

If you wanted to do this by hand using curl, the sequence of requests would look something like this:

Fetch first N docs, Based on Post Processing

Since the cursor is stateless from Solr's perspective, your client code can stop fetching additional results as soon as you have decided you have enough information:

How cursors are Affected by Index Updates

Unlike basic pagination, Cursor pagination does not rely on using an absolute "offset" into the completed sorted list of matching documents.  Instead, the cursorMark specified in a request encapsulates information about the relative position of the last document returned, based on the absolute sort values of that document.  This means that the impact of index modifications is much smaller when using a cursor compared to basic pagination.

Consider the same example index described when discussing basic pagination:



  • A client requests q=*:*&rows=5&start=0&sort=name asc, id asc&cursorMark=*
    • Documents with the ids 1-5 will be returned to the client in order
  • Document id 3 is deleted
  • The client requests 5 more documents using the nextCursorMark from the previous response
    • Documents 6-10 will be returned -- the deletion of a document that's already been returned doesn't affect the relative position of the cursor
  • 3 new documents are now added with the ids 90, 91, and 92; All three documents have a name of A
  • The client requests 5 more documents using the nextCursorMark from the previous response
    • Documents 11-15 will be returned -- the addition of new documents with sort values already past does not affect the relative position of the cursor
  • Document id 1 is updated to change it's 'name' to Q
  • Document id 17 is updated to change it's 'name' to A
  • The client requests 5 more documents using the nextCursorMark from the previous response
    • The resulting documents are 16,1,18,19,20 in that order
    • Because the sort value of document 1 changed so that it is after the cursor position, the document is returned to the client twice
    • Because the sort value of document 17 changed so that it is before the cursor position, the document has been "skipped" and will not be returned to the client as the cursor continues to progress

In a nutshell: When fetching all results matching a query using cursorMark, the only way index modifications can result in a document being skipped, or returned twice, is if the sort value of the document changes. 

One way to ensure that a document will never be returned more then once, is to use the uniqueKey field as the primary (and therefore: only significant) sort criterion. 

In this situation, you will be guaranteed that each document is only returned once, no matter how it may be be modified during the use of the cursor.

"Tailing" a Cursor

Because Cursor requests are stateless, and the cursorMark values encapsulate the absolute sort values of the last document returned from a search, it's possible to "continue" fetching additional results from a cursor that has already reached its end -- if new documents are added (or existing documents are updated) to the end of the results.  You can think of this as similar to using something like "tail -f" in Unix.

The most common examples of how this can be useful is when you have a "timestamp" field recording when a document has been added/updated in your index.  Client applications can continuously poll a cursor using a sort=timestamp asc, id asc for documents matching a query, and always be notified when a document is added or updated matching the request criteria.  Another common example is when you have uniqueKey values that always increase as new documents are created, and you can continuously poll a cursor using sort=id asc to be notified about new documents.

The psuedo-code for tailing a cursor is only a slight modification from our early example for processing all docs matching a query:

  • No labels


  1. In the ABC...Z example, The sentence...

    Document id 16 is updated to change it's 'name' to A

    ... should read

    Document id 17 is updated to change it's 'name' to A
    1. thanks for catching that – fixed

  2. In the SolrJ example you might show p.setSort( "id", SolrQuery.ORDER.asc );  I do see that it's mentioned briefly in the comments.

    Further, you might want to show it as SolrQuery q = new SolrQuery( "*:*" );, and then mention that SolrQuery is a SolrParams, and that any SolrParams would do.  Of course "p" then becomes "q".  That'd be a more common pattern.

    Then it'd be q.setSort ...

    Maybe also q.setRows( 100 );

    1. i was trying to keep the examples short and focused on the cursor aspects w/o spending a lot of time on general scaffolding of solr – but good call on using SolrQuery instead – with the method chaining making the sort/rows/q concrete didn't cause the example to get any longer then the hand wavey comment i had before

  3. extra "in" here or rest of explanation missing: 

      • documents with the ids 1-5 will be returned to the client in
  4. Not sure if this is a bug or something that needs to be added to the documentation.

    With respects to cursors, when using a computed sort value (like score)  in combination with the unique field sort (score desc, id asc) seems to cause some wildly inconsistent and incomplete results.  Using a query that should return 15,000 records, I get anywhere from 6 to 11 thousand records.  Remove the score sort, everything works fine.

    1. please use the solr-user@lucene mailing list for asking questions about problems you encounter using features of solr - the comment system here is really for discussing problems/concerns/suggestions with the documentation itself.


      The behavior you are describing really makes no sense to me – but getting to the bottom of it would be impossible w/o a lot more details and discussion (on the mailing list)

  5. Important notice: To avoid OOM during paging uniqueKey field must be indexed with DocValue=True

    1. Or you can increase the heap size. The default heap size in Solr out of the box is only 512MB, which is VERY VERY small.

      1. On our case we are querying huge indexes, Increasing heap size is not an option.

        Additionally it is not recommended to define heap size more than 32GB because of JVM using 64 bit objects pointers, it is better keeping heap size small and allow lucene to use the OS filesystem cache.

        Check this article about ELK but it also relevant for SOLR: