Apache Solr Documentation

*** As of June 2017, the latest Solr Ref Guide is located at https://lucene.apache.org/solr/guide ***

The Terms Component provides access to the indexed terms in a field and the number of documents that match each term. This can be useful for building an auto-suggest feature or any other feature that operates at the term level instead of the search or document level. Retrieving terms in index order is very fast since the implementation directly uses Lucene's TermEnum to iterate over the term dictionary.

In a sense, this search component provides fast field-faceting over the whole index, not restricted by the base query or any filters. The document frequencies returned are the number of documents that match the term, including any documents that have been marked for deletion but not yet removed from the index.

Configuring the Terms Component

By default, the Terms Component is already configured in solrconfig.xml for each collection.

Defining the Terms Component

Defining the Terms search component is straightforward: simply give it a name and use the class solr.TermsComponent.

This makes the component available, but it cannot be used until it is included in a request handler.

Using the Terms Component in a Request Handler

The terms component is included with the /terms request handler, which is among Solr's out-of-the-box request handlers - see Implicit RequestHandlers.

Note that the defaults for this request handler set the parameter "terms" to true, which allows terms to be returned on request. The parameter "distrib" is set to false, which allows this handler to be used only on a single Solr core.

You could add this component to another handler if you wanted to, and pass "terms=true" in the HTTP request in order to get terms back. If it is only defined in a separate handler, you must use that handler when querying in order to get terms and not regular documents as results.
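As a rough illustration of the difference, the sketch below builds one request for the stock /terms handler and one for another handler. The handler name "/browse_terms" and the helper function are hypothetical; only the "terms" and "terms.fl" parameters come from this page.

```python
from urllib.parse import urlencode

def terms_request_url(base_url, handler, field, enable_terms=False):
    """Build a GET URL that asks `handler` for the terms of `field`."""
    params = [("terms.fl", field)]
    if enable_terms:
        # Needed when the handler's defaults do not already set terms=true.
        params.insert(0, ("terms", "true"))
    return f"{base_url}{handler}?{urlencode(params)}"

core = "http://localhost:8983/solr/techproducts"

# /terms already defaults terms=true, so the flag can be omitted.
print(terms_request_url(core, "/terms", "name"))

# "/browse_terms" is a hypothetical handler that includes the terms component
# but sets no defaults, so terms=true has to travel with the request.
print(terms_request_url(core, "/browse_terms", "name", enable_terms=True))
```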

Terms Component Parameters

The parameters below allow you to control what terms are returned. You can set any of them as defaults in the request handler configuration to make them permanent, or pass them with an individual request. These parameters are:

Parameter

Required

Default

Description

terms

No

false

If set to true, enables the Terms Component. By default, the Terms Component is off.

Example: terms=true

terms.fl

Yes

null

Specifies the field from which to retrieve terms.

Example: terms.fl=title

terms.list

No

null

Fetches the document frequency for a comma delimited list of terms. Terms are always returned in index order. If 'terms.ttf' is set to true, also returns their total term frequency. If multiple 'terms.fl' are defined, these statistics will be returned for each term in each requested field.

Example: terms.list=termA,termB,termC

terms.limit

No

10

Specifies the maximum number of terms to return. The default is 10. If the limit is set to a number less than 0, then no maximum limit is enforced. Although this is not required, either this parameter or terms.upper must be defined.

Example: terms.limit=20

terms.lower

No

empty string

Specifies the term at which to start. If not specified, the empty string is used, causing Solr to start at the beginning of the field.

Example: terms.lower=orange

terms.lower.incl

No

true

If set to true, includes the lower-bound term (specified with terms.lower) in the result set.

Example: terms.lower.incl=false

terms.mincount

No

null

Specifies the minimum document frequency to return in order for a term to be included in a query response. Results are inclusive of the mincount (that is, >= mincount).

Example: terms.mincount=5

terms.maxcount

No

-1

Specifies the maximum document frequency a term must have in order to be included in a query response. The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, <= maxcount).

Example: terms.maxcount=25

terms.prefix

No

null

Restricts matches to terms that begin with the specified string.

Example: terms.prefix=inter

terms.raw

No

false

If set to true, returns the raw characters of the indexed term, regardless of whether it is human-readable. For instance, the indexed form of numeric values is not human-readable.

Example: terms.raw=true

terms.regex

No

null

Restricts matches to terms that match the regular expression.

Example: terms.regex=.*pedist

terms.regex.flag

No

null

Defines a Java regex flag to use when evaluating the regular expression defined with terms.regex. See http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html for details of each flag. Valid options are:

  • case_insensitive
  • comments
  • multiline
  • literal
  • dotall
  • unicode_case
  • canon_eq
  • unix_lines

Example: terms.regex.flag=case_insensitive

terms.stats

No

null

Include index statistics in the results. Currently returns only the numDocs for a collection. When combined with terms.list it provides enough information to compute idf for a list of terms.

terms.sort

No

count

Defines how to sort the terms returned. Valid options are count, which sorts by document frequency with the highest count first, or index, which sorts in index order.

Example: terms.sort=index

terms.ttf

No

false

If set to true, returns both 'df' (docFreq) and 'ttf' (totalTermFreq) statistics for each requested term in 'terms.list'. In this case, the response entry for each term contains both a df value and a ttf value.

terms.upper

No

null

Specifies the term to stop at. Although this parameter is not required, either this parameter or terms.limit must be defined.

Example: terms.upper=plum

terms.upper.incl

No

false

If set to true, the upper bound term is included in the result set. The default is false.

Example: terms.upper.incl=true
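To make the interplay of the range, prefix, and count parameters concrete, here is a toy model of the semantics described in the table. It is a sketch written for this page, not Solr's implementation: the function, its argument names, and the sample document frequencies are all assumptions, and the tie-breaking shown for sort=count is a property of the model only.

```python
def select_terms(doc_freqs, lower="", lower_incl=True, upper=None,
                 upper_incl=False, prefix=None, mincount=None,
                 maxcount=-1, limit=10, sort="count"):
    """Toy model of the terms.* filters over a {term: doc_freq} dict."""
    picked = []
    for term in sorted(doc_freqs):  # index (lexicographic) order
        df = doc_freqs[term]
        # terms.lower / terms.lower.incl
        if term < lower or (term == lower and not lower_incl):
            continue
        # terms.upper / terms.upper.incl
        if upper is not None and (
                term > upper or (term == upper and not upper_incl)):
            continue
        # terms.prefix
        if prefix is not None and not term.startswith(prefix):
            continue
        # terms.mincount and terms.maxcount are both inclusive bounds
        if mincount is not None and df < mincount:
            continue
        if maxcount >= 0 and df > maxcount:  # a negative value means no bound
            continue
        picked.append((term, df))
    if sort == "count":
        # Highest count first; list.sort() is stable, so ties keep index order.
        picked.sort(key=lambda pair: -pair[1])
    # A negative terms.limit means no maximum.
    return picked if limit < 0 else picked[:limit]

# Hypothetical document frequencies, not taken from a real index.
index = {"apple": 3, "apricot": 1, "banana": 7, "orange": 4, "plum": 2}

print(select_terms(index, lower="apricot", upper="plum", sort="index"))
print(select_terms(index, prefix="ap"))
print(select_terms(index, mincount=3, maxcount=4))
```

Note that in the first call "plum" is excluded because terms.upper.incl defaults to false, while "apricot" is kept because terms.lower.incl defaults to true.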

The output is a list of the terms and their document frequency values. See below for examples.
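The terms.stats description notes that numDocs plus the document frequencies from terms.list are enough to compute idf. A minimal sketch of that arithmetic follows, assuming one common smoothed formulation, idf = 1 + ln(numDocs / (df + 1)); the similarity configured in a given Solr installation may use a different formula, and the numbers below are made up for illustration.

```python
import math

def idf(num_docs, doc_freq):
    """One textbook idf variant: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical values standing in for a terms.stats + terms.list response.
num_docs = 32
doc_freqs = {"termA": 1, "termB": 15, "termC": 31}

for term, df in doc_freqs.items():
    # Rarer terms (lower df) receive a higher idf.
    print(f"{term}: df={df} idf={idf(num_docs, df):.3f}")
```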

Examples

All of the following sample queries work with Solr's "bin/solr -e techproducts" example.
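The sample queries in this section are plain HTTP GET requests, so their URLs can be assembled programmatically. The helper below is a sketch, not part of Solr or SolrJ: the function name and the underscore-for-dot keyword convention are assumptions made for this example, while the parameter names and the base URL are the ones used on this page.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/techproducts/terms"

def terms_url(fields, base=BASE, **options):
    """Return a /terms URL for one or more fields plus terms.* options.

    Keyword names use underscores in place of dots, so lower_incl=False
    becomes terms.lower.incl=false.
    """
    if isinstance(fields, str):
        fields = [fields]
    # terms.fl may be repeated to request several fields.
    params = [("terms.fl", field) for field in fields]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase, as in the examples
        params.append(("terms." + key.replace("_", "."), value))
    return base + "?" + urlencode(params)

print(terms_url("name"))
print(terms_url("name", lower="a", sort="index"))
print(terms_url("name", prefix="at"))
```

Using urlencode also takes care of escaping user-supplied values such as a typed prefix that contains spaces or punctuation.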

Get Top 10 Terms

This query requests the top ten terms in the name field, ranked by document count (the default sort): http://localhost:8983/solr/techproducts/terms?terms.fl=name

Results:

Get First 10 Terms Starting with Letter 'a'

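This query requests the first ten terms in the name field at or after "a", in index order (instead of the top 10 results by document count): http://localhost:8983/solr/techproducts/terms?terms.fl=name&terms.lower=a&terms.sort=index

Note that terms.lower only sets the starting point. It does not restrict the results to terms that begin with "a"; use terms.prefix for that.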

Results:

SolrJ invocation

Using the Terms Component for an Auto-Suggest Feature

If the Suggester doesn't suit your needs, you can use the Terms component in Solr to build a similar feature for your own search application. Simply submit a query specifying whatever characters the user has typed so far as a prefix. For example, if the user has typed "at", the search engine's interface would submit the following query:

http://localhost:8983/solr/techproducts/terms?terms.fl=name&terms.prefix=at

Result:

You can use the parameter omitHeader=true to omit the response header from the query response, as in this example, which also returns the response in JSON format: http://localhost:8983/solr/techproducts/terms?terms.fl=name&terms.prefix=at&indent=true&wt=json&omitHeader=true

Result:
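With wt=json, the terms for each field arrive as a flat list that alternates term and count. A client building a suggestion list can pair them up as sketched below. This assumes that flat layout (Solr's default JSON rendering of named lists), and the sample payload is hand-written for illustration rather than captured from a live server.

```python
import json

def pair_terms(flat):
    """Turn ["ata", 1, "ati", 1] into [("ata", 1), ("ati", 1)]."""
    return list(zip(flat[0::2], flat[1::2]))

# Hand-written sample in the flat layout; not captured from a live server.
sample = '{"terms": {"name": ["ata", 1, "ati", 1]}}'

response = json.loads(sample)
suggestions = {field: pair_terms(flat)
               for field, flat in response["terms"].items()}
print(suggestions)
```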

Distributed Search Support

The TermsComponent also supports distributed indexes. For the /terms request handler, you must provide the following two parameters:

Parameter

Description

shards

Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see Distributed Search with Index Sharding.

shards.qt

Specifies the request handler Solr uses for requests to shards.
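A distributed request differs from the earlier examples only by these two extra parameters. The sketch below assembles such a URL; the shard addresses are hypothetical, and the comma-separated host:port/path form for shards and the value /terms for shards.qt are assumptions based on the usual sharding conventions rather than details stated on this page.

```python
from urllib.parse import urlencode

def distributed_terms_url(core_url, field, shard_addresses,
                          shard_handler="/terms"):
    """Build a /terms URL that also names the shards to query."""
    params = {
        "terms.fl": field,
        "shards": ",".join(shard_addresses),
        "shards.qt": shard_handler,
    }
    # Leave ':', '/' and ',' unescaped so the shard list stays readable.
    return core_url + "/terms?" + urlencode(params, safe=":/,")

# Hypothetical shard addresses for a two-node setup.
shards = [
    "localhost:8983/solr/techproducts",
    "localhost:7574/solr/techproducts",
]
print(distributed_terms_url(
    "http://localhost:8983/solr/techproducts", "name", shards))
```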

More Resources

5 Comments

  1. I still intend to change this page a little and include the solrconfig.xml configuration (defining the searchComponent and then the /terms requestHandler).

  2. "Get First 10 Terms, Starting with Letter 'a'
    This query requests the first ten terms in the name field, beginning with the first term that begins with the letter a: http://localhost:8983/solr/terms?terms.fl=name&terms.lower=a"

    I don't think this is correct. In this way we are telling Solr: give me the terms starting from the "a" term, in lexicographical order.

    To model the requirement "Get First 10 Terms, Starting with Letter 'a'" we should use: http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=a

    Also the related results are not relevant to the use case.

    Cheers

    1. Thanks Alessandro.

      I fixed the example and related results to more accurately show what terms.lower does. The response for that example was actually correct, but the description was misleading, I think. I changed the example and description to better show what that param does. I left it using terms.lower (but sorted the documents so the 'a' terms show in the response). The next section about auto-complete shows a good example using terms.prefix.

      Please post another comment if there are other things you notice about the examples or text.

  3. Hi Cassandra Targett, thanks a lot for your post, it is awesome. I have a quick question. I was using the terms component and it works great, but now I need the terms component to filter by another field. I want to keep the terms component because it can search in a case-insensitive way, with regex you can search for words inside words/phrases (not necessarily the prefix), and the terms component result is easy to parse and returns exactly the information that I want for an autocomplete. But I want to filter by an additional field. Is there a way to implement or extend this component's functionality to do that? I don't want to use Solr faceting because it searches only by prefix and does not search for words in the middle in a case-insensitive way.