Apache Solr Documentation

This Unreleased Guide Will Cover Apache Solr 6.5

The Extended DisMax (eDisMax) query parser is an improved version of the DisMax query parser. In addition to supporting all the DisMax query parser parameters, Extended DisMax:

  • supports the full Lucene query parser syntax.
  • supports query operators such as AND, OR, NOT, -, and +.
  • treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.
  • respects the 'magic field' names _val_ and _query_. These are not real fields in the schema, but when used they trigger special behavior (a function query in the case of _val_, or a nested query in the case of _query_). If _val_ is used in a term or phrase query, the value is parsed as a function.
  • includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
  • improves proximity boosting by using word shingles; a document does not need to contain every word of the query for proximity boosting to apply.
  • includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
  • includes an improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend; the additive boost functions of DisMax (bf and bq) are also supported.
  • supports pure negative nested queries: queries such as +foo (-foo) will match all documents.
  • lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.

Extended DisMax Parameters

In addition to all the DisMax parameters, Extended DisMax includes these query parameters:

The sow Parameter

Split on whitespace: if set to false, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g. multi-word synonyms and shingles. Defaults to true: text analysis is invoked separately for each individual whitespace-separated term.
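As a rough illustration, a request that turns off whitespace splitting might be assembled as follows. The base URL, the name field, and the query text are assumptions chosen for this sketch; only defType and sow come from the description above.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr core; adjust for your installation.
BASE_URL = "http://localhost:8983/solr/techproducts/select"

params = {
    "defType": "edismax",
    "q": "wi fi router",  # hypothetical multi-word input
    "qf": "name",         # hypothetical query field
    "sow": "false",       # hand the whole term sequence to text analysis at once
}
url = BASE_URL + "?" + urlencode(params)
print(url)
```

With sow=false, a multi-word synonym rule in the query analyzer of the chosen field would see "wi fi" as a sequence rather than as two separately analyzed terms.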

The mm.autoRelax Parameter

If true, the number of clauses required (minimum should match) will automatically be relaxed if a clause is removed (for example, by a stopword filter) from some but not all qf fields. Use this parameter as a workaround if queries return zero hits because stopwords are removed unevenly across the qf fields.

Note that relaxing mm may cause undesired side effects, hurting the precision of the search, depending on the nature of your index content.

The boost Parameter

A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the BoostQParserPlugin.

The lowercaseOperators Parameter

A Boolean parameter indicating if lowercase "and" and "or" should be treated the same as operators "AND" and "OR".

The ps Parameter

Default amount of slop on phrase queries built with pf, pf2 and/or pf3 fields (affects boosting).

The pf2 Parameter

A multivalued list of fields with optional weights, based on pairs of word shingles.

The ps2 Parameter

This is similar to ps but overrides the slop factor used for pf2. If not specified, ps is used.

The pf3 Parameter

A multivalued list of fields with optional weights, based on triplets of word shingles. Similar to pf, except that instead of building a phrase per field out of all the words in the input, it builds a set of phrases for each field out of each triplet of word shingles.
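The pairs and triplets of word shingles that pf2 and pf3 work from can be sketched in a few lines. This is only an illustration of the shingling idea, not Solr's implementation; the helper name word_shingles and the sample input are made up for the example.

```python
def word_shingles(words, size):
    """Return every run of `size` consecutive words as one phrase string."""
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


words = "quick brown fox jumps".split()
pairs = word_shingles(words, 2)     # the kind of phrases pf2 is based on
triplets = word_shingles(words, 3)  # the kind of phrases pf3 is based on
print(pairs)
print(triplets)
```

Each resulting phrase is then queried against the listed fields for boosting, so a document containing only "brown fox" can still receive a proximity boost.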

The ps3 Parameter

This is similar to ps but overrides the slop factor used for pf3. If not specified, ps is used.

The stopwords Parameter

A Boolean parameter indicating if the StopFilterFactory configured in the query analyzer should be respected when parsing the query: if it is false, then the StopFilterFactory in the query analyzer is ignored.

The uf Parameter

Specifies which schema fields the end user is allowed to explicitly query. This parameter supports wildcards. The default is to allow all fields, equivalent to uf=*. To allow only the title field, use uf=title. To allow title and all fields ending with _s, use uf=title *_s (entries are separated by whitespace). To allow all fields except title, use uf=* -title. To disallow all fielded searches, use uf=-*.
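A minimal sketch of a request that disallows all fielded searches is shown below. The query text is a hypothetical example of what an end user might type; the uf value is the one given above.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "title:solr",  # hypothetical fielded input typed by an end user
    "uf": "-*",         # disallow all fielded searches
}
query_string = urlencode(params)
print(query_string)
```

Note that urlencode percent-encodes the colon and the asterisk, which Solr decodes again on receipt.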

Field aliasing using per-field qf overrides

Per-field overrides of the qf parameter may be specified to provide 1-to-many aliasing from field names specified in the query string, to field names used in the underlying query. By default, no aliasing is used and field names specified in the query string are treated as literal field names in the index.

Examples of Queries Submitted to the Extended DisMax Query Parser

All of the sample URLs in this section assume you are running Solr's "techproducts" example:

Boost the result of the query term "hello" based on the document's popularity:

Search for iPods OR video:

Search across multiple fields, specifying (via boosts) how important each field is relative to each other:

You can boost results that have a field that matches a specific value:
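The four requests described above could be built along the following lines. This is a sketch, not the guide's original sample URLs: the base URL assumes a default local install of the "techproducts" example, and the field names (text, features, cat, popularity) and boost values are assumptions about that example's schema.

```python
from urllib.parse import urlencode

# Assumed address of the local "techproducts" example core.
BASE_URL = "http://localhost:8983/solr/techproducts/select"

examples = {
    "boost by popularity": {"q": "hello", "qf": "text", "pf": "text", "boost": "popularity"},
    "ipod OR video": {"q": "ipod OR video"},
    "weighted fields": {"q": "video", "qf": "features^20.0 text^0.3"},
    "boost a field value": {
        "q": "video",
        "qf": "features^20.0 text^0.3",
        "bq": "cat:electronics^5.0",
    },
}

# Every request selects the eDisMax parser, then adds its own parameters.
urls = {
    label: BASE_URL + "?" + urlencode({"defType": "edismax", **params})
    for label, params in examples.items()
}
for label, url in urls.items():
    print(f"{label}: {url}")
```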

Using the "mm" parameter, one- and two-word queries require that all of the optional clauses match, but for queries with three or more clauses one missing clause is allowed:
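The rule described above corresponds to an mm value such as 2<-1. The following sketch shows how such a value can be read; it is a simplified illustration that handles only plain integers and a single "N<M" condition (no percentages, no multiple conditions), and the function name required_clauses is invented for the example.

```python
def required_clauses(mm: str, optional_clauses: int) -> int:
    """Simplified reading of an mm spec: plain integers and one 'N<M' condition."""
    if "<" in mm:
        threshold, value = mm.split("<", 1)
        if optional_clauses <= int(threshold):
            # At or below the threshold, every clause is required.
            return optional_clauses
        mm = value
    n = int(mm)
    # A negative value means "all but |n|"; a positive value is an absolute count.
    required = optional_clauses + n if n < 0 else n
    return max(0, min(required, optional_clauses))


for count in (1, 2, 3, 4):
    print(count, "clauses ->", required_clauses("2<-1", count), "required")
```

With mm=2<-1, a query such as belkin ipod must match both words, while a three-word query may miss one of them.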

In the example below, we see a per-field override of the qf parameter being used to alias "name" in the query string to either the "last_name" or the "first_name" field:
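A request of this shape might look like the following sketch. The query text and the title and text fields are assumptions made for illustration; the override itself uses the per-field form f.name.qf.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "sysadmin name:Mike",                  # hypothetical user input
    "qf": "title text last_name first_name",    # hypothetical default fields
    "f.name.qf": "last_name first_name",        # "name" expands to these two fields
}
query_string = urlencode(params)
print(query_string)
```

Here the clause name:Mike is searched against last_name and first_name, even though the index has no field called name.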

Using negative boost

Negative query boosts have been supported at the "Query" object level for a long time (resulting in negative scores for matching documents). Now the QueryParsers have been updated to handle this too.

Using 'slop'

DisMax and eDisMax can run queries against all query fields, and also run a query in the form of a phrase against the phrase fields. (This works only for boosting documents, not for matching them.) That phrase query can have a 'slop', which is the maximum distance allowed between the terms of the query for it still to be considered a phrase match. For example:
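One possible set of request parameters for this scenario is sketched here. The field names field1 and field2 and their weights are placeholders, not values from a real schema.

```python
from urllib.parse import urlencode

params = {
    "q": "foo bar",
    "qf": "field1^5 field2^10",   # fields used for matching
    "pf": "field1^50 field2^20",  # fields used for phrase boosting
    "defType": "dismax",
}
query_string = urlencode(params)
print(query_string)
```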

With these parameters, the DisMax Query Parser generates a query that looks something like this:

But it also generates another query that will only be used for boosting results:

Thus, any document that has the terms "foo" and "bar" will match; however, documents that contain both terms as a phrase will score much higher because they are more relevant.

If you add the parameter ps (phrase slop), the second query will instead be:
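The shape of these queries can be sketched as strings. This is a readable approximation only, assuming the input foo bar with qf=field1^5 field2^10 and pf=field1^50 field2^20 (placeholder fields and weights); the parser actually builds query objects, and the helper names below are invented.

```python
def main_query(terms, qf):
    """One clause per term, each searched across every qf field with its weight."""
    clauses = [
        "(" + " OR ".join(f"{field}:{term}^{boost}" for field, boost in qf.items()) + ")"
        for term in terms
    ]
    return " AND ".join(clauses)


def phrase_boost_query(terms, pf, ps=0):
    """The whole input as one phrase per pf field; ps > 0 adds a ~slop suffix."""
    phrase = '"' + " ".join(terms) + '"' + (f"~{ps}" if ps else "")
    return " OR ".join(f"{field}:{phrase}^{boost}" for field, boost in pf.items())


terms = ["foo", "bar"]
qf = {"field1": 5, "field2": 10}
pf = {"field1": 50, "field2": 20}

print(main_query(terms, qf))                 # used for matching
print(phrase_boost_query(terms, pf))         # used only for boosting
print(phrase_boost_query(terms, pf, ps=10))  # the same boost query with ps=10
```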

This means that if the terms "foo" and "bar" appear in the document with at most 10 terms between them, the phrase will match. For example, the document that says:

will match the phrase query.

How does one use phrase slop? Usually it is configured in the request handler (in solrconfig.xml).

With query slop (qs) the concept is similar, but it applies to explicit phrase queries from the user. For example, if you want to search for a name, you could enter the quoted phrase "Hans Anderson".

A document that contains "Hans Anderson" will match, but a document that contains the middle name "Christian" or where the name is written with the last name first ("Anderson, Hans") won't. For those cases one could configure the query slop parameter qs, so that a slop is applied even when the user searches for an explicit phrase.
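A request that applies query slop to a user's explicit phrase might be sketched as follows. The name field is hypothetical, and qs=2 is an arbitrary illustrative value; the right setting depends on how much reordering or how many intervening words you want to tolerate.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": '"Hans Anderson"',  # explicit phrase query typed by the user
    "qf": "name",            # hypothetical field holding person names
    "qs": "2",               # slop applied to the user's phrase
}
query_string = urlencode(params)
print(query_string)
```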

Finally, in addition to the phrase fields (pf) parameter, eDisMax also supports the pf2 and pf3 parameters, for fields over which to create bigram and trigram phrase queries. The phrase slop for these parameters' queries can be specified using the ps2 and ps3 parameters, respectively. If you use pf2/pf3 but not ps2/ps3, then the phrase slop for these parameters' queries will be taken from the ps parameter, if any.

Using the 'magic fields' _val_ and _query_

The Solr Query Parser's use of _val_ and _query_ differs from the Lucene Query Parser in the following ways:

  • If the magic field name _val_ is used in a term or phrase query, the value is parsed as a function.
  • It provides a hook into FunctionQuery syntax. Quotes are necessary to encapsulate the function when it includes parentheses. For example:

  • The Solr Query Parser offers nested query support for any type of query parser (via QParserPlugin). Quotes are often necessary to encapsulate the nested query if it contains reserved characters. For example:
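The two magic fields described in the list above can be illustrated with query strings like these. The field name myfield and the function arguments are placeholders; the code only shows how the quoting looks and that it survives URL encoding.

```python
from urllib.parse import urlencode, parse_qs

# `myfield` is a hypothetical field name used only for illustration.
function_q = '_val_:"recip(rord(myfield),1,2,3)"'            # value parsed as a function
nested_q = '_query_:"{!dismax qf=myfield}how now brown cow"'  # nested dismax query

for q in (function_q, nested_q):
    encoded = urlencode({"q": q})
    # Round-trip to confirm the quoting survives URL encoding.
    assert parse_qs(encoded)["q"] == [q]
    print(encoded)
```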

Although not technically a syntax difference, note that if you use the Solr TrieDateField type, any queries on those fields (typically range queries) should use either the Complete ISO 8601 Date syntax that field supports, or the DateMath Syntax to get relative dates. For example:
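Range clauses of this kind can be put together as in the sketch below. The field names createdate, pubdate, and timestamp are hypothetical, and the helper functions are invented for the example; the date values use the complete ISO 8601 form in UTC and DateMath expressions.

```python
from datetime import datetime, timezone


def solr_date(dt: datetime) -> str:
    """Format a datetime in the complete ISO 8601 UTC form, ending in Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range(field: str, start: str, end: str) -> str:
    """Build a range clause; the TO keyword is written in uppercase."""
    return f"{field}:[{start} TO {end}]"


since = solr_date(datetime(2007, 3, 6, tzinfo=timezone.utc))
print(date_range("createdate", since, "*"))                    # absolute lower bound
print(date_range("pubdate", "NOW-1YEAR/DAY", "NOW/DAY+1DAY"))  # relative dates via DateMath
print(date_range("timestamp", "*", "NOW"))                     # everything up to now
```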

TO must be uppercase, or Solr will report a 'Range Group' error.

 

 
