Apache Solr Documentation

*** As of June 2017, the latest Solr Ref Guide is located at https://lucene.apache.org/solr/guide ***

The Stats component returns simple statistics for numeric, string, and date fields within the document set.

The sample queries in this section assume you are running the "techproducts" example included with Solr.

Stats Component Parameters

The Stats Component accepts the following parameters:

  • stats - If true, invokes the Stats component.
  • stats.field - Specifies a field for which statistics should be generated. This parameter may be specified multiple times in a query to request statistics on multiple fields. Local Parameters may be used to indicate which subset of the supported statistics should be computed, and/or that statistics should be computed over the results of an arbitrary numeric function (or query) instead of a simple field name. See the examples below.
  • stats.facet - Returns sub-results for values within the specified facet. This legacy parameter is not recommended for new users; consider combining stats.field with facet.pivot instead.
  • stats.calcdistinct - If true, the "countDistinct" and "distinctValues" statistics will be computed and included in the response. These calculations can be very expensive for fields that do not have a tiny cardinality, so they are disabled by default. This parameter can be specified using a per-field override (i.e., f.<field>.stats.calcdistinct=true), but users are encouraged to request the desired statistics as Local Parameters instead. As a top-level request parameter, this option is deprecated.

Example

The query below demonstrates computing stats against two different numeric fields, as well as stats over the results of a 'termfreq()' function call using the 'text' field:

http://localhost:8983/solr/techproducts/select?q=*:*&stats=true&stats.field={!func}termfreq('text','memory')&stats.field=price&stats.field=popularity&rows=0&indent=true
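The same request can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name build_stats_url is hypothetical and not part of any Solr client, and the base URL is simply the techproducts one used above. Passing a list of pairs lets stats.field repeat, and urlencode percent-encodes the braces and quotes that appear raw in the URL above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL taken from the example above; adjust host and collection as needed.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def build_stats_url(stats_fields, q="*:*", base=SOLR_SELECT):
    """Build a select URL that requests stats for each entry in stats_fields.

    build_stats_url is a hypothetical helper, not part of any Solr client.
    """
    params = [("q", q), ("stats", "true")]
    # stats.field may be given several times, once per field or function.
    params += [("stats.field", f) for f in stats_fields]
    params += [("rows", "0"), ("indent", "true")]
    return base + "?" + urlencode(params)


url = build_stats_url(["{!func}termfreq('text','memory')", "price", "popularity"])
print(url)

# Decoding the query string shows the three stats.field values survive intact.
print(parse_qs(urlsplit(url).query)["stats.field"])
```

The URL is only built here, not sent; any HTTP client can issue the request against a running techproducts example.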

Statistics Supported

The table below explains the statistics supported by the Stats component. Not all statistics are supported for all field types, and not all statistics are computed by default (see Local Parameters below for details).

Each entry gives the local param name, a sample input, a description, the supported field types, and whether the statistic is computed by default.

  • min (sample input: true) - The minimum value of the field/function in all documents in the set. Supported types: All. Computed by default: Yes.
  • max (sample input: true) - The maximum value of the field/function in all documents in the set. Supported types: All. Computed by default: Yes.
  • sum (sample input: true) - The sum of all values of the field/function in all documents in the set. Supported types: Numeric & Date. Computed by default: Yes.
  • count (sample input: true) - The number of values found in all documents in the set for this field/function. Supported types: All. Computed by default: Yes.
  • missing (sample input: true) - The number of documents in the set which do not have a value for this field/function. Supported types: All. Computed by default: Yes.
  • sumOfSquares (sample input: true) - Sum of all values squared (a by-product of computing stddev). Supported types: Numeric & Date. Computed by default: Yes.
  • mean (sample input: true) - The average (v1 + v2 + ... + vN)/N. Supported types: Numeric & Date. Computed by default: Yes.
  • stddev (sample input: true) - Standard deviation, measuring how widely spread the values in the data set are. Supported types: Numeric & Date. Computed by default: Yes.
  • percentiles (sample input: "1,99,99.9") - A list of percentile values based on cut-off points specified by the param value. These values are an approximation, using the t-digest algorithm. Supported types: Numeric. Computed by default: No.
  • distinctValues (sample input: true) - The set of all distinct values for the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. Supported types: All. Computed by default: No.
  • countDistinct (sample input: true) - The exact number of distinct values in the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. Supported types: All. Computed by default: No.
  • cardinality (sample input: "true" or "0.3") - A statistical approximation (currently using the HyperLogLog algorithm) of the number of distinct values in the field/function in all of the documents in the set. This calculation is much more efficient than using the 'countDistinct' option, but may not be 100% accurate. Input for this option can be a floating point number between 0.0 and 1.0 indicating how aggressively the algorithm should try to be accurate: 0.0 means use as little memory as possible; 1.0 means use as much memory as needed to be as accurate as possible. 'true' is supported as an alias for "0.3". Supported types: All. Computed by default: No.
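The remark that sumOfSquares is a by-product of computing stddev can be illustrated with a short sketch. It assumes the sample (N-1) form of the standard deviation; the table does not say which form Solr uses, so the formula and the function name summarize should be read as illustrative only.

```python
import math


def summarize(values):
    """Compute count, sum, sumOfSquares, mean and stddev in a single pass."""
    count = 0
    total = 0.0
    sum_of_squares = 0.0
    for v in values:
        count += 1
        total += v
        sum_of_squares += v * v
    mean = total / count if count else float("nan")
    # Sample standard deviation (N-1 denominator, an assumption of this sketch)
    # derived purely from the running count, sum and sum of squares.
    if count > 1:
        variance = (count * sum_of_squares - total * total) / (count * (count - 1))
        stddev = math.sqrt(max(variance, 0.0))
    else:
        stddev = 0.0
    return {"count": count, "sum": total, "sumOfSquares": sum_of_squares,
            "mean": mean, "stddev": stddev}


stats = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(stats)
```

Because only three running totals are kept, partial results from several sources can be combined by adding the counts, sums and sums of squares before the final division.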

Local Parameters

Similar to the Facet Component, the stats.field parameter supports local parameters for:

  • Tagging & Excluding Filters: stats.field={!ex=filterA}price
  • Changing the Output Key: stats.field={!key=my_price_stats}price
  • Tagging stats for use with facet.pivot: stats.field={!tag=my_pivot_stats}price

Local parameters can also be used to specify individual statistics by name, overriding the set of statistics computed by default, e.g.: stats.field={!min=true max=true percentiles='99,99.9,99.99'}price

If any supported statistics are specified via local parameters, then the entire set of default statistics is overridden and only the requested statistics are computed.
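Composing these local-parameter strings by hand is error-prone, so a small helper can be useful. The sketch below is a hypothetical convenience function (stats_field is not a Solr API); it assumes that values containing commas or spaces should be wrapped in single quotes, as in the percentiles example above, and it does not handle values that themselves contain quotes.

```python
def stats_field(field, **local_params):
    """Return a stats.field value with optional local parameters.

    stats_field is a hypothetical helper. Values containing commas or
    spaces are wrapped in single quotes; embedded quotes are not handled.
    """
    if not local_params:
        return field
    parts = []
    for name, value in local_params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        value = str(value)
        if "," in value or " " in value:
            value = "'" + value + "'"
        parts.append(f"{name}={value}")
    return "{!" + " ".join(parts) + "}" + field


print(stats_field("price", min=True, max=True, percentiles="99,99.9,99.99"))
# {!min=true max=true percentiles='99,99.9,99.99'}price
print(stats_field("price", key="my_price_stats"))
# {!key=my_price_stats}price
```

The returned string still needs URL encoding before it is placed in a request.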

Additional "Expert" local params are supported in some cases for affecting the behavior of some statistics:

  • percentiles
    • tdigestCompression - a positive numeric value defaulting to 100.0 controlling the compression factor of the T-Digest. Larger values mean more accuracy, but also use more memory.
  • cardinality
    • hllPreHashed - a boolean option indicating that the statistics are being computed over a "long" field that has already been hashed at index time – allowing the HLL computation to skip this step.
    • hllLog2m - an integer value specifying an explicit "log2m" value to use, overriding the heuristic value determined by the cardinality local param and the field type – see the java-hll documentation for more details
    • hllRegwidth - an integer value specifying an explicit "regwidth" value to use, overriding the heuristic value determined by the cardinality local param and the field type – see the java-hll documentation for more details
  • calcDistinct - for backwards compatibility, calcDistinct=true may be specified as an alias for both countDistinct=true and distinctValues=true

Examples

Here we compute some statistics for the price field.  The min, max, mean, 90th, and 99th percentile price values are computed against all products that are in stock (q=*:* and fq=inStock:true), and independently all of the default statistics are computed against all products regardless of whether they are in stock or not (by excluding that filter).

http://localhost:8983/solr/techproducts/select?q=*:*&fq={!tag=stock_check}inStock:true&stats=true&stats.field={!key=instock_prices+min=true+max=true+mean=true+percentiles='90,99'}price&stats.field={!ex=stock_check+key=all_prices}price&rows=0&indent=true
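As a hedged illustration of how the tag and ex local parameters fit together, the request can be written as a parameter list in Python. The sketch attaches ex=stock_check to the all_prices entry, since that is the one the description says should ignore the inStock filter. Spaces inside the local parameters are written as + in a URL, which is what urlencode produces.

```python
from urllib.parse import urlencode, parse_qs

params = [
    ("q", "*:*"),
    # Tag the filter so individual stats.field entries can exclude it.
    ("fq", "{!tag=stock_check}inStock:true"),
    ("stats", "true"),
    # Selected statistics over the filtered (in-stock) documents.
    ("stats.field",
     "{!key=instock_prices min=true max=true mean=true percentiles='90,99'}price"),
    # Default statistics over all documents: ex removes the tagged filter.
    ("stats.field", "{!ex=stock_check key=all_prices}price"),
    ("rows", "0"),
    ("indent", "true"),
]

query_string = urlencode(params)
print("http://localhost:8983/solr/techproducts/select?" + query_string)

# Decoding the query string recovers the original values, spaces included.
decoded = parse_qs(query_string)
print(decoded["stats.field"])
```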

The Stats Component and Faceting

Although the stats.facet parameter is no longer recommended, sets of stats.field parameters can be referenced by 'tag' when using Pivot Faceting to compute multiple statistics at every level (i.e., field) in the tree of pivot constraints.

For more information and a detailed example, please see Combining Stats Component With Pivots.
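A rough sketch of what such a request might look like follows. It assumes the pivot is tied to the tagged statistics through a stats local parameter on facet.pivot, and that cat and inStock are suitable pivot fields in the techproducts example; both are assumptions of this sketch, so check them against the pivot faceting documentation before relying on them.

```python
from urllib.parse import urlencode, parse_qs

TAG = "my_pivot_stats"  # tag name reused from the Local Parameters list above

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("stats", "true"),
    # Tag the statistics so the pivot can refer to them by name.
    ("stats.field", "{!tag=%s min=true max=true}price" % TAG),
    ("facet", "true"),
    # Assumed syntax: the stats local param names the tagged stats.field set;
    # cat and inStock are assumed pivot fields.
    ("facet.pivot", "{!stats=%s}cat,inStock" % TAG),
]

query_string = urlencode(params)
print("http://localhost:8983/solr/techproducts/select?" + query_string)

# Decoded view of the two parameters that are linked by the tag.
decoded = parse_qs(query_string)
print(decoded["stats.field"], decoded["facet.pivot"])
```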
