Apache Solr Documentation

This Unreleased Guide Will Cover Apache Solr 6.5

Document Transformers can be used to modify the information returned about each document in the results of a query.

Using Document Transformers

When executing a request, a document transformer can be used by including it in the fl parameter using square brackets, for example:

Some transformers allow, or require, local parameters which can be specified as key value pairs inside the brackets:

As with regular fields, you can change the key used when a Transformer adds a field to a document via a prefix:
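The three forms described above (a bare transformer, a transformer with local parameters, and a transformer with a key prefix) can be sketched in Python as follows. This is only an illustration: the helper name `transformer` and the chosen fields are assumptions, not part of Solr.

```python
from urllib.parse import urlencode


def transformer(name, key=None, **local_params):
    """Render one fl entry, e.g. `[shard]` or `my_val_a:[value v=42 t=int]`.

    Hypothetical helper for illustration; Solr only sees the resulting string.
    """
    parts = [name] + [f"{k}={v}" for k, v in local_params.items()]
    entry = "[" + " ".join(parts) + "]"
    return f"{key}:{entry}" if key else entry


fl = ",".join([
    "id",
    "name",
    "score",
    transformer("shard"),                                 # bare transformer
    transformer("explain", style="nl"),                   # with a local parameter
    transformer("value", key="my_val_a", v=42, t="int"),  # with a key prefix
])

# urlencode percent-escapes brackets, colons and spaces for use in a URL.
query_string = urlencode({"q": "*:*", "fl": fl})
```

Here `fl` holds the human-readable parameter value and `query_string` the escaped form that would be appended to a select URL.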

The sections below discuss exactly what these various transformers do.

Available Transformers

[value] - ValueAugmenterFactory

Modifies every document to include the exact same value, as if it were a stored field in every document:

The above query would produce results like the following:

By default, values are returned as a String, but a "t" parameter can be specified using a value of int, float, double, or date to force a specific return type:
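As a rough sketch of the "v" and "t" local parameters, the Python helper below builds `[value]` entries and rejects any type outside the four listed above. The function name and the field keys `my_string` and `my_number` are hypothetical.

```python
ALLOWED_TYPES = ("int", "float", "double", "date")


def value_transformer(key, v, t=None):
    """Render `key:[value v=... t=...]`; without t the value comes back as a String."""
    if t is not None and t not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type {t!r}; expected one of {ALLOWED_TYPES}")
    local_params = f"v={v}" if t is None else f"v={v} t={t}"
    return f"{key}:[value {local_params}]"


fl = ",".join([
    "id",
    value_transformer("my_string", 42),           # returned as a String
    value_transformer("my_number", 42, t="int"),  # forced to an int
])
```

With this `fl`, every returned document would carry both keys with the same constant, differing only in the returned type.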

In addition to using these request parameters, you can configure additional named instances of ValueAugmenterFactory, or override the default behavior of the existing [value] transformer in your solrconfig.xml file:

The "value" option forces an explicit value to always be used, while the "defaultValue" option provides a default that can still be overridden using the "v" and "t" local parameters.

[explain] - ExplainAugmenterFactory

Augments each document with an inline explanation of its score exactly like the information available about each document in the debug section:

Supported values for "style" are "text", "html", and "nl", the last of which returns the information as structured data:
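A minimal Python sketch of choosing among the three styles is shown here. The helper `explain_fl` and the default field list are assumptions made for illustration.

```python
EXPLAIN_STYLES = ("text", "html", "nl")


def explain_fl(style=None, fields=("id", "score")):
    """Build an fl value that appends [explain], optionally with a style."""
    if style is None:
        entry = "[explain]"  # fall back to whatever default is configured
    elif style in EXPLAIN_STYLES:
        entry = f"[explain style={style}]"
    else:
        raise ValueError(f"unsupported style {style!r}; expected one of {EXPLAIN_STYLES}")
    return ",".join([*fields, entry])


structured = explain_fl("nl")  # explanation as structured data
```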

A default style can be configured by specifying an "args" parameter in your configuration:

[child] - ChildDocTransformerFactory

This transformer returns all descendant documents of each parent document matching your query in a flat list nested inside the matching parent document. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relevant parent documents for any type of search query.

Note that this transformer can be used even though the query itself is not a Block Join query.

When using this transformer, the parentFilter parameter must be specified; it works the same as in all Block Join Queries. Additional optional parameters are:

  • childFilter - query to filter which child documents should be included. This can be particularly useful when you have multiple levels of hierarchical documents (default: all children)
  • limit - the maximum number of child documents to be returned per parent document (default: 10)
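The parameters listed above could be assembled as in the following Python sketch. The helper name and the `doc_type` field values are hypothetical; only parentFilter, childFilter and limit come from the text.

```python
def child_transformer(parent_filter, child_filter=None, limit=None):
    """Render a [child ...] entry; parentFilter is mandatory, the rest optional."""
    if not parent_filter:
        raise ValueError("parentFilter must be specified")
    parts = ["child", f"parentFilter={parent_filter}"]
    if child_filter is not None:
        parts.append(f"childFilter={child_filter}")
    if limit is not None:
        parts.append(f"limit={limit}")  # when omitted, the default of 10 applies
    return "[" + " ".join(parts) + "]"


# Assumed schema: parents are marked doc_type:book, children doc_type:chapter.
fl = "id," + child_transformer("doc_type:book", "doc_type:chapter", limit=100)
```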

[shard] - ShardAugmenterFactory

This transformer adds information about what shard each individual document came from in a distributed request. 

ShardAugmenterFactory does not support any request parameters or configuration options.

[docid] - DocIdAugmenterFactory

This transformer adds the internal Lucene document id to each document; this is primarily useful for debugging purposes.

DocIdAugmenterFactory does not support any request parameters or configuration options.

[elevated] and [excluded]

These transformers are available only when using the Query Elevation Component.

  • [elevated] annotates each document to indicate if it was elevated or not.
  • [excluded] annotates each document to indicate if it would have been excluded - this is only supported if you also use the markExcludes parameter.
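As a hedged illustration of the dependency between [excluded] and markExcludes, the Python sketch below only requests [excluded] when markExcludes is also sent. It assumes the request goes to a handler that has the Query Elevation Component configured; the helper name and the sample query are hypothetical.

```python
def elevation_params(q, mark_excludes=False, fields=("id",)):
    """Build request parameters that annotate elevated (and optionally excluded) docs."""
    entries = [*fields, "[elevated]"]
    params = {"q": q}
    if mark_excludes:
        # [excluded] is only supported together with the markExcludes parameter.
        entries.append("[excluded]")
        params["markExcludes"] = "true"
    params["fl"] = ",".join(entries)
    return params


with_excludes = elevation_params("ipod", mark_excludes=True)
elevated_only = elevation_params("ipod")
```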

 

[json] / [xml]

These transformers replace a field value containing a string representation of a valid XML or JSON structure with the actual raw XML or JSON structure rather than just the string value. Each applies only to the specific writer, such that [json] only applies to wt=json and [xml] only applies to wt=xml.
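Because each of these transformers only takes effect with its matching response writer, a client might keep the two in sync as in this Python sketch. The helper name and the field `source_s` are assumptions for illustration.

```python
RAW_WRITERS = ("json", "xml")


def raw_field_params(field, wt):
    """Pair the [json] or [xml] transformer with the response writer it applies to."""
    if wt not in RAW_WRITERS:
        raise ValueError(f"no raw transformer for wt={wt!r}")
    # The transformer name and the wt value are deliberately the same token.
    return {"fl": f"id,{field}:[{wt}]", "wt": wt}


params = raw_field_params("source_s", "json")
```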

[subquery]

This transformer executes a separate query for each transformed document, passing document fields as input to the subquery parameters. It's usually used with the {!join} and {!parent} query parsers, and is intended to be an improvement on [child].

  • It must be given a unique name: fl=*,children:[subquery]
  • A request may contain several of them, e.g. fl=*,sons:[subquery],daughters:[subquery].
  • Every [subquery] occurrence adds a field with the given name to the result document. The value of this field is a document list: the result of executing the subquery using document fields as input.

Here is how it looks in various formats:


Subquery Parameters Shift

If a subquery is declared as fl=*,foo:[subquery], its parameters are prefixed with the given name and a period, e.g.:

q=*:*&fl=*,foo:[subquery]&foo.q=to be continued&foo.rows=10&foo.sort=id desc
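The prefixing rule can be sketched in Python as a small helper that moves ordinary query parameters under the subquery's name. The function name `subquery_params` is a hypothetical convenience, not a Solr API.

```python
from urllib.parse import urlencode


def subquery_params(name, **params):
    """Prefix each parameter with `<name>.` so it applies to that subquery only."""
    return {f"{name}.{key}": str(value) for key, value in params.items()}


request = {"q": "*:*", "fl": "*,foo:[subquery]"}
request.update(subquery_params("foo", q="to be continued", rows=10, sort="id desc"))

# Escaped form, suitable for appending to a select URL.
query_string = urlencode(request)
```

Parameters without the prefix (here q and fl) still apply to the main query, so the same name, such as q, can appear once for each level.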

Document Fields as Input for Subquery Parameters

Subqueries usually need some field values from the current document as parameters. These are exposed through implicit row.fieldname parameters, which can be referenced, for example, via Local Parameters syntax:

q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms f=id v=$row.dept_id}&depts.rows=10

Here departments are retrieved for every employee in the search result. This is like an SQL join ON emp.dept_id=dept.id.

Note that when a document field has multiple values, they are concatenated with a comma by default. This can be changed with the separator local parameter, e.g. foo:[subquery separator=' ']. This behavior mimics {!terms}, so the two work smoothly together.
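To make the concatenation rule concrete, the Python sketch below emulates on the client side how a multi-valued field would be flattened into a single row.fieldname value. It is not Solr's implementation, and the sample document and ids are invented for illustration.

```python
def row_value(doc, field, separator=","):
    """Emulate how a row.<field> value is flattened into one string."""
    value = doc[field]
    if isinstance(value, (list, tuple)):
        # Multiple values are joined; a comma is the default separator.
        return separator.join(str(v) for v in value)
    return str(value)


# Hypothetical employee document with a multi-valued dept_id field.
employee = {"id": "e1", "name": "john", "dept_id": ["d10", "d20"]}

default_join = row_value(employee, "dept_id")
space_join = row_value(employee, "dept_id", separator=" ")

# What {!terms f=id v=$row.dept_id} would effectively search for.
terms_query = "{!terms f=id}" + default_join
```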

To log the substituted subquery request parameters, list the corresponding parameter names in logParamsList, e.g. depts.logParamsList=q,fl,rows,row.dept_id

Cores and Collections in SolrCloud

Use foo:[subquery fromIndex=departments] to invoke the subquery on another core on the same node; this is what {!join} does in non-SolrCloud mode. In SolrCloud, however, just (and only) explicitly specify the subquery's native parameters, such as collection and shards, e.g.:

q=*:*&fl=*,foo:[subquery]&foo.q=cloud&foo.collection=departments

If the subquery collection has a different unique key field name (say, foo_id rather than id in the primary collection), add the following parameters to accommodate the difference: foo.fl=id:foo_id&foo.distrib.singlePass=true. Otherwise you'll get a NullPointerException from QueryComponent.mergeIds.
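The SolrCloud case, including the unique-key workaround, might be wrapped as in this Python sketch. The helper name and its arguments are assumptions; the parameter names collection, fl and distrib.singlePass are the ones given above.

```python
def cloud_subquery_params(name, q, collection, unique_key="id", main_unique_key="id"):
    """Build prefixed parameters for a subquery against another collection."""
    params = {f"{name}.q": q, f"{name}.collection": collection}
    if unique_key != main_unique_key:
        # Alias the other collection's key to the main key name and
        # force a single-pass distributed request.
        params[f"{name}.fl"] = f"{main_unique_key}:{unique_key}"
        params[f"{name}.distrib.singlePass"] = "true"
    return params


request = {"q": "*:*", "fl": "*,foo:[subquery]"}
request.update(cloud_subquery_params("foo", "cloud", "departments", unique_key="foo_id"))
```

When both collections use the same unique key, the two extra parameters are simply left out.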


[geo] - Geospatial formatter

Formats spatial data from a spatial field using a designated format type name.  Two inner parameters are required: f for the field name, and w for the format name. Example: geojson:[geo f=mySpatialField w=GeoJSON].

Normally you'll simply be consistent in choosing the format type you want by setting the format attribute on the spatial field type to WKT or GeoJSON – see the section Spatial Search for more information. If you are consistent, it'll come out the way you stored it.  This transformer offers a convenience to transform the spatial format to something different on retrieval.

In addition, this feature is very useful with the RptWithGeometrySpatialField to avoid double-storage of the potentially large vector geometry.  This transformer will detect that field type and fetch the geometry from an internal compact binary representation on disk (in docValues), and then format it as desired.  As such, you needn't mark the field as stored, which would be redundant. In a sense this double-storage between docValues and stored-value storage isn't unique to spatial but with polygonal geometry it can be a lot of data, and furthermore you'd like to avoid storing it in a verbose format (like GeoJSON or WKT).

[features] - LTRFeatureLoggerTransformerFactory

The "LTR" prefix stands for Learning To Rank. This transformer returns the values of features and it can be used for feature extraction and feature logging.

This will return the values of the features in the yourFeatureStore store.

If you use [features] together with a Learning-To-Rank reranking query then the values of the features in the reranking model (yourModel) will be returned.
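The two usages described above, naming a feature store explicitly or relying on the model of a reranking query, can be sketched in Python as follows. The helper name is hypothetical and the reRankDocs value of 25 is an arbitrary illustrative choice; yourFeatureStore and yourModel are the placeholder names used in the text.

```python
def ltr_params(q, store=None, model=None, rerank_docs=25):
    """Build parameters for feature logging, with or without LTR reranking."""
    params = {"q": q}
    if model is not None:
        # With a reranking query, [features] reports the model's features.
        params["rq"] = f"{{!ltr model={model} reRankDocs={rerank_docs}}}"
        params["fl"] = "id,score,[features]"
    elif store is not None:
        # Without reranking, name the feature store explicitly.
        params["fl"] = f"id,[features store={store}]"
    else:
        params["fl"] = "id,[features]"
    return params


extraction = ltr_params("test", store="yourFeatureStore")
reranked = ltr_params("test", model="yourModel")
```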

 

 

 


18 Comments

  1. ChildDocTransformerFactory questions:

    If this is the result of a block join, then seems odd to need to specify filters again here?

    There's [child ...] but no [parent ...], is that handled by something else?

    1. Using subquery in Solr 6.1+ I was able to get the parent in this way:

      q=*:*
      fq=parent:false
      fl=*,foo:[subquery]
      foo.q={!terms f=id v=$row.parent_id}

      I filter on a boolean field called parent, so the main parameters refer to the child records. With the subquery, I have the id of the parent linked to a field called "parent_id" from the child. The subquery's parameters (foo.bar=) refer to records from the parent.

  2. For the config

    <transformer name="mytrans2" class="org.apache.solr.response.transform.ValueAugmenterFactory" >
    <int name="value">5</int>
    </transformer>

    Confirming that I'd invoke this with:

    fl=id,arbitrary_field_name:[mytrans2]

    And then later in results, under each <doc>, I'd have a field named "arbitrary_field_name"

  3. For [child ...] I think that's pretty recent.  I've seen conflicting info, 4.9, 5.0 or 4.8, or maybe earlier.  JIRA has it marked fixed in 4.9, but this wiki page predates that release.

    The reason I ask, if it really is pretty new, then page should point it out.

  4. Hi, i tried to run a subquery on a separate collection:

    Url

    and although I receive a numFound in the results of the field, the document list is empty.

    Response

    Do the two collections need to have the same schema?

    1. Can you try to add financials.rows=10 ? Also it's worth to check queries on timeseries collection at the logs.  

      1. Hi Mikhail Khludnev thanks for your feedback. Unfortunately it hasn't worked. I can join the two collections and view results and can't find any log issues with the timeseries collection

        1. ok. the next try is to add  financials.fl=* .Can you show a log row of the subquery issued on timeseries collection? it should ends with hits=153284 status=0 It's crucial to understand which params are supplied for subquery request.

          1. Hey Mikhail Khludnev apologies for the late reply. Unfortunately i haven't. How can i supply you with the log row?

            1. drop to mkhl [at] apache [dot] org 

  5. If we want one transformer to applied after another, how do we do it?

    1. Just put one after another. DocTransformers invoke them consecutively.

      1. I figured it from the code. But it is not specified anywhere in the ref guide.

        All the transformers are per field. How do i apply transformers across fields in the document

  6. I am trying following query:

    fl=*,contents:[subquery]&contents.q={!terms f=unique_id v=$row.content_id}&group=true&group.field=record_id&group.ngroups=true

    grouping is done on record_id field. Subquery tries to retrieve contents having content_id found for this record. This content_id is compared against unique_id field. This data is present in the same core. 

    Now this subquery works for depth=1. Is it possible to have a subquery which can fetch data till e.g. 3 levels and in general n level

    1. it should be possible with contents.fl=*,level2:[subquery]&level2.q=...&level2.fl=*,level3:[subquery]&level3.q=..., but this takes a while, it's not a transitive closure, i.e. it can't go through all levels of nesting itself (see Graph Traversal).

      1. can you provide a link for graph traversal?

        Also can you help with this query

        1) If i have multiple conditions in subquery . i want to put some filter query e.g. contents.q={!terms f=unique_id v=$row.content_id}&contents.fq=record_name NOT NULL. This is not the exact query which i have tried, rather it is an idea

         

        2) I want to execute the transformer if a condition evaluates to true. e.g. contents.q={!terms f=unique_id v=$row.content_id} if record_name is not null (record_name is the field name in the current document)

         

        3) How to limit the document count in the above mentioned transformer