Apache Solr Documentation

*** As of June 2017, the latest Solr Ref Guide is located at https://lucene.apache.org/solr/guide ***

Please note comments on these pages have now been disabled for all users.


With the Learning To Rank (or LTR for short) contrib module you can configure and run machine learned ranking models in Solr. The module also supports feature extraction inside Solr. The only thing you need to do outside Solr is train your own ranking model.

Concepts

Re-Ranking

Re-Ranking allows you to run a simple query for matching documents and then re-rank the top N documents using the scores from a different, complex query. This page describes the use of LTR complex queries; information on other rank queries included in the Solr distribution can be found on the Query Re-Ranking page.

Learning To Rank

In information retrieval systems, Learning to Rank is used to re-rank the top N retrieved documents using trained machine learning models. The hope is that such sophisticated models can make more nuanced ranking decisions than standard ranking functions like TF-IDF or BM25.

Model

A ranking model computes the scores used to rerank documents. Irrespective of any particular algorithm or implementation, a ranking model's computation can use three types of inputs:

  • parameters that represent the scoring algorithm
  • features that represent the document being scored
  • features that represent the query for which the document is being scored
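To make the three kinds of input concrete, here is a minimal sketch of a linear scoring function. It is not Solr's implementation; the weights and feature names are hypothetical and chosen only for illustration.

```python
# Parameters of the scoring algorithm: one weight per feature (hypothetical).
weights = {"documentRecency": 1.0, "isBook": 0.1, "userFromMobile": 0.5}

# Features that describe the document being scored (hypothetical values).
document_features = {"documentRecency": 0.25, "isBook": 1.0}

# Features that describe the query (hypothetical values).
query_features = {"userFromMobile": 1.0}


def linear_score(weights, features):
    """Return the weighted sum of feature values; absent features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


score = linear_score(weights, {**document_features, **query_features})
print(score)
```

A linear model is only one possible form; the point is that the score depends on nothing other than the parameters and the two groups of features.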

Feature

A feature is a value, a number, that represents some quantity or quality of the document being scored or of the query for which documents are being scored. For example, documents often have a 'recency' quality, and 'number of past purchases' might be a quantity that is passed to Solr as part of the search query.

Normalizer

Some ranking models expect features on a particular scale. A normalizer can be used to translate arbitrary feature values into normalized values e.g. on a 0..1 or 0..100 scale.
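The arithmetic usually meant by min-max and standard normalization can be sketched as follows. This is an illustration of the general technique, not a copy of Solr's normalizer classes; in particular, how those classes treat values outside the configured range is not covered here.

```python
def min_max(value, minimum, maximum):
    """Map [minimum, maximum] linearly onto [0, 1]."""
    return (value - minimum) / (maximum - minimum)


def standard(value, avg, std):
    """Express a value as a number of standard deviations from the average."""
    return (value - avg) / std


# Example inputs are illustrative only.
print(min_max(25.0, 0.0, 50.0))   # halfway between min and max
print(standard(48.0, 42.0, 6.0))  # one standard deviation above the average
```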

Training

Feature engineering

The LTR contrib module includes several feature classes as well as support for custom features. Each feature class's javadocs contain an example to illustrate use of that class. The process of feature engineering itself is then entirely up to your domain expertise and creativity.

Feature | Class | Example parameters | External Feature Information
field length | FieldLengthFeature | {"field":"title"} | not (yet) supported
field value | FieldValueFeature | {"field":"hits"} | not (yet) supported
original score | OriginalScoreFeature | {} | not applicable
solr query | SolrFeature | {"q":"{!func} recip(ms(NOW,last_modified) ,3.16e-11,1,1)"} | supported
solr filter query | SolrFeature | {"fq":["{!terms f=category}book"]} | supported
solr query + filter query | SolrFeature | {"q":"{!func} recip(ms(NOW,last_modified), 3.16e-11,1,1)", "fq":["{!terms f=category}book"]} | supported
value | ValueFeature | {"value":"${userFromMobile}","required":true} | supported
(custom) | (custom class extending Feature) | |

Normalizer | Class | Example parameters
Identity | IdentityNormalizer | {}
MinMax | MinMaxNormalizer | {"min":"0", "max":"50" }
Standard | StandardNormalizer | {"avg":"42","std":"6"}
(custom) | (custom class extending Normalizer) |

Feature extraction

The ltr contrib module includes a [features] transformer that calculates and returns feature values for feature extraction purposes. This is especially useful when you do not yet have an actual reranking model.

Feature selection and model training

Feature selection and model training take place offline and outside Solr. The ltr contrib module supports two generalized forms of models as well as custom models. Each model class's javadocs contain an example to illustrate configuration of that class. Your trained model or models (e.g. different models for different customer geographies) can then be uploaded directly into Solr as JSON files using the provided REST APIs.

General form | Class | Specific examples
Linear | LinearModel | RankSVM, Pranking
Multiple Additive Trees | MultipleAdditiveTreesModel | LambdaMART, Gradient Boosted Regression Trees (GBRT)
(custom) | (custom class extending LTRScoringModel) | (not applicable)
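As a rough illustration of the "multiple additive trees" form, the sketch below sums weighted leaf values over a list of decision trees. The dictionary layout, the feature names, and the rule that a value less than or equal to the threshold goes left are assumptions made for this example; consult the MultipleAdditiveTreesModel javadocs for the actual configuration format and semantics.

```python
def evaluate_tree(node, features):
    """Walk one tree down to a leaf and return the leaf value."""
    while "value" not in node:
        # Assumed rule: feature value <= threshold goes left, otherwise right.
        if features.get(node["feature"], 0.0) <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


def additive_trees_score(trees, features):
    """Sum each tree's leaf value multiplied by that tree's weight."""
    return sum(t["weight"] * evaluate_tree(t["root"], features) for t in trees)


# Hypothetical two-tree ensemble.
trees = [
    {"weight": 1.0, "root": {
        "feature": "userTextTitleMatch", "threshold": 0.5,
        "left": {"value": -100.0},
        "right": {"feature": "originalScore", "threshold": 10.0,
                  "left": {"value": 50.0}, "right": {"value": 75.0}}}},
    {"weight": 2.0, "root": {"value": -10.0}},
]

features = {"userTextTitleMatch": 1.0, "originalScore": 12.0}
print(additive_trees_score(trees, features))
```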

 

Quick Start Example

The "techproducts" example included with Solr is pre-configured with the plugins required for learning-to-rank, but they are disabled by default.

To enable the plugins, please specify the "solr.ltr.enabled" JVM System Property when running the example:

Uploading features

To upload features in a /path/myFeatures.json file, please run:
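A minimal sketch of such an upload, prepared but not sent, is shown here in Python. It assumes Solr runs at localhost:8983 with the techproducts core, and that the feature store is exposed at schema/feature-store in the same way as the schema/model-store endpoint quoted in the comments on this page.

```python
import urllib.request

SOLR_CORE = "http://localhost:8983/solr/techproducts"  # assumed host and core


def build_upload(endpoint, json_bytes):
    """Prepare (but do not send) an HTTP PUT of a JSON document."""
    return urllib.request.Request(
        url=f"{SOLR_CORE}/schema/{endpoint}",
        data=json_bytes,
        method="PUT",
        headers={"Content-type": "application/json"},
    )


# In practice the bytes would be read from /path/myFeatures.json.
request = build_upload("feature-store", b"[]")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```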

 To view the features you just uploaded please open the following URL in a browser:

 

Example: /path/myFeatures.json
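The sketch below builds a features file of this kind. The feature names are hypothetical, the parameters are taken from the feature class table above, and the "name", "class" and "params" keys are an assumption about the expected JSON layout; the javadocs of each feature class are the authority.

```python
import json

PACKAGE = "org.apache.solr.ltr.feature"

# Hypothetical features built from the classes listed in the table above.
features = [
    {
        "name": "documentRecency",
        "class": f"{PACKAGE}.SolrFeature",
        "params": {"q": "{!func}recip(ms(NOW,last_modified),3.16e-11,1,1)"},
    },
    {
        "name": "isBook",
        "class": f"{PACKAGE}.SolrFeature",
        "params": {"fq": ["{!terms f=category}book"]},
    },
    {
        "name": "originalScore",
        "class": f"{PACKAGE}.OriginalScoreFeature",
        "params": {},
    },
]

# The text that would be written to the features file.
payload = json.dumps(features, indent=2)
print(payload)
```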

Extracting features

To extract features as part of a query, add [features] to the fl parameter, for example:
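One way to build such a request URL is sketched below. The host, core, /select handler, query text and other returned fields are assumptions; only the [features] entry in fl comes from the text above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SELECT = "http://localhost:8983/solr/techproducts/select"  # assumed handler

# Ask for the id and score of each document plus its feature values.
params = {"q": "test", "fl": "id,score,[features]"}
url = SELECT + "?" + urlencode(params)
print(url)

# Decoding the query string again recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```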

The output XML will include feature values as a comma-separated list, resembling the output shown here:

Uploading a model

To upload the model in a /path/myModel.json file, please run:

 To view the model you just uploaded please open the following URL in a browser:

 

Example: /path/myModel.json
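A model file for a linear model might be assembled as in this sketch. The feature names and weights are hypothetical, and the JSON keys ("class", "name", "features", "params", "weights") are an assumption about the layout; the LinearModel javadocs contain the authoritative example.

```python
import json

# Hypothetical linear model over three hypothetical features.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "myModel",
    "features": [
        {"name": "documentRecency"},
        {"name": "isBook"},
        {"name": "originalScore"},
    ],
    "params": {
        "weights": {
            "documentRecency": 1.0,
            "isBook": 0.1,
            "originalScore": 0.5,
        }
    },
}

# Every feature the model lists should have a weight, and vice versa.
listed = {f["name"] for f in model["features"]}
weighted = set(model["params"]["weights"])
consistent = listed == weighted

payload = json.dumps(model, indent=2)
print(payload)
```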

Running a rerank query

To rerank the results of a query, add the rq parameter to your search, for example:
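The following sketch builds a rerank request using the ltr query parser. The host, core, handler and query text are assumptions, and reRankDocs=100 is an illustrative choice for the number of top documents to rerank.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SELECT = "http://localhost:8983/solr/techproducts/select"  # assumed handler

params = {
    "q": "test",
    # Rerank the top 100 documents with the model named myModel.
    "rq": "{!ltr model=myModel reRankDocs=100}",
    "fl": "id,score",
}
url = SELECT + "?" + urlencode(params)
print(url)

# Decoding the query string again recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```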

The addition of the rq parameter will not change the output XML of the search.

To obtain the feature values computed during reranking, add [features] to the fl parameter, for example:

 The output XML will include feature values as a comma-separated list, resembling the output shown here:

External Feature Information

The ValueFeature and SolrFeature classes support the use of external feature information, efi for short.

Uploading features

To upload features in a /path/myEfiFeatures.json file, please run:

 To view the features you just uploaded please open the following URL in a browser:

 

Example: /path/myEfiFeatures.json

As an aside, you may have noticed that the myEfiFeatures.json example uses "store":"myEfiFeatureStore" attributes: read more about feature stores in the Lifecycle section of this page.

Extracting features

To extract myEfiFeatureStore features as part of a query, add efi.* parameters to the [features] part of the fl parameter, for example:
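A small helper for composing the [features] part of fl is sketched here. The efi names and values (text, preferredManufacturer, fromMobile) are hypothetical; they must match whatever placeholders your own feature definitions use.

```python
def features_transformer(store=None, efi=None):
    """Compose a [features ...] transformer string for the fl parameter."""
    parts = ["features"]
    if store:
        parts.append(f"store={store}")
    for name, value in (efi or {}).items():
        parts.append(f"efi.{name}={value}")
    return "[" + " ".join(parts) + "]"


transformer = features_transformer(
    store="myEfiFeatureStore",
    efi={"text": "test", "preferredManufacturer": "Apache", "fromMobile": 1},
)
fl = "id,score," + transformer
print(fl)
```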

Uploading a model

To upload the model in a /path/myEfiModel.json file, please run:

 To view the model you just uploaded please open the following URL in a browser:

 

Example: /path/myEfiModel.json

Running a rerank query

To obtain the feature values computed during reranking, add [features] to the fl parameter and efi.* parameters to the rq parameter, for example:
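In this case the efi.* parameters travel with rq rather than with fl, which can be sketched as follows. The efi names and values are hypothetical, as are the query text and other returned fields.

```python
def ltr_rq(model, efi=None):
    """Compose an {!ltr ...} rerank query string for the rq parameter."""
    parts = [f"model={model}"]
    for name, value in (efi or {}).items():
        parts.append(f"efi.{name}={value}")
    return "{!ltr " + " ".join(parts) + "}"


params = {
    "q": "test",
    "rq": ltr_rq("myEfiModel", efi={"text": "test", "fromMobile": 1}),
    # Plain [features]: the values computed during reranking are returned.
    "fl": "id,score,[features]",
}
print(params["rq"])
```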

Notice the absence of efi.* parameters in the [features] part of the fl parameter.

Extracting features whilst reranking

To extract features for myEfiFeatureStore's features whilst still reranking with myModel:

Notice the absence of efi.* parameters in the rq parameter (because myModel does not use efi features) and the presence of efi.* parameters in the [features] part of the fl parameter (because myEfiFeatureStore contains efi features).

Read more about model evolution in the Lifecycle section of this page.

Training example

Example training data and a demo 'train and upload model' script can be found in the solr/contrib/ltr/example folder in the Apache lucene-solr git repository which is mirrored on github.com (the solr/contrib/ltr/example folder is not shipped in the solr binary release).

Installation

The ltr contrib module requires the dist/solr-ltr-*.jar JARs.

Configuration

Learning-To-Rank is a contrib module and therefore its plugins must be configured in solrconfig.xml.

Minimum requirements

  • Include the required contrib JARs. Note that by default paths are relative to the Solr core, so you may need to adjust them for your installation or specify $solr.install.dir explicitly.

 

  • Declaration of the ltr query parser.

 

  • Configuration of the feature values cache.

 

  • Declaration of the [features] transformer.
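The four requirements above can be checked mechanically. The sketch below parses an illustrative solrconfig.xml fragment and reports which of the four declarations are missing. The plugin class names, cache name and sizes, and the lib path in the fragment are assumptions to verify against the ltr javadocs and your own installation.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; verify class names and paths before using it.
FRAGMENT = r"""
<config>
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" />
  <queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin" />
  <cache name="QUERY_DOC_FV" class="solr.search.LRUCache" size="4096"
         initialSize="2048" autowarmCount="4096"
         regenerator="solr.search.NoOpRegenerator" />
  <transformer name="features"
      class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
    <str name="fvCacheName">QUERY_DOC_FV</str>
  </transformer>
</config>
"""


def missing_ltr_requirements(xml_text):
    """Return the names of the minimum requirements not found in xml_text."""
    root = ET.fromstring(xml_text)
    found = {
        "lib": any("solr-ltr" in e.get("regex", "") for e in root.iter("lib")),
        "queryParser": any(e.get("name") == "ltr" for e in root.iter("queryParser")),
        "cache": any(True for _ in root.iter("cache")),
        "transformer": any(e.get("name") == "features" for e in root.iter("transformer")),
    }
    return [name for name, ok in found.items() if not ok]


print(missing_ltr_requirements(FRAGMENT))
```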

Advanced options

LTRThreadModule

A thread module can be configured for the query parser and/or the transformer to parallelize the creation of feature weights. For details, please refer to the LTRThreadModule javadocs.

Feature vector customization

The features transformer returns dense csv values such as "featureA=0.1,featureB=0.2,featureC=0.3,featureD=0.0".
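For offline processing of extracted features, the dense string can be parsed into name/value pairs and, if desired, rewritten in the sparse form described next. This sketch assumes that the sparse form simply omits zero-valued features, which is inferred from the two example strings on this page rather than from the implementation.

```python
def parse_dense(feature_vector):
    """Turn 'a=0.1,b=0.2' into {'a': 0.1, 'b': 0.2}."""
    pairs = (item.split("=") for item in feature_vector.split(","))
    return {name: float(value) for name, value in pairs}


def to_sparse(features):
    """Render non-zero features as 'name:value' separated by spaces."""
    return " ".join(f"{n}:{v}" for n, v in features.items() if v != 0.0)


dense = "featureA=0.1,featureB=0.2,featureC=0.3,featureD=0.0"
features = parse_dense(dense)
sparse = to_sparse(features)
print(sparse)
```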

For sparse csv output such as "featureA:0.1 featureB:0.2 featureC:0.3" you can customize the feature logger transformer declaration in solrconfig.xml as follows:

Implementation and contributions

How does Solr Learning-To-Rank work under the hood?

Please refer to the ltr javadocs for an implementation overview.

How could I write additional models and/or features?

Lifecycle

Feature stores

It is recommended that you organise all your features into stores which are akin to namespaces:

  • Features within a store must be named uniquely.
  • Across stores identical or similar features can share the same name.
  • If no store name is specified then the default _DEFAULT_ feature store will be used.

To discover the names of all your feature stores:

To inspect the content of the commonFeatureStore feature store:
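Both lookups are plain GET requests against the feature store resource. The sketch below only builds the URLs, assuming Solr at localhost:8983 with the techproducts core and the store exposed at schema/feature-store.

```python
FEATURE_STORE = "http://localhost:8983/solr/techproducts/schema/feature-store"

# Names of all feature stores.
list_stores_url = FEATURE_STORE

# Content of one named feature store.
inspect_store_url = f"{FEATURE_STORE}/commonFeatureStore"

print(list_stores_url)
print(inspect_store_url)
```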

Models

  • A model uses features from exactly one feature store.
  • If no store is specified then the default _DEFAULT_ feature store will be used.
  • A model need not use all the features defined in a feature store.
  • Multiple models can use the same feature store.

To extract features for currentFeatureStore's features:

To extract features for nextFeatureStore's features whilst reranking with currentModel based on currentFeatureStore:

To view all models:

To delete the currentModel model:

A feature store must be deleted only when there are no models using it.

To delete the currentFeatureStore feature store:
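The ordering constraint above (model first, then its feature store) can be sketched as two prepared HTTP DELETE requests. Nothing is sent here; the host and core are assumptions, and the paths assume the stores are exposed at schema/model-store and schema/feature-store.

```python
import urllib.request

SCHEMA = "http://localhost:8983/solr/techproducts/schema"  # assumed host and core


def build_delete(path):
    """Prepare (but do not send) an HTTP DELETE for a managed resource."""
    return urllib.request.Request(url=f"{SCHEMA}/{path}", method="DELETE")


# Delete the model before the feature store it depends on.
requests_in_order = [
    build_delete("model-store/currentModel"),
    build_delete("feature-store/currentFeatureStore"),
]

for request in requests_in_order:
    print(request.get_method(), request.full_url)
```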

Applying changes

The feature store and the model store are both Managed Resources. Changes made to managed resources are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.
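A reload can be requested through the Collections API in SolrCloud mode or through the CoreAdmin API in single server mode. The sketch below builds the corresponding URLs; the host and the techproducts name are assumptions.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed host and port


def reload_url(name, cloud):
    """Build the reload URL for a collection (cloud) or a core (single server)."""
    if cloud:
        query = urlencode({"action": "RELOAD", "name": name})
        return f"{SOLR}/admin/collections?{query}"
    query = urlencode({"action": "RELOAD", "core": name})
    return f"{SOLR}/admin/cores?{query}"


print(reload_url("techproducts", cloud=True))
print(reload_url("techproducts", cloud=False))
```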

Examples

One feature store, multiple ranking models

  • leftModel and rightModel both use features from commonFeatureStore and the only difference between the two models is the weights attached to each feature.
  • Conventions used:
    • commonFeatureStore.json file contains features for the commonFeatureStore feature store
    • leftModel.json file contains the model named leftModel
    • rightModel.json file contains the model named rightModel
    • The model's features and weights are sorted alphabetically by name. This makes it easy to see the commonalities and differences between the two models.
    • The store's features are sorted alphabetically by name. This makes it easy to look up features used in the models.
Example: /path/commonFeatureStore.json
Example: /path/leftModel.json
Example: /path/rightModel.json

Model evolution

  • linearModel201701 uses features from featureStore201701
  • treesModel201702 uses features from featureStore201702
  • linearModel201701 and treesModel201702 and their feature stores can co-exist whilst both are needed.
  • When linearModel201701 has been deleted then featureStore201701 can also be deleted.
  • Conventions used:
    • <store>.json file contains features for the <store> feature store
    • <model>.json file contains the model named <model>
    • a 'generation' id (e.g. YYYYMM year-month) is part of the feature store and model names
    • The model's features and weights are sorted alphabetically by name. This makes it easy to see the commonalities and differences between the two models.
    • The store's features are sorted alphabetically by name. This makes it easy to see the commonalities and differences between the two feature stores.
Example: /path/featureStore201701.json
Example: /path/linearModel201701.json
Example: /path/featureStore201702.json
Example: /path/treesModel201702.json


12 Comments

  1. Has anyone been able to run MultipleAdditiveTreesModels? Despite re-compiling from the source, I can't get these trees working. It keeps saying "Model type does not exist org.apache.solr.ltr.model.MultipleAdditiveTreesModel".

    1. I'm using the features and model example from this page ( /path/treesModel201702.json). 

      Error invoking setter setTrees on class : org.apache.solr.ltr.model.MultipleAdditiveTreesModel

      1. Hi Mike,

        Thanks for the feedback. There were some double-quotes missing in the /path/treesModel201702.json example and I just fixed them. If you try again then the example should now work.

        Regards,
        Christine

  2. Hi, is anyone facing issues with myEfiFeatures.json, I keep receiving the following error:

    {
      "responseHeader":{
        "status":400,
        "QTime":1},
      "error":{
        "metadata":[
          "error-class","org.apache.solr.common.SolrException",
          "root-error-class","java.lang.ClassCastException"],
        "msg":"org.apache.solr.ltr.model.ModelException: Model type does not exist org.apache.solr.ltr.feature.ValueFeature",
        "code":400}
    }

     

    It has to do with ValueFeatures class being absent I guess but I am not sure, I have tried looking at the Lifecycles and featureStores but couldn't follow much, any idea on how to tackle this or if someone can point me to the right set of resources.

     

    Thanks.

    Shanky

    1. Hello Shanky, could you provide the exact query with which you encountered the issue with the myEfiFeatures.json example above? Thanks.

      1. Hi Christine,

        curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myEFIFeatures.json" -H 'Content-type:application/json'

        I was basically trying the example for learning to Rank explained above.

        Thanks.

        1. Hi Shanky, from the error and the command that you posted, it seems that you are trying to send a JSON file containing features to the model endpoint (http://localhost:8983/solr/techproducts/schema/model-store).
          The error is saying that ValueFeature is not a valid model (indeed it is a feature); we can improve the message (wink).
          Can you double check whether myEFIFeatures.json contains feature declarations and, if yes, replace it with myEfiModel.json?

          Thanks,
          Diego 

  3. Thanks for great feature contribution to solr project. I have a use case of personalization and wondering whether you can help me there. I would like to rerank my query based on the relationship of searcher with the author of the returned documents. I do have relationship score in the external datastore in form of user1(searcher), user2(author), relationship score. In my query, I can pass searcher id as external feature. My question is that during querying, how do I retrieve relationship score for each documents as a feature and rerank the documents. Would I need to implement a custom feature to do so? and How to implement the custom feature.  

  4. Is there a minimum solr version requirement? Is the plugin usable on solr 5.x? If so, are there any installation tips?

    1. This feature was released in Solr 6.4.0. Considering the changes between Solr 5.x and 6.x, I think it unlikely that it would work in 5.x, but if you look at the issue (SOLR-8542, "Integrate Learning to Rank into Solr", resolved), that may have additional information for you.

  5. Anyone having the following error when trying to query using a LTR model?

     

    org.apache.solr.common.SolrException: java.lang.UnsupportedOperationException: Query  does not implement createWeight
    	at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:138)
    	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:2030)
    	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1844)
    	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:609)
    1. I'm seeing it also. For me, similar query works in single instance mode but doesn't on solrCloud. 

      Here is the complete stacktrace

      s.h.RequestHandlerBase org.apache.solr.common.SolrException: java.lang.UnsupportedOperationException: Query  does not implement createWeight
       at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:138)
       at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1806)
       at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1620)
       at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:617)
       at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:531)
       at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
       at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
       at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
       at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
       at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
       at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
       at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
       at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
       at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
       at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
       at org.eclipse.jetty.server.Server.handle(Server.java:518)
       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
       at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
       at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
       at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
       at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
       at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
       at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.UnsupportedOperationException: Query  does not implement createWeight
       at org.apache.lucene.search.Query.createWeight(Query.java:66)
       at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:752)
       at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:735)
       at org.apache.solr.ltr.LTRRescorer.rescore(LTRRescorer.java:117)
       at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:104)
       ... 35 more