Apache Solr Documentation

This Unreleased Guide Will Cover Apache Solr 6.5

The SuggestComponent in Solr provides users with automatic suggestions for query terms. You can use this to implement a powerful auto-suggest feature in your search application.

Although it is possible to use the Spell Checking functionality to power autosuggest behavior, Solr has a dedicated SuggestComponent designed for this functionality. This approach utilizes Lucene's Suggester implementation and supports all of the lookup implementations available in Lucene.

The main features of this Suggester are:

  • Lookup implementation pluggability
  • Term dictionary pluggability, giving you the flexibility to choose the dictionary implementation
  • Distributed support

The solrconfig.xml found in Solr's "techproducts" example has the new Suggester implementation configured already. For more on search components, see the section RequestHandlers and SearchComponents in SolrConfig.

Configuring Suggester in solrconfig.xml

The "techproducts" example solrconfig.xml has a suggest search component and a /suggest request handler already configured. You can use that as the basis for your configuration, or create it from scratch, as detailed below.

Adding the Suggest Search Component

The first step is to add a search component to solrconfig.xml and tell it to use the SuggestComponent. Here is some sample code that could be used.
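As a minimal sketch, the definition below follows the values the "techproducts" example is reported to use: a FuzzyLookupFactory lookup over a DocumentDictionaryFactory dictionary built from the cat field and weighted by price. The suggester name mySuggester and the field names are assumptions to adapt to your own schema.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <!-- Symbolic name referenced by suggest.dictionary (assumed for this sketch) -->
    <str name="name">mySuggester</str>
    <!-- How terms are looked up -->
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <!-- Where terms come from -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <!-- Stored field that supplies the suggestion terms -->
    <str name="field">cat</str>
    <!-- Optional field used to weight suggestions -->
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <!-- Build on demand with suggest.build=true instead of at startup -->
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```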

Suggester Search Component Parameters

The Suggester search component takes several configuration parameters. The choice of the lookup implementation (lookupImpl, how terms are found in the suggestion dictionary) and the dictionary implementation (dictionaryImpl, how terms are stored in the suggestion dictionary) will dictate some of the parameters required. Below are the main parameters that can be used no matter what lookup or dictionary implementation is used. In the following sections additional parameters are provided for each implementation.

Parameter

Description

searchComponent name

Arbitrary name for the search component.

name

A symbolic name for this suggester. You can refer to this name in the URL parameters and in the SearchHandler configuration. It is possible to define multiple suggesters, each with its own name.

lookupImpl

Lookup implementation. There are several possible implementations, described below in the section Lookup Implementations. If not set, the default lookup is JaspellLookupFactory.

dictionaryImpl

The dictionary implementation to use. There are several possible implementations, described below in the section Dictionary Implementations. If not set, the default dictionary implementation is HighFrequencyDictionaryFactory unless a sourceLocation is used, in which case the dictionary implementation will be FileDictionaryFactory.

field

A field from the index to use as the basis of suggestion terms. If sourceLocation is empty (meaning any dictionary implementation other than FileDictionaryFactory) then terms from this field in the index will be used.

To be used as the basis for a suggestion, the field must be stored. You may want to use copyField rules to create a special 'suggest' field comprised of terms from other fields in documents. In any event, you likely want a minimal amount of analysis on the field, so an additional option is to create a field type in your schema that only uses basic tokenizers or filters. One option for such a field type is shown here:
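One possible definition is sketched here. The type name textSuggest is only a suggestion, and the analysis chain is deliberately limited to a standard tokenizer and a lowercase filter; treat it as a starting point rather than a required setup.

```xml
<!-- Minimal-analysis field type for suggestion terms (name is an assumption) -->
<fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split text into tokens on word boundaries -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalize case so lookups are case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```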

This minimal analysis is not required, however, if you want more analysis to occur on terms. If you use AnalyzingLookupFactory as your lookupImpl, you can define the field type rules to use for index-time and query-time analysis.

sourceLocation

The path to the dictionary file if using the FileDictionaryFactory. If this value is empty then the main index will be used as a source of terms and weights.

storeDir

The location to store the dictionary file.

buildOnCommit or buildOnOptimize

If true, the lookup data structure is rebuilt after every soft commit (buildOnCommit) or only when the index is optimized (buildOnOptimize). If false, the default, the lookup data is built only when requested with the URL parameter suggest.build=true. Some lookup implementations can take a long time to build, especially with large indexes. In such cases, using buildOnCommit or buildOnOptimize is not recommended, particularly with frequent soft commits; instead, build the suggester less often by manually issuing requests with suggest.build=true.

buildOnStartup

If true, the lookup data structure is built when Solr starts or when the core is reloaded. If this parameter is not specified, the suggester checks whether the lookup data structure is present on disk and builds it if it is not found. Setting this to true can make the core take longer to load (or reload), because the suggester data structure needs to be built, which can sometimes take a long time. It is usually preferable to set this to 'false' and build suggesters by manually issuing requests with suggest.build=true.

Lookup Implementations

The lookupImpl parameter defines the algorithms used to look up terms in the suggest index. There are several possible implementations to choose from, and some require additional parameters to be configured.

AnalyzingLookupFactory

A lookup that first analyzes the incoming text and adds the analyzed form to a weighted FST, and then does the same thing at lookup time.

This implementation uses the following additional properties:

  • suggestAnalyzerFieldType: The field type to use for the query-time and build-time term suggestion analysis.
  • exactMatchFirst: If true, the default, exact suggestions are returned first, even if they are prefixes of other strings in the FST that have larger weights.
  • preserveSep: If true, the default, then a separator between tokens is preserved. This means that suggestions are sensitive to tokenization (e.g., baseball is different from base ball).
  • preservePositionIncrements: If true, the suggester preserves position increments. This means that when token filters leave gaps (for example, when StopFilter removes a stopword), those positions are respected when building the suggester. The default is false.

FuzzyLookupFactory

This is a suggester which is an extension of the AnalyzingSuggester but is fuzzy in nature. The similarity is measured by the Levenshtein algorithm.

This implementation uses the following additional properties:

  • exactMatchFirst: If true, the default, exact suggestions are returned first, even if they are prefixes of other strings in the FST that have larger weights.
  • preserveSep: If true, the default, then a separator between tokens is preserved. This means that suggestions are sensitive to tokenization (e.g., baseball is different from base ball).
  • maxSurfaceFormsPerAnalyzedForm: Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms, the lowest-weighted ones are discarded.
  • maxGraphExpansions: When building the FST ("index-time"), each path through the tokenstream graph is added as an individual entry. This places an upper bound on how many expansions will be added for a single suggestion. The default is -1, which means there is no limit.
  • preservePositionIncrements: If true, the suggester preserves position increments. This means that when token filters leave gaps (for example, when StopFilter removes a stopword), those positions are respected when building the suggester. The default is false.
  • maxEdits: The maximum number of string edits allowed. The system's hard limit is 2. The default is 1.
  • transpositions: If true, the default, transpositions are treated as a primitive edit operation.
  • nonFuzzyPrefix: The length of the leading prefix that must match a suggestion exactly, with no edits allowed. The default is 1.
  • minFuzzyLength: The minimum query length before any string edits are allowed. The default is 3.
  • unicodeAware: If true, the maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters are measured in Unicode code points (actual letters) instead of bytes. The default is false.
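The properties above are set as entries of the suggester definition. The following is a rough sketch only: the suggester name fuzzySuggester, the cat field and the string field type are assumptions, and the values are chosen to illustrate the syntax rather than as tuning advice.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">fuzzySuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">cat</str>
  <str name="suggestAnalyzerFieldType">string</str>
  <!-- Allow up to two edits (the documented hard limit) -->
  <str name="maxEdits">2</str>
  <!-- The first character must match exactly -->
  <str name="nonFuzzyPrefix">1</str>
  <!-- No edits are allowed for queries shorter than three characters -->
  <str name="minFuzzyLength">3</str>
  <!-- Count a swap of adjacent characters as a single edit -->
  <str name="transpositions">true</str>
  <!-- Measure lengths and edits in code points rather than bytes -->
  <str name="unicodeAware">true</str>
</lst>
```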

AnalyzingInfixLookupFactory

Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This uses a Lucene index for its dictionary.

This implementation uses the following additional properties.

  • indexPath: When using AnalyzingInfixSuggester you can provide your own path where the index will be built. The default is analyzingInfixSuggesterIndexDir, which is created in your collection's data directory.
  • minPrefixChars: Minimum number of leading characters before PrefixQuery is used (default is 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
  • allTermsRequired: Boolean option for multiple terms. Default is true - all terms required.
  • highlight: Highlight suggest terms. Default is true.

This implementation supports Context Filtering.

BlendedInfixLookupFactory

An extension of the AnalyzingInfixSuggester which provides additional functionality to weight prefix matches across the matched documents. You can tell it to score higher if a hit is closer to the start of the suggestion or vice versa.

This implementation uses the following additional properties:

  • blenderType: Used to calculate the weight coefficient from the position of the first matching word. Can be one of:
    • position_linear: weightFieldValue*(1 - 0.10*position). Matches closer to the start are given a higher score. This is the default.
    • position_reciprocal: weightFieldValue/(1+position). Matches closer to the start are also given a higher score, because the coefficient shrinks as the position grows.
      • exponent: An optional configuration variable for the position_reciprocal blenderType used to control how fast the score will increase or decrease. The default is 2.0.
  • numFactor: The factor to multiply the number of searched elements from which results will be pruned. The default is 10.
  • indexPath: When using BlendedInfixSuggester you can provide your own path where the index will be built. The default directory name is blendedInfixSuggesterIndexDir, which is created in your collection's data directory.
  • minPrefixChars: Minimum number of leading characters before PrefixQuery is used (default is 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).

This implementation supports Context Filtering.
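A hedged sketch of a suggester using this lookup follows. The suggester name blendedSuggester, the name and price fields, and the text_general field type are assumptions; the blenderType, numFactor and minPrefixChars values simply restate the documented defaults to show where they go.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">blendedSuggester</str>
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">name</str>
  <!-- Base weight that the blender scales by match position -->
  <str name="weightField">price</str>
  <str name="suggestAnalyzerFieldType">text_general</str>
  <!-- weightFieldValue*(1 - 0.10*position) -->
  <str name="blenderType">position_linear</str>
  <!-- Search 10x the requested count before pruning -->
  <str name="numFactor">10</str>
  <str name="minPrefixChars">4</str>
</lst>
```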

FreeTextLookupFactory

It looks at the last tokens plus the prefix of whatever final token the user is typing, if present, to predict the most likely next token. The number of previous tokens that need to be considered can also be specified. This suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.

This implementation uses the following additional properties:

  • suggestFreeTextAnalyzerFieldType: The analyzer used at "query-time" and "build-time" to analyze suggestions. This field is required.
  • ngrams: The maximum number of tokens combined into a single n-gram (shingle) when building the dictionary. The default value is 2. Increasing this means that more than the previous 2 tokens are taken into consideration when making suggestions.
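As an illustration only, a suggester using this lookup might be declared as below. The suggester name freeTextSuggester, the name field and the text_general field type are assumptions; no payloadField is configured here.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">freeTextSuggester</str>
  <str name="lookupImpl">FreeTextLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">name</str>
  <!-- Required: analyzer used at both build time and query time -->
  <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
  <!-- Consider n-grams of up to three tokens instead of the default two -->
  <str name="ngrams">3</str>
</lst>
```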

FSTLookupFactory

An automaton-based lookup. This implementation is slower to build, but provides the lowest memory cost. We recommend using this implementation unless you need more sophisticated matching results, in which case you should use the Jaspell implementation.

This implementation uses the following additional properties:

  • exactMatchFirst: If true, the default, exact suggestions are returned first, even if they are prefixes of other strings in the FST that have larger weights.
  • weightBuckets: The number of separate buckets for weights which the suggester will use while building its dictionary.

TSTLookupFactory

A simple compact ternary trie based lookup.

WFSTLookupFactory

A weighted automaton representation which is an alternative to FSTLookup for more fine-grained ranking. WFSTLookup does not use buckets, but instead a shortest path algorithm. Note that it expects weights to be whole numbers. If a weight is missing, it is assumed to be 1.0. Weights affect the sorting of matching suggestions: they are treated as a "popularity" score, with higher-weighted suggestions preferred over lower-weighted ones.

JaspellLookupFactory

A more complex lookup based on a ternary trie from the JaSpell project. Use this implementation if you need more sophisticated matching results.

Dictionary Implementations

The dictionary implementations define how terms are stored. There are several options, and multiple dictionaries can be used in a single request if necessary.

DocumentDictionaryFactory

A dictionary with terms, weights, and an optional payload taken from the index.

This dictionary implementation takes the following parameters in addition to parameters described for the Suggester generally and for the lookup implementation:

  • weightField: A field that is stored or a numeric DocValue field. This field is optional.
  • payloadField: The payloadField should be a field that is stored. This field is optional.
  • contextField: Field to be used for context filtering. Note that only some lookup implementations support filtering.

DocumentExpressionDictionaryFactory

This dictionary implementation is the same as the DocumentDictionaryFactory but allows users to specify an arbitrary expression in the weightExpression parameter.

This dictionary implementation takes the following parameters in addition to parameters described for the Suggester generally and for the lookup implementation:

  • payloadField: The payloadField should be a field that is stored. This field is optional.
  • weightExpression: An arbitrary expression used for scoring the suggestions. The fields used must be numeric fields. This field is required.
  • contextField: Field to be used for context filtering. Note that only some lookup implementations support filtering.
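To make the weightExpression parameter concrete, here is a sketch under stated assumptions: the suggester name expressionSuggester, the name field, the numeric fields price and popularity, and the text_en field type are all illustrative choices, and the expression itself is just one possible scoring formula.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">expressionSuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
  <str name="field">name</str>
  <!-- Required: weight computed from numeric fields only -->
  <str name="weightExpression">((price * 2) + ln(popularity))</str>
  <str name="suggestAnalyzerFieldType">text_en</str>
</lst>
```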

HighFrequencyDictionaryFactory

This dictionary implementation allows adding a threshold to prune out less frequent terms in cases where very common terms may overwhelm other terms.

This dictionary implementation takes one parameter in addition to parameters described for the Suggester generally and for the lookup implementation:

  • threshold: A value between zero and one representing the minimum fraction of the total documents where a term should appear in order to be added to the lookup dictionary.

FileDictionaryFactory

This dictionary implementation allows using an external file that contains suggest entries. Weights and payloads can also be used.

If using a dictionary file, it should be a plain text file in UTF-8 encoding. You can use both single terms and phrases in the dictionary file. If adding weights or payloads, those should be separated from terms using the delimiter defined with the fieldDelimiter property (the default is '\t', the tab representation). If using payloads, the first line in the file must specify a payload.

This dictionary implementation takes one parameter in addition to parameters described for the Suggester generally and for the lookup implementation:

  • fieldDelimiter: Specify the delimiter to be used to separate the entries, weights and payloads. The default is tab ('\t').

Example
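A sketch of a file-backed suggester, assuming a hypothetical dictionary file named suggestions.txt and relying on the default tab delimiter (so fieldDelimiter is not set). The sample file entries in the comment are illustrative only, with [TAB] standing in for a literal tab character.

```xml
<lst name="suggester">
  <!-- Hypothetical suggester name -->
  <str name="name">fileSuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">FileDictionaryFactory</str>
  <!-- Hypothetical UTF-8 text file, one entry per line, for example:
         acquire
         accidentally[TAB]2.0
         accommodate[TAB]3.0 -->
  <str name="sourceLocation">suggestions.txt</str>
  <str name="suggestAnalyzerFieldType">string</str>
</lst>
```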

Multiple Dictionaries

It is possible to include multiple dictionaryImpl definitions in a single SuggestComponent definition.

To do this, simply define separate suggesters, as in this example:
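A minimal sketch with two suggesters in one component. The names mySuggester and altSuggester, the fields, and the weight expression are assumptions for illustration.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- First suggester: weights read from a field -->
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
  <!-- Second suggester: weights computed from an expression -->
  <lst name="suggester">
    <str name="name">altSuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
    <str name="field">name</str>
    <str name="weightExpression">((price * 2) + ln(popularity))</str>
    <str name="suggestAnalyzerFieldType">text_en</str>
  </lst>
</searchComponent>
```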

When using these Suggesters in a query, you would define multiple 'suggest.dictionary' parameters in the request, referring to the names given for each Suggester in the search component definition. The response will include the terms in sections for each Suggester. See the Examples section below for an example request and response.

Adding the Suggest Request Handler

After adding the search component, a request handler must be added to solrconfig.xml. This request handler works the same as any other request handler, and allows you to configure default parameters for serving suggestion requests. The request handler definition must incorporate the "suggest" search component defined previously.
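One way to write such a handler is sketched below, assuming the search component was registered under the name suggest. The default count of 10 is an arbitrary choice for illustration.

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Always run the Suggester for requests to this handler -->
    <str name="suggest">true</str>
    <!-- Number of suggestions to return unless overridden at query time -->
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <!-- Must match the searchComponent name -->
    <str>suggest</str>
  </arr>
</requestHandler>
```

Because suggest.dictionary is not set in the defaults here, each request to this handler has to supply it.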

Suggest Request Handler Parameters

The following parameters allow you to set defaults for the Suggest request handler:

Parameter

Description

suggest=true

This parameter should always be true, because we always want to run the Suggester for queries submitted to this handler.

suggest.dictionary

The name of the dictionary component configured in the search component. This is a mandatory parameter. It can be set in the request handler, or sent as a parameter at query time.

suggest.q

The query to use for suggestion lookups.

suggest.count

Specifies the number of suggestions for Solr to return.

suggest.cfq

A Context Filter Query used to filter suggestions based on the context field, if supported by the suggester.

suggest.build

If true, it will build the suggester index. This is likely useful only for initial requests; you would probably not want to build the dictionary on every request, particularly in a production system. If you would like to keep your dictionary up to date, you should use the buildOnCommit or buildOnOptimize parameter for the search component.

suggest.reload

If true, it will reload the suggester index.

suggest.buildAll

If true, it will build all suggester indexes.

suggest.reloadAll

If true, it will reload all suggester indexes.

These properties can also be overridden at query time, or not set in the request handler at all and always sent at query time.

Context Filtering

Context filtering (suggest.cfq) is currently only supported by AnalyzingInfixLookupFactory and BlendedInfixLookupFactory, and only when backed by a Document*Dictionary. All other implementations will return unfiltered matches as if filtering was not requested.

Example Usages

Get Suggestions with Weights

This is the basic suggestion using a single dictionary and a single Solr core.

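Example query (assuming the "techproducts" core and a suggester named mySuggester):

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=elec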

In this example, we've simply requested the string 'elec' with the suggest.q parameter and requested that the suggestion dictionary be built with suggest.build (note, however, that you would likely not want to build the index on every query - instead you should use buildOnCommit or buildOnOptimize if you have regularly changing documents).

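Example response: the suggest section of the response is keyed first by suggester name and then by the query string ('elec'). Each entry reports numFound and a list of suggestions, each with a term, a weight and a payload.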

Multiple Dictionaries

If you have defined multiple dictionaries, you can use them in queries.

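Example query (assuming the "techproducts" core and two suggesters named mySuggester and altSuggester):

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.dictionary=altSuggester&suggest.q=elec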

In this example we have sent the string 'elec' as the suggest.q parameter and named two suggest.dictionary definitions to be used.

Example response:

Context Filtering

Context filtering lets you filter suggestions by a separate context field, such as category, department or any other token. The AnalyzingInfixLookupFactory and BlendedInfixLookupFactory currently support this feature, when backed by DocumentDictionaryFactory.

Add contextField to your suggester configuration. This example suggests names and allows filtering by category:

solrconfig.xml
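A sketch of such a configuration, assuming a suggester named mySuggester that draws terms from the name field and uses cat as the context field. The weightField and field type are likewise illustrative.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- One of the two lookups that support context filtering -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <!-- Suggestions are product names -->
    <str name="field">name</str>
    <str name="weightField">price</str>
    <!-- suggest.cfq is matched against this field -->
    <str name="contextField">cat</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```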

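Example context filtering suggest query (assuming the "techproducts" core and a suggester named mySuggester):

http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=c&suggest.cfq=memory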

The suggester will only bring back suggestions for products tagged with cat=memory.

 

 


56 Comments

  1. Needs additional info on newer 4.7 <searchComponent class="solr.SuggestComponent" name="suggest">

  2. Maybe we could replace the first line with this one, Just a suggestion - 

    Current - "The SuggestComponent in Solr is a way to provide automatic suggestions of query terms to your users. This is frequently used to suggest terms while a user is typing in a query box."

    New - "The SuggestComponent in Solr provides users with automatic suggestions for query terms. You can use this to implement a powerful auto-suggest feature on your search box."

     

    "you will need to modify your current implementation to add a search component and a request handler, described below" ...

    Here the hyperlink is not correct. It takes you to a new page right now.

     

    Some of the default values are highlighted. Some are not. We should fix those.

    1. Thanks Varun. I fixed the bold issue, thanks for catching that, and I fixed the opening suggestion very close to what you suggested.

      The hyperlink was always intended to go to a different page (in case someone wasn't familiar with search components or request handlers), but I see why it was confusing so I changed the wording and the link to be more clear.

  3. At SOLR-4.10.0 the following limitations exist:

    • Field Queries, i.e. &fq=, are not supported. I found therefore that I could not provide contextual suggestions, as suggestions were returned from the whole of the index. This is fine for suggesting things like ebay-style categories where every user can see every result (as per the example configuration), but was no use for me trying to suggest document titles, when not every user has rights to all my categories, and hence permissions to all documents. It would work in a Google-like scenario, where all documents are available to all users.
    • The SOLR core fails to reload when using either BlendedInfixLookupFactory or AnalyzingInfixSuggesterFactory, as per JIRA SOLR-6246. This means you have to stop and start the service instead.
  4. The "rebuildAll" and "reloadAll" parameters are correctly spelled "suggest.rebuildAll" and "suggest.reloadAll", respectively.

    The "FreeTextSuggesterFactory" is correctly named "FreeTextLookupFactory" and it has a mandatory parameter "suggestFreeTextAnalyzerFieldType".

    One might also mention, that the JaspellLookupFactory is the default LookupImpl and  HighFrequencyDictionaryFactory is the default dictionary implementation.

    The buildOnXX init parameters have to be specified as <str> types:

          <str name="buildOnCommit">true</str>


    Although this is, in my opinion, syntactically incorrect, I understand it was easier/more readable to implement.

    1. Hi Frank,

      Indeed "field" needs to be stored, otherwise silently the dictionary of words will not be built. 

      You are correct on the second point also. We should fix those.

       

    2. Fixed naming SuggesterFactory->LookupFactory, and added mandatory param suggestFreeTextAnalyzerFieldType

    3. Hi Frank,

      Did you delete the previous comment?

      For this – The "FreeTextSuggesterFactory" is correctly name "FreeTextLookupFactory" and it has a mandatory parameter "suggestFreeTextAnalyzerFieldType".

      I created a JIRA - SOLR-6656

      Summarizing changes that we need to make -

      1. In "Suggester Search Component Parameters" - the field param , the fieldType should be renamed to textSuggest

      2. In "Suggester Search Component Parameters" - the field param edit "To be used as the basis for a suggestion, the field must be indexed." to "To be used as the basis for a suggestion, the field must be indexed. and stored"
      3. In JaspellLookupFactory we should mention that it is the default implementation

      4. In HighFrequencyDictionaryFactory we should mention that it is the default implementation

      5. In "Suggest Request Handler Parameters" buildAll and reloadAll should be suggest.buildAll and suggest.reloadAll (fixed)

       

      Thanks for pointing out some of these issues.

      Hoss Man can you please make these 5 minor edits.

      1. Changed 1, 3 and 4.

        Changed 2. Varun, you say the field needs to be indexed too. I think it only needs to be stored for a suggester using DocumentDictionary (or a subclass). Am I missing something?

  5. It seems the description of "Distributed support" and "It is possible to get suggestions in SolrCloud mode, using the shards.qt parameter" was first added in the version 4.10 document. Does this mean the Suggester component did not support distributed search before 4.10, even though the basic function was introduced in version 4.7?

    And does the Suggester function support only one field ?

    Thanks!

     

    1. Good questions, both of them.

  6. Found a small typo: In the Dictionary Implementations, the parameter name "payloadfield" should be "payloadField" (with capital F)

  7. When using FileDictionaryFactory, is it true that "payload" can be specified in the dictionary? I don't think the current solr support this.

     

  8. Perhaps document how the suggester handles multiValued input. Will it correlate the values in "field" with the values in "weightField" and "payloadField" so that they match up?

    1. Yes you're correct. How about this under -> Suggester Search Component Parameters -> field 

      The field can be a multi valued field. All values from the document will correspond to the first value of the payloadField and weightField if it is present and specified in the doc..

      1. Some times you want a document-level score and payload on all values of a multi valued field (say category). But other times you may want the suggest response to be unique per value. So it would make sense to allow multiValued payloadField and scoreField, controlled by a param, e.g.

           multiValueMode=<align | alignPayloads | alignScores>

        (default=none, as today)

  9. Suggested feature to generate payload data on the fly instead of only from a stored field: SOLR-7051

  10. Some additional questions and a comment:

    1: In solrconfig.xml, it sets dictionaryImpl=DocumentDictionaryFactory (even though that's all commented out as of 4.10.3), but in the comments of this page it says that the true internal default is HighFrequencyDictionaryFactory, what's up with that?  Can somebody provide context on the internal default vs. what we show in the default solrconfig.xml?

    2: In solrconfig.xml, it sets lookupImpl=FuzzyLookupFactory, but in the comments of this page it says the true default is JaspellLookupFactory.  Similar to previous question, what's up with that?  (internal default vs. our example solrconfig.xml, context)

    3: Why is suggestAnalyzerFieldType set to type string in solrconfig.xml, wouldn't text_general make more sense?

    4: With the implementations chosen in the default example solrconfig, we require a weightField, which is set to price.  But most customer data doesn't have the field populated. It often still exists, but is typically blank. Normally I'd think that's OK, but since it's a required parameter, I'm nervous about leaving it set to price, knowing that most of the records are blank for that field.  The doc here doesn't mention a default.

    Also a comment on the slowdown of startup (Hoss's bug).  What's really odd is that, for the data I'm using today, the "cat" field (set in solrconfig.xml) is also mostly blank (along with price), and yet it still takes a long time to load.  Maybe it's doing a complete "record scan", despite the fact that they're all blank.

    Thanks,

    Mark

    1. For 1 - The default set in the solrconfig is DocumentDictionaryFactory. But if the suggester doesn't provide a dictionaryImpl HighFrequencyDictionaryFactory is used.
      For 2 - Same goes here. The default set in the solrconfig is FuzzyLookupFactory. But if the suggester doesn't provide a lookupImpl JaspellLookupFactory is used.
      For 4 - Looks like weightField is not mandatory. We should fix the documentation for it. If a document doesn't have a weightField entry then it defaults to 0. If the weightField is not specified altogether then all docs get a default value of 0.

      Also a comment on the slowdown of startup (Hoss's bug). -> I think it's best not to document it ? The fix has been committed for Solr 5.1 - SOLR-6845
      1. Varun, thanks.

        But for answers 1 and 2, WHY?

        You'd think the internal default would be good.

        But whoever did the example solrconfig overrode BOTH defaults.

        Presumably either because they don't like the defaults for some reason (a good question), OR they were trying to show some specific (possibly subtle) atypical use-case, but didn't say what that was. (so again, why, context?)

        A naive user (aka, me) would ask "Well, which is better?  Which developer do I trust more?  The dev who hard coded defaults in Java, or the dev who wrote the solrconfig.xml?  Or gosh, suppose it was the SAME developer who wrote both the Java code and the example solrconfig!?" (head explodes at this point)

      2. And did you have an answer for question 3?

  11. We should also add an example for a distributed search request like this - 

     

    http://localhost:8983/solr/suggest?suggest.dictionary=mySuggester&suggest=true&suggest.build=true&suggest.q=elec&shards=localhost:8983/solr,localhost:7574/solr&shards.qt=/suggest
    1. I thought it should be possible to leave off the shards= and that it would be distributed by default?  Having to hard-code specific shards and machines and addresses would be a pain as the Solr cluster changes.

  12. Going back to Solr 4.10.2, which was the last 4x with suggester enabled and configured by default, the default solrconfig.xml gives errors.

    Stock install, standalone, injecting the example .xml docs, all against collection1.

    http://localhost:8983/solr/collection1/suggest?q=ab

    Gives the error "No suggester named default was configured"

     YES, I understand the general meaning of no default configured.  But Solr really shipped with a broken suggester in the default config!?  Was there a version that did have a working suggest pre-configured?

    1. Just tried with Solr 4.7.2 stock and it's also not configured correctly there either.

      But since that's where the feature debuted, I'd expect it to have worked.  Turns out that, in addition to "/suggest" you also still need to specify the dictionary.  I see that in the doc examples, but would have thought it'd be configured that way by default.  (typically solrconfig defines defaults)

      Oddly, the URL below no longer gives errors, BUT also doesn't suggest anything:

      http://localhost:8983/solr/collection1/suggest?suggest.dictionary=mySuggester&suggest.q=musi

      But I did index the default docs, including this entry:

      ipod_video.xml: <field name="cat">music</field>

      And this DOES give results:

      http://localhost:8983/solr/collection1/select?q=cat:music

      (this test was with 4.7.2, and I'd expect it to be similar through 4.10.2; then disabled in 4.10.3)

       

       

  13. Why is HighFrequencyDictionaryFactory 's suggested results not ordered by weight?

    1. Results will be ordered by weight/frequency (descending) by the WFSTLookupFactory.

  14. Should add better description to TST and Jaspell Lookup factories. Do they take parameters? What are the benefits? How to tune?

    Also, TSTLookupFactory will not be resolved by default since it lies under org.apache.solr.spelling.suggest.tst package. Perhaps we should add more default pkgs to the Suggester loader? Or could we register Suggester plugins as SPI so they can be found by code-name?

  15. About WFSTLookupFactory:   the document says, 

    Weights affect the sorting of matching suggestions when spellcheck.onlyMorePopular=true is selected: weights are treated as "popularity" score, with higher weights preferred over suggestions with lower weights.

    The clause "when spellcheck.onlyMorePopular=true is selected" should be deleted.  Looking at the source, the behavior is unconditional for the WFSTLookupFactory.

    In fact, if spellcheck.onlyMorePopular=true, it will throw an IllegalArgumentException.

  16. Typo:

     

    Although it is possible to use the Spell Checking functionality to power autosuggest behavior, Solr a dedicated SuggestComponent designed for this functionality.


    Although it is possible to use the Spell Checking functionality to power autosuggest behavior, Solr has a dedicated SuggestComponent designed for this functionality.

  17. For the FreeTextLookupFactory, the separator param should be documented. The suggester output will contain this separator, so the client needs to handle it. e.g. in JSON response:

        "term": "european\u001eunion"

    The default value is documented like this in code:

    /** The default character used to join multiple tokens
    * into a single ngram token. The input tokens produced
    * by the analyzer must not contain this character. */
    public static final byte DEFAULT_SEPARATOR = 0x1e;
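
    Until the parameter is documented, a configuration along these lines might be used to choose a printable separator. This is only a sketch: it assumes the factory reads a separator parameter as described above, the "|" value is an arbitrary choice that the analyzer's tokens must never contain, and the suggester, field, and field type names are placeholders.

    ```xml
    <lst name="suggester">
      <str name="name">freeTextSuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <!-- Assumption: "title" and "text_general" exist in your schema -->
      <str name="field">title</str>
      <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
      <str name="ngrams">2</str>
      <!-- Assumption: a single ASCII character replacing the default 0x1e -->
      <str name="separator">|</str>
    </lst>
    ```

    With a setting like this, a client would expect the chosen character between tokens in the returned terms instead of \u001e, and would still need to handle it when displaying suggestions.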

     

  18. Actually, I'm thinking of tracking down and killing weightfield as a required field.

     

    But more to the point, StandardFilterFactory is referenced here, but has been gone since 3.1. Just take it out and let standardTokenizer and lowercaseFilter suffice?

  19. Am I right that solr.SuggestComponent uses a side index, while solr.SpellCheckComponent can use DirectSolrSpellChecker to avoid building one? If so, the SuggestComponent index has to be rebuilt after updates to stay up to date. Could someone add this information to the page?
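
    For what it's worth, the suggester's build parameters control when that side structure is refreshed. The following is a minimal sketch with placeholder field and field type names; rebuilding on every commit can be expensive on a large or frequently updated index, so treat the values as illustrative.

    ```xml
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <!-- Assumption: "title" and "text_general" exist in your schema -->
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <!-- Rebuild the lookup data structure after each commit -->
      <str name="buildOnCommit">true</str>
      <!-- Do not rebuild when Solr starts or the core is reloaded -->
      <str name="buildOnStartup">false</str>
    </lst>
    ```

    With both options set to false, the suggester would only be refreshed by an explicit request carrying suggest.build=true.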

  20. Cassandra Targett, please verify my changes related to SOLR-7888 (Make Lucene's AnalyzingInfixSuggester.lookup() method that takes a BooleanQuery filter parameter available in Solr), i.e. context filtering. I felt that the contextField param should go with DocumentDictionary and not the Lookups, even if only two lookup impls currently can make use of the data. To clarify, I added an example to the bottom of the page and some hints inline with each lookup that supports context filtering, along with a warning box that lookups that don't support filtering will show all data as today, even with suggest.cfq.

    1. Jan Høydahl: your changes look great. I like how you mentioned the limitations in several spots, with links to the fuller description of the feature.
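
      A minimal sketch of the setup discussed here, assuming documents carry their category in a field named cat. The field names and the field type are placeholders, not a recommendation.

      ```xml
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <!-- One of the lookups that can use context data -->
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">name</str>
        <str name="weightField">price</str>
        <!-- Values of this field become the contexts that suggest.cfq filters on -->
        <str name="contextField">cat</str>
        <str name="suggestAnalyzerFieldType">string</str>
      </lst>
      ```

      A request such as /suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=c&suggest.cfq=memory would then be expected to return only suggestions from documents whose cat value is memory.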

  21. It would be very useful if someone who has used most, if not all, of the lookup implementations in their projects added a small configuration example for each one, along with an example of the form "you index this...", "you query for this...", and "you'll get that...". That way, anybody interested in a specific suggester behavior could quickly identify what best fits his or her needs.

    I will propose such examples once I have used most of the lookup implementations in my projects and figured them out, but I'd prefer someone more experienced to do that. Thank you.

      1. Thanks a lot Erick Erickson, that is a very good starting point.

  22. Cassandra Targett: minor formatting issue, the header for FreeTextLookupFactory is now formatted as a part of BlendedInfixLookupFactory entry.

  23. Given this suggester configuration:

        <str name="name">mySuggester</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">suggestions</str>  <!-- a multivalued field -->

    Is it possible to use a multivalued field such as suggestions (fieldType=text_suggest) in a suggester?

  24. Thank you for your response, Varun. Can you then suggest the best approach for implementing autocomplete suggestions in Solr?

  25. I think there is either a misprint or a piece of information missing for context filtering:

    • For the FreeTextLookupFactory it states "This implementation supports Context Filtering". I think this is meant to be listed for the Lookup Factory above it, the BlendedInfixLookupFactory.
    • If the FreeTextLookupFactory does support Context Filtering then information for this lookup Implementation is missing from the Context Filtering section as only the AnalyzingInfixLookupFactory and BlendedInfixLookupFactory are listed as being supported.

    Also, the comments mention (but the documentation does not) that this search component lacks support for filter queries. Is that still the case? The comment is rather old. If it is, it may be worth adding to the documentation, since it is a reason to use the Spellcheck component instead. I know context filtering adds a level of this, but it's unclear whether you can supply multiple fields or define the context fields at query time, similar to fq (more of a mailing list question than anything else).

    1. Well spotted, thanks. Fixed!

      Regarding your other comment about support for filter queries, please bring that up for discussion on solr-user list.

      1. Jan Høydahl: Will do, thanks for the clarification about the context filters.

  26. I don't know if this is the right place, but any thoughts on http://stackoverflow.com/questions/37031357/solr-suggester-no-results would be appreciated.

  27. The statement that "Blank lines and lines that start with a '#' are ignored." under FileDictionaryFactory seems to be wrong. When loading a dictionary file with '#' prefixed lines, they just get interpreted as terms and show up in suggestion results.

    This causes additional confusion when trying to use payloads. As stated in https://lucene.apache.org/core/6_1_0/suggest/org/apache/lucene/search/suggest/FileDictionary.html: "In order to have payload enabled, the first entry has to have a payload". However, if you happen to have a "comment" as the first line in a dictionary file (that doesn't happen to have two instances of the fieldDelimiter in it...), payloads are disabled. 

    1. It looks like you are right. Please file a JIRA issue about this.

        1. Resolved by updating the RefGuide to match the code.

  28. It would be nice if the differences between DocumentDictionaryFactory and HighFrequencyDictionaryFactory were explained. I have been testing a bit with them and they behave very differently.

    • DocumentDictionaryFactory
      • Throws a stackoverflow on many documents (200k +)
      • Seems to suggest complete phrases
      • Preserves the case in the phrases
    • HighFrequencyDictionaryFactory
      • Seems to suggest only single words with FuzzyText
        • Except when you use a solr.KeywordTokenizerFactory then it works for phrases
      • Has a \u001e in multiple terms if used with the FreeText
      • Search is case sensitive unless in your analyzer you add a: <filter class="solr.LowerCaseFilterFactory"/> which also causes the returned results to always be lowercase

    Both the TSTLookupFactory and JaspellLookupFactory give a class-not-found error on Solr 6.2.

     

     

  29. In the config example for "Multiple dictionaries", a "sortField" option is mentioned. However, it is never explained what this option does.

    The corresponding docblock comment from DocumentExpressionDictionaryFactory (Solr 6.0) is:

    /** Label used to define the name of the
    * sortField used in the {@link #WEIGHT_EXPRESSION} */

    This should probably be added to the list of DocumentExpressionDictionaryFactory options.
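
    My reading of that docblock is that each sortField entry names a numeric field that the weightExpression can then reference as a variable. A minimal sketch under that assumption, with placeholder field names (product_name, price, popularity) and field type:

    ```xml
    <lst name="suggester">
      <str name="name">altSuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
      <str name="field">product_name</str>
      <!-- The expression refers to the fields declared as sortField below -->
      <str name="weightExpression">((price * 2) + ln(popularity))</str>
      <!-- Assumption: both are numeric fields in your schema -->
      <str name="sortField">price</str>
      <str name="sortField">popularity</str>
      <str name="suggestAnalyzerFieldType">text_en</str>
    </lst>
    ```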

  30. I'm finding that WFSTLookupFactory and FSTLookupFactory only seem to match in a case-sensitive way, regardless of any lowercasing that happens in the field type.  This should be noted in their documentation.

    1. Does your field definition include a LowerCaseFilterFactory?

      1. It sure does!  See this stackoverflow question for details http://stackoverflow.com/questions/42458050/solr-6-4-1-suggester-is-stubbornly-case-sensitive-how-to-make-case-insensitive/42500089#42500089

        Replacing WFSTLookupFactory with FuzzyLookupFactory fixes the issue for me, which leads me to the conclusion that WFSTLookupFactory simply doesn't respect or allow case-insensitivity.  But maybe I'm wrong – would love to learn a way to get WFSTLookupFactory to work.
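
        A minimal sketch of the swap described above. It assumes a field type named text_lowercase (hypothetical) whose analyzer includes LowerCaseFilterFactory, and a placeholder source field. As far as I can tell, FuzzyLookupFactory runs that analyzer over both the dictionary entries and the query text, whereas WFSTLookupFactory matches the raw terms, which would explain the case sensitivity.

        ```xml
        <lst name="suggester">
          <str name="name">mySuggester</str>
          <!-- Analyzing lookup: the field type below is applied at build time and at query time -->
          <str name="lookupImpl">FuzzyLookupFactory</str>
          <str name="dictionaryImpl">DocumentDictionaryFactory</str>
          <!-- Assumption: "title" is a stored field in your schema -->
          <str name="field">title</str>
          <!-- Assumption: "text_lowercase" is a TextField type with a LowerCaseFilterFactory -->
          <str name="suggestAnalyzerFieldType">text_lowercase</str>
        </lst>
        ```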