Available since Solr 3.5. To configure the Hunspell stemmer in Solr, download the .dic and .aff files for your language(s), then add the HunspellStemFilterFactory to your analysis chain, like this (British English):
<filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true" />
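For context, the filter line above could sit inside a complete field type roughly as follows. This is a minimal sketch, not a recommended configuration: the field type name text_en_hunspell is hypothetical, the tokenizer and lowercase filter are illustrative choices, and the dictionary files are assumed to be placed in the core's conf directory.

```xml
<!-- Sketch of a field type for schema.xml; the name "text_en_hunspell" is hypothetical. -->
<fieldType name="text_en_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercasing before stemming is an illustrative choice, not a requirement. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- en_GB.dic and en_GB.aff are assumed to be in the core's conf directory. -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_GB.dic"
            affix="en_GB.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```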
The dictionary parameter optionally takes a comma-separated list of dictionaries, in which case all of them are loaded in the order specified. This lets you maintain your own custom additions without editing the originals. We encourage you to contribute your changes and additions back to the maintainers of the original dictionaries.
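As a sketch of the comma-separated form, the filter below loads the stock dictionary first and a custom one second. The file name en_GB_custom.dic is hypothetical; it is assumed to be a regular Hunspell .dic file whose entries use only flags defined in en_GB.aff.

```xml
<!-- en_GB_custom.dic is a hypothetical file holding your own additions. -->
<filter class="solr.HunspellStemFilterFactory"
        dictionary="en_GB.dic,en_GB_custom.dic"
        affix="en_GB.aff"
        ignoreCase="true"/>
```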
The ignoreCase parameter enables case-insensitive matching against the dictionaries, which can be useful for stemming variations of proper names such as Apache/Apaches. The default value is false.
An example of how Hunspell may be more accurate than the Snowball stemmer, from Norwegian:
Input:     bil (car)   biler (cars)   billig (cheap)   billige   billigere (cheaper)
Snowball:  bil         bil            bil (car)        bil       billiger (N/A)
Hunspell:  bil         bil            billig           billig    billig (cheap)
Additional Hunspell stems listed: bile (drive), billige (pl), billige (pl)
Note that Hunspell's suitability for stemming depends on the quality of the dictionary and affix files. Always test the quality of several stemmers before choosing one for your language. Another potential disadvantage of a dictionary-based stemmer is that it only works for words listed in the dictionary, so be prepared to invest some time in adding new or domain-specific vocabulary to the dictionaries.
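One way to compare stemmers, sketched here under assumptions, is to define two otherwise identical field types that differ only in the stemming filter, and then run the same words through both on the Analysis page of the Solr admin interface. The field type names and the Norwegian dictionary file names nb_NO.dic and nb_NO.aff are assumptions; use the names of the files you actually downloaded.

```xml
<!-- Hypothetical field type using the Snowball stemmer for Norwegian. -->
<fieldType name="text_no_snowball" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field type using Hunspell; the dictionary file names are assumed. -->
<fieldType name="text_no_hunspell" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="nb_NO.dic"
            affix="nb_NO.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Entering words such as bil, biler, billig, billige and billigere for each field type shows the stems side by side, which makes over-stemming (billig reduced to bil) easy to spot.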