Setting up SimpleTextCodec
Solr 4.0+ only. New to 4.0 is the ability to create per-field codecs. An example of this is the SimpleTextCodec that is distributed with the Solr source code. However, the codecs aren't part of the binary distribution, which has caused some confusion. These instructions will allow you to use the SimpleTextCodec as an exemplar.
If you get the help output, you're good to go.
- Get the source code
- Build the example
- Build the codec jar
- Modify the solrconfig.xml file
- Modify your schema.xml file
Get the source code
Just follow the instructions at: How To Contribute. The short form is to execute the following command:
for trunk, or:
for the 4.x branch.
We'll call the directory all this got checked out into SOLR_CODE, which will probably be something like <where you checked things out>/branch_4x.
Build the example
Now you need to build the example code. Note: this produces the same code as is present in the "example" directory in the Solr distro.
This may take a while. You may be prompted to execute a separate step to install Apache Ivy if you don't already have it on your computer. If you don't, the instructions to install it will be printed out on the screen when you type "ant example". Follow them and re-execute "ant example".
You should see "BUILD SUCCESSFUL" eventually.
Build the codec jar
Here's where it gets a bit tricky. The SimpleTextCodec is not built by the step above. So here's what you do:
Again, you should see "BUILD SUCCESSFUL" printed out. But just above that you should see:
"Building jar: SOLR_CODE/lucene/build/codecs/lucene-codecs-<version>.jar". This is the jar file that you'll need, so make a note of it.
Modify the solrconfig.xml file
This file is located in SOLR_CODE/solr/example/solr/collection1/conf. There are a couple of things you need to do:
- Make the jar available to Solr next time you start it.
- Load the CodecFactory when Solr starts
Make the jar available to Solr next time you start it
Add a line like this. I put this after the other <lib> directives, but it's pretty arbitrary as long as it's a direct child of <config>.
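A minimal sketch of such a directive, assuming the jar was built to the location noted in the previous step. The /path/to/SOLR_CODE prefix is a placeholder for your own checkout directory, and the regex is just one way to avoid hard-coding the version number:

```xml
<!-- Hypothetical path: replace /path/to/SOLR_CODE with the directory you checked out into -->
<lib dir="/path/to/SOLR_CODE/lucene/build/codecs" regex="lucene-codecs-.*\.jar" />
```

Using dir plus regex picks up whatever lucene-codecs jar is in that directory, so the line should not need to change when the version does.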
Load the CodecFactory when Solr starts
Add a line like this. Again, where this goes is arbitrary; it just has to be a direct child of <config>. This causes Solr to load this class at startup.
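As a sketch, assuming the stock SchemaCodecFactory (the factory that reads per-field postings format settings from schema.xml) is the one you want, the line would look something like this:

```xml
<!-- Lets fieldTypes in schema.xml choose their own postings format -->
<codecFactory class="solr.SchemaCodecFactory" />
```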
Modify your schema.xml file
This file is located in SOLR_CODE/solr/example/solr/collection1/conf
Whew! All that is preliminary. The rest is more straightforward. You have to define a fieldType that uses the codec, and you have to use that fieldType in some of your fields. NOTE: it is NOT necessary to use these in all your fields; you can specify codecs on a per-field basis.
You only have two more steps...
- Add a new fieldType using the SimpleTextCodec
- Use the new fieldType in some fields
Add a new fieldType using the SimpleTextCodec
Add something like this to the <types> section
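A minimal sketch of such a fieldType, assuming the SimpleText format is selected through the postingsFormat attribute. The name string_simpletext is purely illustrative:

```xml
<!-- Hypothetical fieldType name; postingsFormat selects the SimpleText postings format for fields of this type -->
<fieldType name="string_simpletext" class="solr.StrField" postingsFormat="SimpleText" />
```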
This is not a very interesting fieldType. Notice that it's based on "StrField", which means that the input is not analyzed in any way, so searching matches only the exact input. Of course, you can also specify the codec on fieldTypes that have analysis chains, such as ones based on TextField.
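For the analyzed case, something along these lines should work. This is only a sketch: the name text_simpletext and the particular tokenizer and filter are assumptions chosen for illustration, not a recommended analysis chain:

```xml
<!-- Hypothetical TextField-based type that also uses the SimpleText postings format -->
<fieldType name="text_simpletext" class="solr.TextField" postingsFormat="SimpleText">
  <analyzer>
    <!-- Illustrative chain: split on whitespace, then lowercase each token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```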
Use the new fieldType in some fields
Add some lines like this to the <fields> section
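For instance, assuming fieldTypes named string_simpletext and text_simpletext were defined in the previous step (both names are hypothetical, as are the field names below), the field definitions might look like:

```xml
<!-- Hypothetical fields; the type attribute must match a fieldType you declared with postingsFormat="SimpleText" -->
<field name="simple_exact" type="string_simpletext" indexed="true" stored="true" />
<field name="simple_text" type="text_simpletext" indexed="true" stored="true" />
```

Fields that use any other fieldType keep the default postings format.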
At this point, once you have restarted Solr and indexed some documents, you should have the SimpleText results in your SOLR_CODE/solr/example/solr/collection1/data/index directory. Look for files of the form: *SimpleText*.pst
As always, the first time someone actually follows instructions, deficiencies pop out. Feel free to modify this page with whatever clarifications you think would be helpful.