Using the amazing IKVM (http://www.ikvm.net/index.html), OpenNLP's Java .jar files can be converted to a .NET assembly (DLL).
This allows you to use the latest releases of OpenNLP from C# (or any other .NET language).
So far the .NET assembly has been used successfully for sentence splitting, tokenising, POS tagging & chunking. Full parsing has not yet been fully tested.
(Don't forget to unblock any downloaded files: on Windows, right-click the file, open Properties and choose Unblock.)
- Download & extract the latest OpenNLP release from http://incubator.apache.org/opennlp/download.cgi . At the time of writing this is apache-opennlp-1.5.1-incubating-bin.zip.
- The three .jar files in its lib folder (opennlp-maxent-3.0.1-incubating.jar, jwnl-1.3.3.jar, opennlp-tools-1.5.1-incubating.jar) can be compiled into a .NET assembly as follows.
- Download & extract the latest IKVM from http://sourceforge.net/projects/ikvm/files/ . At the time of writing this is ikvmbin-0.46.0.1.zip.
- For simplicity, copy the three .jar files above into the ikvmbin-0.46.0.1/bin folder.
- From a command window in the ikvmbin-0.46.0.1/bin folder, run ikvmc on the three .jar files to build opennlp.dll as follows:
- ikvmc -target:library -assembly:opennlp opennlp-maxent-3.0.1-incubating.jar jwnl-1.3.3.jar opennlp-tools-1.5.1-incubating.jar
- Copy the following from the ikvmbin-0.46.0.1/bin folder to your project folder (or a folder of your choice):
- opennlp.dll (the assembly you have just created)
(I used reflection to find out which IKVM DLLs opennlp.dll references; those need to be copied as well.)
Add references to these assemblies in your project & use at will.
The OpenNLP manual is at http://incubator.apache.org/opennlp/documentation/manual/opennlp.html
You will need the models for your language, which are currently available at http://opennlp.sourceforge.net/models-1.5/
Note: This is still Java in .NET clothing, so some things need care.
For example, when loading models, the input streams are Java types (from the assemblies referenced above), not .NET streams.
Workaround if an invalid format exception occurs when reading en-pos-maxent.bin
The file en-pos-maxent.bin is actually a zip archive.
If you examine the contents of this zip file, it currently holds three files (the other models seem to have only two):
manifest.properties, tags.tagdict & pos.model
Delete tags.tagdict from the zip file so that it contains only manifest.properties & pos.model.
Note: Don't actually unzip en-pos-maxent.bin; just delete tags.tagdict, so that en-pos-maxent.bin remains a zip archive containing the remaining two files.
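
If you would rather script the workaround above than edit the archive by hand, the following is a minimal sketch in plain Java using only java.util.zip. The class name ZipEntryRemover and the method removeEntry are made up for this illustration and are not part of OpenNLP or IKVM. It copies every entry except the named one into a temporary archive and then replaces the original, so the file stays a zip archive. The remaining entries are re-compressed with default settings. Keep a backup copy of the model before trying it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of OpenNLP or IKVM.
final class ZipEntryRemover {

    /**
     * Rewrites the zip archive without the named entry.
     * Returns true if the entry was present (and has been dropped).
     */
    static boolean removeEntry(Path archive, String entryName) throws IOException {
        Path parent = archive.toAbsolutePath().getParent();
        // Write the new archive next to the original, then swap it in.
        Path temp = Files.createTempFile(parent, "model-", ".tmp");
        boolean removed = false;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(archive));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(temp))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    removed = true; // skip this entry; its data is not copied
                    continue;
                }
                // Copy the entry under the same name, unchanged in content.
                out.putNextEntry(new ZipEntry(entry.getName()));
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.closeEntry();
            }
        }
        Files.move(temp, archive, StandardCopyOption.REPLACE_EXISTING);
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ZipEntryRemover <path to en-pos-maxent.bin>");
            return;
        }
        boolean removed = removeEntry(Paths.get(args[0]), "tags.tagdict");
        System.out.println(removed ? "tags.tagdict removed" : "tags.tagdict not found");
    }
}
```

Run it with a regular JVM, passing the path to en-pos-maxent.bin as the only argument. Afterwards the archive should list just manifest.properties & pos.model.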