
Hello, world!

These instructions show how to train and run a simple part-of-speech (PoS) tagger program. They are written for Unix but can be adapted for Windows.

Unless otherwise specified, save downloads to $HOME/archives.

  1. Download and install Java.
  2. Download and install Maven.
  3. Download OpenNLP.
  4. Download a PoS Treebank training set into $HOME/archives/pos.
  5. Create development, library, and data directories:
    mkdir -p $HOME/dev/java/nlp/lib/
    mkdir -p $HOME/dev/java/nlp/data/
  6. Change to development directory:
    cd $HOME/dev/java/nlp/
  7. Extract files:
    tar zxf $HOME/archives/apache-opennlp-*-incubating-src.tar.gz
  8. Rename directory:
    mv apache-opennlp-*-incubating-src opennlp
  9. Build Java Archive (JAR) files (takes 5 to 10 minutes, depending on your machine and network connection):
    cd opennlp/opennlp
    mvn install > build.log
  10. Change to OpenNLP development directory:
    cd $HOME/dev/java/nlp/opennlp/
  11. Move library files to library directory:
    mv opennlp-uima/target/dependency/* ../lib/.
  12. Move the downloaded PoS model file to the data directory:
    mv $HOME/archives/pos/en-pos-maxent.bin $HOME/dev/java/nlp/data/.
  13. Change to development directory:
    cd $HOME/dev/java/nlp/
  14. Copy the HelloWorld source code (HelloWorld.java) to $HOME/dev/java/nlp/.
  15. Compile:
    javac -cp $(echo lib/*.jar | tr ' ' ':') HelloWorld.java
  16. Run:
    java -cp .:$(echo lib/*.jar | tr ' ' ':') HelloWorld data/en-pos-maxent.bin "Earlier today, we compiled a program."
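
The $(echo lib/*.jar | tr ' ' ':') expression in steps 15 and 16 joins every JAR in lib/ into a colon-separated classpath. The same idea can be sketched as a small shell function. The name build_classpath is hypothetical and is not part of OpenNLP or Maven. Unlike the one-liner, this version also copes with spaces in file names and with an empty directory.

```shell
#!/bin/sh
# build_classpath DIR: print every *.jar in DIR, joined with ':'.
# Hypothetical helper, shown only to illustrate the classpath idiom.
build_classpath() {
    dir=$1
    classpath=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when DIR holds no JAR files.
        [ -e "$jar" ] || continue
        # Prefix a ':' separator only when the list is already non-empty.
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}
```

With the function defined, the compile step could be written as javac -cp "$(build_classpath lib)" HelloWorld.java, and the run step as java -cp ".:$(build_classpath lib)" followed by the same arguments as before.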

Sample output from step 16:

Earlier => JJR @ 0.2182545923597446
today, => NN @ 0.666361706870189
we => PRP @ 0.8324059729613176
compiled => VBN @ 0.028125261823754893
a => DT @ 0.9145975161653905
program. => NN @ 0.8841759649076423
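
Each output line has the form token => TAG @ probability, so a low value such as the one for "compiled" marks a tag the tagger was unsure about. A minimal sketch for listing such tokens follows. The function name low_confidence and the idea of a cut-off are assumptions made for illustration and are not part of the HelloWorld program.

```shell
#!/bin/sh
# low_confidence THRESHOLD: read "token => TAG @ probability" lines on
# stdin and print only those whose probability is below THRESHOLD.
# Hypothetical helper; assumes tokens contain no whitespace.
low_confidence() {
    awk -v t="$1" '$2 == "=>" && $4 == "@" && $5 + 0 < t + 0 { print }'
}
```

Piping the program's output through low_confidence 0.5 would keep only the lines for "Earlier" and "compiled" from the sample above.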