
These pages are intended for users who wish to work on the Apache Joshua codebase, including the decoder and its associated tools, or who wish to use it to build their own models. If you only want to use Joshua as a black-box machine translation system, and aren't interested in how it works or in improving it, you will probably be better served by downloading one of the many available language packs.

Setting up your environment requires a few steps:

Install Joshua

  1. Clone Joshua from its git repository.

  2. Set the Joshua root dir, then compile Joshua and install dependencies.

    The "download-deps.sh" script downloads and compiles utilities that Joshua uses. These include the alignment tool GIZA++, the symal tool, the Berkeley aligner, the KenLM runtime library, and Thrax, Joshua's grammar extractor.

Assuming everything compiled correctly, you are ready to move on to the Joshua Tutorial. If you wish to help develop Joshua, see the Development page.

If you have problems, especially with the installation of dependencies, please see the Support pages.

Dependencies

Hadoop

Joshua's hierarchical and phrase-based model extractor, Thrax, requires Hadoop (the 2.5+ series). It should suffice to ensure that the "hadoop" command is in your path. If you don't have a Hadoop installation, it is not too difficult to set up a cluster in standalone mode, which will allow you to build small models and run through the tutorial.
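
A quick way to confirm that the "hadoop" command is in your path is a small shell check along these lines. This is only a sketch: the helper name have_cmd is made up for illustration and is not part of Joshua or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Joshua): succeeds if the named
# command can be found on the current PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd hadoop; then
    echo "hadoop found at: $(command -v hadoop)"
else
    echo "hadoop not found on PATH; Thrax will not be able to run" >&2
fi
```

If the command is found, running "hadoop version" reports the installed release, which you can compare against the 2.5+ requirement.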

KenLM

KenLM is used to build new language models from raw text (lmplz), to compile ARPA-style language models to its binarized format (build_binary), and to load and query the language model while translating (libken). KenLM isn't required, but it is a good tool and the default for some of these tasks. Running the "download-deps.sh" script should compile it for you, but KenLM has some dependencies, most notably Boost, and these can present problems.

KenLM is built with the script jni/build_kenlm.sh. If that fails, the most likely problem is that you don't have a proper development environment, or that Boost is not in your LD_LIBRARY_PATH.
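
When diagnosing the Boost case, it can help to see which directories in LD_LIBRARY_PATH actually contain Boost libraries. The sketch below assumes the usual Unix naming convention, where Boost library files start with "libboost_". The function name find_boost_dirs is hypothetical and not part of Joshua or KenLM.

```shell
#!/bin/sh
# Hypothetical helper (not part of Joshua or KenLM): print each directory
# in a colon-separated path list that contains at least one file whose
# name starts with "libboost_". Returns non-zero if no directory matches.
find_boost_dirs() {
    found=1
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        # Skip empty entries such as those produced by "a::b".
        [ -n "$dir" ] || continue
        for f in "$dir"/libboost_*; do
            if [ -e "$f" ]; then
                echo "$dir"
                found=0
                break
            fi
        done
    done
    IFS=$old_ifs
    return $found
}

if find_boost_dirs "${LD_LIBRARY_PATH:-}"; then
    echo "Boost libraries found in the directories listed above"
else
    echo "no Boost libraries found via LD_LIBRARY_PATH" >&2
fi
```

An empty result is not conclusive on its own: Boost may be installed in a system library directory that the linker searches without LD_LIBRARY_PATH.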
