These pages are intended for users who wish to work on the Apache Joshua codebase, including the decoder and its associated tools, or who wish to use it to build their own models. If you simply want to use Joshua as a black-box machine translation system, and aren't interested in how it works or in making it better, you are probably better served by downloading one of the many available language packs.
Setting up your environment requires a few steps:
Clone Joshua from its git repository.
Set the Joshua root dir, then compile Joshua and install dependencies.
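The two steps above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not the project's official procedure: the repository URL is passed in rather than hard-coded, and the Maven build command (`mvn package`), the use of an exported `JOSHUA` variable for the root directory, and the location of `download-deps.sh` in the repository root are assumptions to confirm against the project's own instructions. The `run` and `setup_joshua` names are hypothetical.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# setup_joshua REPO_URL [DIR]: clone Joshua into DIR, point JOSHUA at it,
# compile, and fetch dependencies. REPO_URL is supplied by the caller.
setup_joshua() {
    joshua_repo_url="$1"
    joshua_dir="${2:-$PWD/joshua}"
    if [ -z "$joshua_repo_url" ]; then
        echo "usage: setup_joshua REPO_URL [DIR]" >&2
        return 2
    fi
    run git clone "$joshua_repo_url" "$joshua_dir" || return 1
    JOSHUA="$joshua_dir"
    export JOSHUA                    # Joshua root directory (assumed variable name)
    run cd "$JOSHUA" || return 1
    run mvn package || return 1      # assumed build command
    run bash download-deps.sh        # script named in the text; location assumed
}
```

Calling `DRY_RUN=1 setup_joshua "$REPO_URL" "$HOME/joshua"` prints the commands without running them, which is a cheap way to review the sequence before committing to a long build.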
The "download-deps.sh" script downloads and compiles utilities that Joshua uses. These include the GIZA++ alignment tool and the symal tool, the Berkeley aligner, the KenLM runtime library, and Thrax, Joshua's grammar extractor.
If you have problems, especially with the installation of dependencies, please see the Support pages.
Joshua's hierarchical and phrase-based model extractor, Thrax, requires Hadoop (the 2.5+ series). It should suffice to ensure that the "hadoop" command is on your PATH. If you don't have a Hadoop installation, it is not too difficult to set up a cluster in standalone mode, which will allow you to build small models and run through the tutorial.
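A quick way to confirm the Hadoop requirement is to check that the command resolves and to look at the version it reports. The sketch below uses only the shell's `command -v` and Hadoop's own `hadoop version` subcommand; the function name `check_hadoop` is hypothetical, and comparing the printed version against the 2.5+ requirement is left to the reader.

```shell
#!/bin/sh
# check_hadoop: succeed if a "hadoop" command is on PATH, and print the
# first line of "hadoop version" (for example "Hadoop 2.7.3").
check_hadoop() {
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop: not found on PATH" >&2
        return 1
    fi
    echo "hadoop found at: $(command -v hadoop)"
    # A found-but-broken install still passes this check; read the output.
    hadoop version 2>/dev/null | sed -n '1p'
}
```

Run `check_hadoop` in the same shell you will use to invoke Thrax, since PATH can differ between login and non-login shells.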
KenLM is used to build new language models from raw text (lmplz), to compile ARPA-style language models to its binarized format (build_binary), and to load and query the language model while translating (libken). KenLM isn't required, but it is a good tool and the default for some of these tasks. Running the "download-deps.sh" script should compile it for you, but KenLM has some dependencies, most notably Boost, and these can cause problems.
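To illustrate the first two KenLM tasks, the sketch below estimates an ARPA model with lmplz and then binarizes it with build_binary. It assumes both tools are already on your PATH (where the Joshua build places them is not covered here); the function name `build_lm` and the `.arpa` and `.kenlm` output suffixes are illustrative choices, not Joshua conventions.

```shell
#!/bin/sh
# build_lm ORDER CORPUS PREFIX: estimate an ORDER-gram ARPA model from the
# text in CORPUS, then compile it to KenLM's binary format.
build_lm() {
    lm_order="$1"
    lm_corpus="$2"
    lm_prefix="$3"
    if [ ! -r "$lm_corpus" ]; then
        echo "cannot read corpus: $lm_corpus" >&2
        return 1
    fi
    # lmplz reads text on stdin and writes the ARPA model to stdout.
    lmplz -o "$lm_order" < "$lm_corpus" > "$lm_prefix.arpa" || return 1
    # build_binary takes the ARPA file and the output path.
    build_binary "$lm_prefix.arpa" "$lm_prefix.kenlm"
}
```

For example, `build_lm 5 corpus.en lm` would write `lm.arpa` and `lm.kenlm` next to each other; the binary file is the one to load at translation time, since it is faster to read than the ARPA text.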
KenLM is built with the script jni/build_kenlm.sh. If that fails, the most likely cause is that you don't have a proper development environment, or that Boost is not in your LD_LIBRARY_PATH.
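When the build fails, the two likely causes named above can be checked directly. The following is a rough diagnostic sketch, not part of Joshua: the function names are hypothetical, the list of tools to look for is whatever you pass in (which tools jni/build_kenlm.sh actually needs is not stated here), and the Boost check only looks for files named `libboost_*` in the directories listed in LD_LIBRARY_PATH.

```shell
#!/bin/sh
# have_tools TOOL...: report any named build tool that is not on PATH.
have_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# boost_in_ld_library_path: succeed if any directory listed in
# LD_LIBRARY_PATH contains a file named libboost_*.
boost_in_ld_library_path() {
    old_ifs=$IFS
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        [ -n "$dir" ] || continue
        for lib in "$dir"/libboost_*; do
            if [ -e "$lib" ]; then
                IFS=$old_ifs
                return 0
            fi
        done
    done
    IFS=$old_ifs
    echo "no libboost_* found via LD_LIBRARY_PATH" >&2
    return 1
}
```

A typical call would be `have_tools g++ make; boost_in_ld_library_path`, with the tool names adjusted to your platform. A failed Boost check is not conclusive on its own, because a system-wide Boost in a default library directory may be found by the linker without LD_LIBRARY_PATH.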