Apache Joshua Language Packs

Machine translation is the task of getting a computer to automatically translate sentences between human languages (for example, from Chinese to English, or from English to Uighur). With the advent of statistical translation technologies, it has been the subject of intense research interest over the past twenty years, and it has been usable in practice for some time. Academic researchers have been working on open-source translation systems since the debut of Moses in 2006 (Joshua followed in 2009), and the popular Google Translate tool also debuted in 2006.

Yet problems remain with these systems. Even with the ready availability of quality open-source tools and good documentation, extensive technical knowledge is required to build good translation systems. Google Translate is a great tool for translating a web page, a document, or a handful of sentences, but use beyond that requires access to a paid API. There is no easy solution for people who don't care how machine translation works but simply want to use it as a black box inside a larger project, running on their own hardware.

The Apache Joshua project aims to solve this problem with its recent release of 62 Language Packs. Each language pack is a pre-built translation system, provided as a single compressed file, that runs with only one external dependency: Java 8. There is no need to compile a long list of arcane C++ programs, to wade through documents describing how to train systems, or to pay an external company. These language packs can run on a single modern desktop computer with as little as 4 GB of RAM for some language pairs, though others may require up to 16 GB.

These are early-release models. The resource requirements are higher than we would like, and the quality will not be as high as what you would get by building your own models. However, we are releasing them now (under the Apache License 2.0) in the hope that they will be useful to many people, and we plan to improve them in the near future with the latest from academic and industrial research and (we hope) feedback from users.

You can explore and download the language packs here:

If you have comments or questions, please post them to our user support mailing list:

Documentation for the Apache Joshua machine translation system is available at