Joshua implements a subset of the Google Translate API. The web-based demo (included with the language packs and the source distribution) uses this API, which should also make it easy for other clients to communicate with Joshua. Specifically, queries to Joshua should take the form "http://SERVER:PORT/translate?QUERY_STRING", where SERVER and PORT identify the running Joshua server. The QUERY_STRING is a set of key=value pairs, separated by & characters. The following keys are recognized:

  • q=. A translation query, which should be a single sentence. The key may be repeated to send several sentences in one request. We suggest batching up to 100 translation requests per query.
  • meta=META_COMMAND
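
As a rough illustration of the query format described above, the following Python sketch builds a request URL with one q= pair per sentence and splits a larger input into batches of at most 100 sentences. The function names (batch, build_query_url) are illustrative only and are not part of Joshua.

```python
from urllib.parse import urlencode

MAX_BATCH = 100  # batch size suggested in the text above


def batch(sentences, size=MAX_BATCH):
    """Split a list of sentences into consecutive chunks of at most `size`."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def build_query_url(server, port, sentences, meta=None):
    """Build a /translate URL with one q= pair per sentence.

    urlencode() percent-encodes each value and writes spaces as '+',
    which matches the key=value&key=value query format.
    """
    pairs = []
    if meta is not None:
        pairs.append(("meta", meta))
    pairs.extend(("q", sentence) for sentence in sentences)
    return "http://{}:{}/translate?{}".format(server, port, urlencode(pairs))


if __name__ == "__main__":
    print(build_query_url("localhost", 5674, ["yo quiero taco bell"]))
    # prints: http://localhost:5674/translate?q=yo+quiero+taco+bell
```

Each batch returned by batch() can be passed to build_query_url() in turn, so that no single request carries more than 100 sentences.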

The META_COMMAND string allows you to set parameters of the server. Currently, the following are supported:

  • get_weight=PARAM. Returns the weight of the requested parameter in the "metadata" field of the JSON response.
  • set_weights=KEY+VALUE+KEY+VALUE.... Sets the model weight of each KEY to the VALUE that follows it.
  • get_weights. Returns a list of all feature names and their corresponding model weights.
  • add_rule=LHS+|||+SOURCE+|||+TARGET+|||+FEATURES+|||+ALIGNMENT. Adds a rule to the custom grammar following the format of rules in the grammar.
  • list_rules. Returns a list of rules in the custom grammar.
  • remove_rule=RULE. Removes the custom rule (if present), where RULE is formatted as described above for add_rule.
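
The command strings listed above can be assembled mechanically. The Python sketch below only reproduces the notation used in this list, in which + stands for a URL-encoded space. The helper names, the example feature value, and the example alignment are hypothetical, and how the server expects the command to be embedded in the meta= value is not covered here.

```python
def plus_join(tokens):
    """Join tokens with spaces, then write each space as '+' as in the list above."""
    return " ".join(str(t) for t in tokens).replace(" ", "+")


def set_weights_command(weights):
    """Render a dict of {feature name: weight} as set_weights=KEY+VALUE+KEY+VALUE..."""
    flat = []
    for name, value in weights.items():
        flat.extend([name, value])
    return "set_weights=" + plus_join(flat)


def add_rule_command(lhs, source, target, features, alignment):
    """Render a rule as add_rule=LHS+|||+SOURCE+|||+TARGET+|||+FEATURES+|||+ALIGNMENT."""
    fields = [lhs, source, target, features, alignment]
    return "add_rule=" + plus_join(" ||| ".join(fields).split(" "))


if __name__ == "__main__":
    print(set_weights_command({"lm_0": 0.5, "WordPenalty": -0.1}))
    # prints: set_weights=lm_0+0.5+WordPenalty+-0.1
    # The rule below is a made-up example, not taken from any shipped grammar.
    print(add_rule_command("[X]", "la casa", "the house", "1.0", "0-0 1-1"))
    # prints: add_rule=[X]+|||+la+casa+|||+the+house+|||+1.0+|||+0-0+1-1
```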

In response to the URL string, a JSON object is returned.

A script in the source distribution, $JOSHUA/scripts/support/query_http.py, is provided for querying the Joshua server with a pre-tokenized test set. For more information, you might also find the demo useful, since it supports all of the above features. The demo can be found in each of the language packs or directly in the source distribution.

Example

Here is an example web query that requests two translations together with the current model weights. Assuming the Spanish–English language pack is running on localhost port 5674, issuing:

Code Block
languagebash
$ curl "localhost:5674/translate?meta=get_weights&q=cifra+inferior+a+lo+que+predec%C3%ADan+las+encuestas+%2C+que+pronosticaban+de+mas+del+60+%25+de+participaci%C3%B3n+electoral+.&q=yo+quiero+taco+bell"

will yield the following response:

Code Block
languagebash
{
  "data": {
    "translations": [
      {
        "translatedText": "Figure less than what the polls predicted, claiming more than 60 % of electoral participation.",
        "raw_nbest": [
          {
            "hyp": "figure less than what the polls predicted , claiming more than 60 % of electoral participation .",
            "totalScore": -8.429729
          }
        ]
      },
      {
        "translatedText": "I want taco bell",
        "raw_nbest": [
          {
            "hyp": "i want taco bell",
            "totalScore": -3.8622975
          }
        ]
      }
    ]
  },
  "metadata": [
    "weights tm_custom_0\u003d0.000 tm_pt_0\u003d0.004 tm_pt_1\u003d0.029 tm_pt_2\u003d0.002 tm_pt_3\u003d0.325 tm_pt_4\u003d0.106 tm_pt_5\u003d0.087 OOVPenalty\u003d0.006 WordPenalty\u003d-0.090 lm_0\u003d0.221 Distortion\u003d0.094 PhrasePenalty\u003d-0.002 lm_1\u003d0.034"
  ]
}
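
A response of this shape can be unpacked with any JSON library. The Python sketch below assumes only the fields visible in the example response ("data", "translations", "translatedText", and "metadata" entries beginning with "weights"); the function name parse_response is illustrative. Note that \u003d is simply the JSON escape for "=", so a standard parser decodes it automatically.

```python
import json


def parse_response(body):
    """Return (translations, weights) extracted from a JSON response body."""
    doc = json.loads(body)
    translations = [t["translatedText"] for t in doc["data"]["translations"]]

    weights = {}
    for entry in doc.get("metadata", []):
        fields = entry.split()
        # A weights entry looks like: "weights NAME=VALUE NAME=VALUE ..."
        if fields and fields[0] == "weights":
            for pair in fields[1:]:
                name, _, value = pair.partition("=")
                weights[name] = float(value)
    return translations, weights


if __name__ == "__main__":
    # A shortened copy of the example response, used here as test input.
    sample = r'''{"data": {"translations": [{"translatedText": "I want taco bell"}]},
                  "metadata": ["weights lm_0\u003d0.221 WordPenalty\u003d-0.090"]}'''
    print(parse_response(sample))
    # prints: (['I want taco bell'], {'lm_0': 0.221, 'WordPenalty': -0.09})
```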

Performance

We compared the Joshua RESTful server (-server-type http) with the raw TCP/IP server (-server-type tcp). On a machine with six 2 GHz Intel processors, we started each server (one at a time) with six threads (-threads 6). For each server in turn, we then launched five simultaneous clients, each translating the same 16k+ test set, and timed how long it took all five to finish. This was repeated four times. The following commands were used:

Scenario 1: TCP/IP server

Code Block
languagebash
# Start Joshua
$ apache-joshua-es-en-2017-03-03/joshua -server-type tcp -server-port 5674 -v 0 &
# Query it
$ for x in 1 2 3 4; do 
> for num in $(seq 1 5); do 
> cat corpus.es | nc localhost 5674 > /dev/null 2>&1 & 
> done 
> time wait 
> done

The first run served as a warm-up and was discarded. The last three runs took 8:07, 8:06, and 8:06 (MM:SS).

Scenario 2: RESTful HTTP server

Code Block
languagebash
# Start Joshua
$ apache-joshua-es-en-2017-03-03/joshua -server-type http -server-port 5674 -v 0 &
# Query it
$ for x in 1 2 3 4; do 
> for num in $(seq 1 5); do 
> $JOSHUA/scripts/support/query_http.py -s localhost -p 5674 corpus.es > /dev/null 2>&1 & 
> done 
> time wait 
> done 

The run times were 7:25, 7:34, and 7:20 (MM:SS). That is an average of about 7:26, against about 8:06 for TCP/IP, so the HTTP server was roughly 8% faster than the TCP/IP version.
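
As a quick check of the comparison, the relative difference can be computed directly from the six reported run times. The Python sketch below does only that arithmetic; the helper names are illustrative.

```python
def to_seconds(mmss):
    """Convert an 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def mean_seconds(times):
    """Average a list of 'MM:SS' run times, in seconds."""
    return sum(to_seconds(t) for t in times) / len(times)


tcp_mean = mean_seconds(["8:07", "8:06", "8:06"])    # TCP/IP server runs
http_mean = mean_seconds(["7:25", "7:34", "7:20"])   # HTTP server runs
reduction = (tcp_mean - http_mean) / tcp_mean * 100  # percent less wall-clock time

print(f"TCP mean: {tcp_mean:.1f} s, HTTP mean: {http_mean:.1f} s, reduction: {reduction:.1f}%")
# prints: TCP mean: 486.3 s, HTTP mean: 446.3 s, reduction: 8.2%
```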

Future Plans

Currently, Joshua itself supports the RESTful API. In the future, we plan to extend the Joshua Translation Server to more fully support the Google Translate API, particularly:

  • Allowing the specification of the source language and forwarding requests to the appropriate back-end server
  • Automating the above with automatic language ID (LID)
  • Supporting authentication (so that publicly accessible servers are not providing free translations)

...