These models are English-to-English translation engines trained on the Paraphrase Database (PPDB). PPDB contains several types of paraphrase rules: lexical, phrasal, and syntactic. In addition, the dataset has been filtered at different thresholds of confidence measures computed over its entries, so that you can use only the highest-confidence rules (size S) or all of them (size XXXL).
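The threshold filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PPDB's actual procedure: the `Rule` record, its single `score` field, the example rules, and the per-size cut-offs in `THRESHOLDS` are all hypothetical. It shows why each smaller size is a subset of the larger ones: a rule that clears a stricter cut-off also clears every looser one.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: str          # syntactic label of the rule, e.g. "[NP]"
    phrase: str       # source side
    paraphrase: str   # target side
    score: float      # hypothetical single confidence score


# Hypothetical minimum scores per size, strictest first; real PPDB cut-offs differ.
THRESHOLDS = {
    "S": 0.9,
    "M": 0.7,
    "L": 0.5,
    "XL": 0.3,
    "XXL": 0.1,
    "XXXL": float("-inf"),  # no cut-off: keep every rule
}


def filter_rules(rules, size):
    """Keep only the rules whose score meets the cut-off for the given size."""
    cutoff = THRESHOLDS[size]
    return [r for r in rules if r.score >= cutoff]


# Made-up examples of a lexical, a phrasal, and a syntactic rule.
rules = [
    Rule("[NN]", "automobile", "car", 0.95),
    Rule("[VP]", "passed away", "died", 0.75),
    Rule("[NP]", "the [NN,1] of [NP,2]", "[NP,2] 's [NN,1]", 0.6),
    Rule("[NP]", "the man", "the guy", 0.2),
]

if __name__ == "__main__":
    for size in THRESHOLDS:
        kept = filter_rules(rules, size)
        print(size, [r.phrase for r in kept])
```

Running the script prints the rules kept at each size. Because the cut-offs only loosen from S to XXXL, every rule kept at one size is also kept at all larger sizes.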

Update (January 3, 2017)

The models have been built and are in the process of being uploaded.


Three models are available. Please see the paper cited below for more information.

  • The "general" model is an untuned model.
  • The "simple" model has been tuned to a text simplification task.
  • The "compress" model has been tuned to shorten texts, using SARI as the tuning metric.

For instructions on how to use language packs, please see the top-level Language Packs page.


Please cite the following paper if you use these models in your research.

author = {Napoles, Courtney and Callison-Burch, Chris and Post, Matt},
title = {Sentential Paraphrasing as Black-Box Machine Translation},
booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations},
month = {June},
year = {2016},
address = {San Diego, California},
publisher = {Association for Computational Linguistics},
pages = {62--66},
url = {}

Version History

  • Version 1 (June 2016). Runtime: Joshua 6.1 snapshot. 

