Obtaining Prebuilt Dictionaries

The dictionaries and models used during annotation largely determine the quality of your results. The install instructions show you how to get the separately downloadable ctakes-resources archive, which you need to run most of cTAKES and which is not itself released by the Apache Software Foundation. Those resources include:

Building Your Own Dictionaries

The UMLS dictionaries within the ctakes-resources archive might not match your underlying data completely; for example, you might need additional local terms. To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:

Obtaining Models

As of Apache cTAKES 3.1, the models needed to run cTAKES are included with the convenience binaries.

Building Your Own Models

You may not need any models other than those provided with Apache cTAKES. However, they were trained on a specific set of text (a corpus), which might not match the characteristics of your text. If you want to build or train your own models, please read the cTAKES 3.1 Component Use Guide, particularly: