Obtaining Prebuilt Dictionaries
The install instructions explain how to obtain the separately downloadable ctakes-resources archive, which you need to run most of cTAKES. This archive is not itself released by the Apache Software Foundation.
Building Your Own Dictionaries
The UMLS dictionaries in the ctakes-resources archive might not completely match your underlying data; for example, you might need additional local terms. To create customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, you can use one of the dictionary tools described on the cTAKES Dictionary Creator GUI component page.
The models needed to run cTAKES are included with the convenience binaries. To train your own models, see the following:
- Training a sentence detector model
- Training a Part of Speech (POS) tagger model: Building a model - Obtaining training data
- Training a chunker model: Building a model - Prepare GENIA training data
- Training a dependency parser: Training a model - Training data or Training a model in Eclipse