Overview of Drug Named Entity Recognition (optional)

The Drug NER (Drug Named Entity Recognition) component, also referred to as the Medication Annotator, processes flat files or CDA documents (plain text wrapped in Clinical Document Architecture markup) to identify drug named entities (NEs) and related attributes such as dosage, strength, and route. The annotator extracts data from both lists and narrative text.

For additional documentation pertaining to this pipeline see <cTAKES_HOME>/Drug NER/README.

Analysis engines (annotators)


The file cTAKESdesc/drugnerdesc/analysis_engine/DrugAggregateCDAProcessor.xml provides a working example of the Medication Annotator. This aggregate includes the DrugLookupWindow and DrugMention annotators, along with various annotators from the cTAKES release, all of which can be found in projects under <cTAKES_HOME>/.

  • DrugMentionAnnotator
  • DrugLookupWindowAnnotator

DrugAggregateCDAProcessor.xml is also provided to process CDA documents. The aggregate flow contains the annotator version of CdaCasInitializer.xml, which processes the document as a Clinical Document Architecture (CDA) wrapped file. Additionally, Sofa mappings are enabled for the plaintext output view, which maps the DTD properties to the properties used by the pipeline (e.g., patient and date metadata).


This annotator is similar to cTAKESdesc/cdpdesc/analysis_engine/LookupWindowAnnotator.xml, with customizations. The original LookupWindowAnnotator is an aggregate that includes the NP2LookupWindow and MaxLookupWindows annotators. The DrugLookupWindow aggregate adds the DrugCNP2LookupWindow annotator to the original set of annotators in the flow.

srcDrugObjClass <String/Single-valued/Required>
(Default Value = 'edu.mayo.bmi.uima.chunker.type.NP')
Identifies the Chunk type that needs to be used to generate

destDrugObjClass <String/Single-valued/Required>
(Default Value = 'edu.mayo.bmi.uima.lookup.type.DrugLookupWindowAnnotation')
Identifies the destination type that is generated from the Chunk type defined by srcDrugObjClass.

dataDrugBindMap <String/Multi-valued/Required>
(Default Values = 'getBegin|setBegin, getEnd|setEnd')
Binds data from source to destination. Each value pairs a source getter with a destination setter, separated by '|'.

sectionOverrideSet <String/Multi-valued/Optional>
(No default values; the out-of-the-box configuration specifies no section ids)
Identifies the sections which as a whole should be treated as a lookup window.
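To illustrate how a dataDrugBindMap entry such as 'getBegin|setBegin' could be applied, here is a minimal sketch that copies values from a source object to a destination object by reflection. It is not the cTAKES implementation: the classes SourceSpan and DestSpan and the method bind are hypothetical stand-ins for the source Chunk type and the destination lookup window type.

```java
import java.lang.reflect.Method;

public class BindMapSketch {
    // Hypothetical stand-in for the source chunk type (srcDrugObjClass).
    public static class SourceSpan {
        private final int begin;
        private final int end;
        public SourceSpan(int begin, int end) { this.begin = begin; this.end = end; }
        public int getBegin() { return begin; }
        public int getEnd() { return end; }
    }

    // Hypothetical stand-in for the destination type (destDrugObjClass).
    public static class DestSpan {
        private int begin;
        private int end;
        public void setBegin(int value) { begin = value; }
        public void setEnd(int value) { end = value; }
        public int getBegin() { return begin; }
        public int getEnd() { return end; }
    }

    // For each "getter|setter" entry, read from src and write to dest.
    public static void bind(Object src, Object dest, String[] bindMap) throws Exception {
        for (String entry : bindMap) {
            String[] parts = entry.trim().split("\\|");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad binding: " + entry);
            }
            Method getter = src.getClass().getMethod(parts[0].trim());
            Method setter = dest.getClass().getMethod(parts[1].trim(), getter.getReturnType());
            setter.invoke(dest, getter.invoke(src));
        }
    }

    public static void main(String[] args) throws Exception {
        SourceSpan chunk = new SourceSpan(12, 31);
        DestSpan window = new DestSpan();
        bind(chunk, window, new String[] {"getBegin|setBegin", "getEnd|setEnd"});
        System.out.println(window.getBegin() + "-" + window.getEnd()); // prints "12-31"
    }
}
```

With the two default bindings, the destination ends up covering the same character offsets as the source chunk.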


This annotator generates new DrugLookupWindow annotations for the sections whose section ids are specified in the parameter sectionOverrideSet. The out-of-the-box configuration does not specify any section ids. Please read <cTAKES_HOME>/drugner/README for more information on recommended usage.


This descriptor is similar to the one in cTAKESdesc/lookup/analysis_engine. Refer to Dictionary Lookup for a description.


This annotator adds the ability to identify attributes of drug mentions such as Dosage, Frequency, Frequency Unit, Route and Strength from either plaintext or CDA documents. It also lets you specify which sections of a note contain drugs in a list format, as opposed to drug mentions within the narrative of the note. This allows different sections to be processed differently and generally improves the quality of the annotations. This project uses various cTAKES components and therefore requires cTAKES to be installed before it can be used.

medicationRelatedSection <String/Single-valued/Optional>
(Default Value = 'SIMPLE_SEGMENT')
IDs of sections generated by your Segment Annotator where drug mentions appear in a list format.
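The distinction between list-style and narrative sections can be pictured with a small sketch. This is only an illustration under stated assumptions, not the DrugMentionAnnotator logic: the class SectionModeSketch, the enum Mode and the method modeFor are hypothetical, and the configured section IDs are assumed to be available as a set of strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SectionModeSketch {
    // Hypothetical processing modes: drugs in a list vs. drugs in narrative text.
    public enum Mode { LIST, NARRATIVE }

    // A section whose ID is configured as medication-related is treated as a list;
    // every other section is treated as narrative text.
    public static Mode modeFor(String sectionId, Set<String> medicationRelatedSections) {
        return medicationRelatedSections.contains(sectionId) ? Mode.LIST : Mode.NARRATIVE;
    }

    public static void main(String[] args) {
        // 'SIMPLE_SEGMENT' is the documented default; the other ID is made up.
        Set<String> configured = new HashSet<>(Arrays.asList("SIMPLE_SEGMENT"));
        System.out.println(modeFor("SIMPLE_SEGMENT", configured));
        System.out.println(modeFor("HYPOTHETICAL_OTHER_SECTION", configured));
    }
}
```

In this sketch, only sections named in the configuration are handled as medication lists, which mirrors the idea of applying customized processing per section.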


This descriptor is similar to the one with the same name in cTAKESdesc/necontextdesc/analysis_engine. Refer to NE Contexts for a description.


This descriptor is similar to the one with the same name in cTAKESdesc/necontextdesc/analysis_engine. Refer to NE Contexts for a description.


The file cTAKESdesc/drugnerdesc/collection_processing_engine/DrugNER_PlainText_CPE.xml provides an XML specification of a collection processing engine (CPE).

To run the CPE

  • Start UIMA CPE GUI.

java -cp <classpath> org.apache.uima.tools.cpm.CpmFrame

  • Open this file.
  • Set the parameters for the collection reader to point to a local collection of files that you want to process.
  • Set the parameters for the DrugMentionAnnotator as appropriate for your environment.
  • Set the output directory of the XCAS Writer CAS Consumer.

The results of running the pipeline are written to the output directory as XCAS files. These files can be viewed in the CAS Visual Debugger.
A sample plain text document has been provided for convenience and can be used as the input document for the process described above.

The steps described under DrugNER_PlainText_CPE.xml can be used to process the provided sample document and validate the Drug NER pipeline.

1 Comment

  1. In cTAKES 3.2, there is only DrugMentionAnnotator. Perhaps we should create a new DrugNER page for 3.2 and link to it from the 3.2 Component Use Guide.