The DBConsumer is a UIMA Analysis Engine that stores CAS annotations and the XML CAS representation in the database.

The DBConsumer maps UIMA annotations to a relational database using one table per annotation class. Primitive annotation attributes are mapped directly to table columns. The mapping strategy is 1-to-1: what you see in the database should correspond exactly to what you see in the UIMA CAS viewer.

Note that you must perform the additional YTEX installation tasks to use this component; this involves setting up a database (MySQL/Oracle/SQL Server).

DBConsumer Component Configuration

Add the DBConsumer to the end of your pipeline, or add it to your CPE descriptor; the annotator configuration file is YTEX_HOME\desc\ctakes-ytex-uima\desc\analysis_engine\DBConsumer.xml

The DBConsumer UIMA Analysis Engine accepts the following configuration properties:

Analysis Batch
Typically, you will want to annotate different document collections, or annotate the same document collection with different pipelines/configurations. The analysis_batch identifies a document annotation run or document collection. It is stored in the document.analysis_batch column.

Store CAS
Should the UIMA XML representation of document annotations be stored in the database? The gzipped UIMA XML is stored in the document.cas column. Set to false to speed up the DBConsumer. Use the DBAnnotationViewer to view the CAS directly from the database.

Store Doc Text
Should the document text be stored in the database? The document text is stored in the document.doc_text column. Set to false to speed up the DBConsumer.

XMI Output Directory
Directory where UIMA XML representations of document annotations should be stored; if empty they will not be stored in the file system.

Types to Ignore
UIMA annotations that should not be stored in the database. Add annotations that you are not interested in to this list to speed up the DBConsumer. Take a look at the ref_uima_type table for a list of types stored in the database. The class name should give you an idea of what each annotation represents.

Insert Annotation Containment Links
Should anno_contain entries be created? Set to false to speed up the DBConsumer. See anno_contain (below) for information on what this is.

Using YTEX DBAnnotationViewer

For a graphical representation of document annotations, use the DBAnnotationViewer. This modified viewer retrieves the document CAS from the database (as opposed to the plain-vanilla AnnotationViewer which retrieves the CAS from the file system). To run, open a command prompt/shell, and run the following commands.

Windows:

cd CTAKES_HOME
bin\setenv.bat
java -cp lib/*;desc;resources org.apache.ctakes.ytex.tools.DBAnnotationViewerMain

Linux:

cd CTAKES_HOME
. bin/ctakes.profile
java -cp "lib/*:desc:resources" org.apache.ctakes.ytex.tools.DBAnnotationViewerMain

Annotation Tables

document

The document table represents a single note/document. The columns are:

document_id - unique generated id
instance_id - user defined id - i.e. a numeric reference to the document in your system.
instance_key - user defined id - a string reference to a document in your system. When using a CollectionReader that retrieves files from the file system, this will correspond to the file name.
analysis_batch - a user-defined 'document group'
cas - the xml representation of the cas, gzipped
doc_text - the text of the document

For each document processed, a document row is created.
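As a rough illustration of reading the stored CAS back out of the document table, the sketch below uses Python's sqlite3 module as an in-memory stand-in for the real MySQL/Oracle/SQL Server database. The column types, the sample row, the UTF-8 encoding, and the helper name load_cas_xml are assumptions made for the example; only the column names and the fact that the cas column is gzipped come from the description above.

```python
import gzip
import sqlite3

# Toy stand-in for the YTEX database: an in-memory SQLite table with the
# document columns described above (real column types may differ).
conn = sqlite3.connect(":memory:")
conn.execute(
    """create table document (
           document_id integer primary key,
           instance_id integer,
           instance_key text,
           analysis_batch text,
           cas blob,
           doc_text text
       )"""
)

# Hypothetical sample row; the XML is a placeholder, not a real CAS.
sample_xml = "<xmi:XMI>placeholder</xmi:XMI>"
conn.execute(
    "insert into document (instance_key, analysis_batch, cas, doc_text) "
    "values (?, ?, ?, ?)",
    ("note1.txt", "batch-a", gzip.compress(sample_xml.encode("utf-8")), "No fever."),
)


def load_cas_xml(conn, analysis_batch, instance_key):
    """Return the decompressed CAS XML for one document, or None if absent."""
    row = conn.execute(
        "select cas from document where analysis_batch = ? and instance_key = ?",
        (analysis_batch, instance_key),
    ).fetchone()
    if row is None or row[0] is None:
        return None  # no such document, or Store CAS was disabled
    # Assumption: the column holds standard gzip data containing UTF-8 XML.
    return gzip.decompress(row[0]).decode("utf-8")


print(load_cas_xml(conn, "batch-a", "note1.txt"))
```

Filtering on both analysis_batch and instance_key reflects that the same source document may be stored once per annotation run.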

anno_base

An anno_base record represents a UIMA Annotation; there is a one-to-many relationship between document and anno_base. The columns are:

anno_base_id - unique generated id
document_id - foreign key to document table
span_begin - corresponds to Annotation.begin attribute
span_end - Annotation.end attribute
uima_type_id - foreign key to ref_uima_type, which contains the fully qualified class name of the Annotation to which this record is mapped.
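The following sketch shows one way the anno_base columns could be combined with document.doc_text to recover the text each annotation covers. It uses an in-memory SQLite database as a stand-in for the real one; the sample rows, the type name org.example.Sentence, and the function name covered_text_rows are hypothetical. It assumes document text was stored (Store Doc Text enabled) and relies on the UIMA convention that the end offset is exclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table document (document_id integer primary key, doc_text text);
    create table ref_uima_type (
        uima_type_id integer primary key,
        uima_type_name text,
        table_name text
    );
    create table anno_base (
        anno_base_id integer primary key,
        document_id integer references document(document_id),
        span_begin integer,
        span_end integer,
        uima_type_id integer references ref_uima_type(uima_type_id)
    );
    -- Hypothetical sample data: one document with two sentence annotations.
    insert into document values (1, 'No fever. Cough present.');
    insert into ref_uima_type values (1, 'org.example.Sentence', null);
    insert into anno_base values (10, 1, 0, 9, 1);
    insert into anno_base values (11, 1, 10, 24, 1);
    """
)

# SQL substr() is 1-based while span_begin is 0-based, hence the "+ 1".
# The length is span_end - span_begin because the end offset is exclusive.
QUERY = """
    select t.uima_type_name, b.span_begin, b.span_end,
           substr(d.doc_text, b.span_begin + 1, b.span_end - b.span_begin)
    from anno_base b
    join document d on d.document_id = b.document_id
    join ref_uima_type t on t.uima_type_id = b.uima_type_id
    where b.document_id = ?
    order by b.span_begin
"""


def covered_text_rows(conn, document_id):
    """Return (type name, begin, end, covered text) for each annotation."""
    return conn.execute(QUERY, (document_id,)).fetchall()


for row in covered_text_rows(conn, 1):
    print(row)
```

UIMA offsets index into a Java string, so for text containing characters outside the Basic Multilingual Plane the database's notion of a character position may not line up with these offsets; the sketch ignores that case.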

anno_[subclass]

Annotation subclasses may have additional attributes; these attributes are stored in additional tables prefixed with anno_. E.g. additional attributes of the Sentence annotation are stored in the anno_sentence table. The primary key of these annotation subclass tables corresponds to the primary key of the anno_base table (i.e. it is also a foreign key).

anno_token

This is mapped to the edu.mayo.bmi.uima.core.type.NumToken, edu.mayo.bmi.uima.core.type.WordToken, and ytex.uima.types.WordToken annotations.

anno_base_id - foreign key to anno_base, also primary key for this table
tokenNumber - from BaseToken
normalizedForm - from BaseToken
partOfSpeech - from BaseToken
coveredText - the text spanned by this token
capitalization - 0 - no caps, 1 - 1 cap letter in word, 2 - 2 cap letters in word, 3 - 3 or more cap letters in word
numPosition - 1st position of number within word
canonicalForm - uninflected lower case word form, set by LVGAnnotator
negated - 1 - word is negated, 0 - word is not negated (based on negex)
possible - 1 - possible (from negex)

anno_med_event

This is mapped to the cTAKES Medicationevent annotation.

Feature Structure Tables

In addition to Annotations, UIMA defines FeatureStructures; these are typically not 'free standing' annotations but sit 'inside' an Annotation. For example, the Medicationevent and EntityMention annotations have arrays of OntologyConcepts. FeatureStructures are also mapped to anno_[subclass] tables; e.g. OntologyConcepts are mapped to the anno_ontology_concept table and have a foreign key to the annotation 'within which' they reside (one-to-many relationship).

anno_ontology_concept

This is mapped to the cTAKES OntologyConceptArr of the Medicationevent or EntityMention annotation; these are the concepts (CUIs) of a Named Entity:

anno_ontology_concept_id - unique system generated id
anno_base_id - foreign key to named_entity
code - CUI
disambiguated - used by SenseDisambiguatorAnnotator. Set to 1 if this concept is the best sense or only sense for the given named entity. Set to 0 (default) otherwise, or if the annotator is not used.
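To make the one-to-many relationship concrete, here is a small sketch that lists the concept codes attached to the named entities of one document, optionally restricted to disambiguated concepts. SQLite stands in for the real database; the CUI values are made-up placeholders, the function name concepts_for_document is hypothetical, and the join through anno_base assumes the named entity shares its primary key with anno_base as described for annotation subclass tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table anno_base (
        anno_base_id integer primary key,
        document_id integer,
        span_begin integer,
        span_end integer,
        uima_type_id integer
    );
    create table anno_ontology_concept (
        anno_ontology_concept_id integer primary key,
        anno_base_id integer references anno_base(anno_base_id),
        code text,
        disambiguated integer default 0
    );
    -- Hypothetical data: one named entity with two candidate concepts.
    insert into anno_base values (20, 1, 3, 8, 5);
    insert into anno_ontology_concept (anno_base_id, code, disambiguated)
        values (20, 'C0000001', 1);
    insert into anno_ontology_concept (anno_base_id, code, disambiguated)
        values (20, 'C0000002', 0);
    """
)


def concepts_for_document(conn, document_id, disambiguated_only=False):
    """Return (anno_base_id, code) pairs for every concept in a document."""
    sql = """
        select c.anno_base_id, c.code
        from anno_ontology_concept c
        join anno_base b on b.anno_base_id = c.anno_base_id
        where b.document_id = ?
    """
    if disambiguated_only:
        # Keep only the concept marked as the best (or only) sense.
        sql += " and c.disambiguated = 1"
    sql += " order by c.anno_base_id, c.code"
    return conn.execute(sql, (document_id,)).fetchall()


print(concepts_for_document(conn, 1))
print(concepts_for_document(conn, 1, disambiguated_only=True))
```

Note that when the SenseDisambiguatorAnnotator is not in the pipeline every row keeps the default of 0, so the restricted query would return nothing.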

anno_mm_candidate

Metamap Candidate annotations are mapped to this table.

Annotation Relationship Modeling

UIMA annotations can also have references to other UIMA annotations; e.g. the TreeBankNode annotation represents a node in a parse tree and has references to its parent and child TreeBankNode annotations. Rows in the anno_link table represent these annotation links:

anno_link_id: Synthetic primary key
parent_anno_base_id: foreign key to the anno_base table. This is the parent (source) of the link
child_anno_base_id: foreign key to the anno_base table. This is the child (target) of the link
feature: the attribute on the parent object that corresponds to this link
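The link rows can be walked to rebuild a structure such as a parse tree. The sketch below collects all descendants of a node breadth-first, again using in-memory SQLite as a stand-in. The ids, the feature value 'children', and the function name descendants are assumptions for illustration; the actual feature names depend on the annotation type being linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table anno_link (
        anno_link_id integer primary key,
        parent_anno_base_id integer,
        child_anno_base_id integer,
        feature text
    );
    -- Hypothetical tree: 30 -> (31, 32), 31 -> 33.
    insert into anno_link (parent_anno_base_id, child_anno_base_id, feature)
        values (30, 31, 'children');
    insert into anno_link (parent_anno_base_id, child_anno_base_id, feature)
        values (30, 32, 'children');
    insert into anno_link (parent_anno_base_id, child_anno_base_id, feature)
        values (31, 33, 'children');
    """
)


def descendants(conn, root_id, feature):
    """Return the ids reachable from root_id via the given feature, breadth-first."""
    found = []
    seen = {root_id}  # guards against cycles in malformed data
    frontier = [root_id]
    while frontier:
        parent = frontier.pop(0)
        rows = conn.execute(
            "select child_anno_base_id from anno_link "
            "where parent_anno_base_id = ? and feature = ? "
            "order by child_anno_base_id",
            (parent, feature),
        ).fetchall()
        for (child,) in rows:
            if child not in seen:
                seen.add(child)
                found.append(child)
                frontier.append(child)
    return found


print(descendants(conn, 30, "children"))
```

Issuing one query per node keeps the sketch portable; on databases that support recursive common table expressions the same traversal could be done in a single statement.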

anno_contain

This table represents containment relationships between annotations, e.g. that a word/named entity is contained in a sentence. This has no direct equivalent in any UIMA object; these relationships can be inferred from the begin/end of UIMA annotations, but 'precomputing' them has many practical applications. For example, it simplifies writing queries of the sort 'give me all named entities in the Impression section'.

parent_anno_base_id - foreign key to anno_base table. Represents the parent or containing concept.
parent_uima_type_id - foreign key to ref_uima_type, the class of the parent annotation.
child_anno_base_id - foreign key to anno_base table. Represents the child or contained concept.
child_uima_type_id - foreign key to ref_uima_type, the class of the child annotation.
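The sketch below illustrates how containment rows could be inferred from annotation spans and then queried. It is not the DBConsumer's actual algorithm: the containment rule used here (child span lies within parent span, same document, different annotation) is an assumption, as are the sample rows, the type ids, and the function names precompute_containment and contained. SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table anno_base (
        anno_base_id integer primary key,
        document_id integer,
        span_begin integer,
        span_end integer,
        uima_type_id integer
    );
    create table anno_contain (
        parent_anno_base_id integer,
        parent_uima_type_id integer,
        child_anno_base_id integer,
        child_uima_type_id integer
    );
    -- Hypothetical data: type 1 = sentence, type 2 = named entity.
    insert into anno_base values (10, 1, 0, 9, 1);
    insert into anno_base values (11, 1, 10, 24, 1);
    insert into anno_base values (20, 1, 3, 8, 2);
    insert into anno_base values (21, 1, 10, 15, 2);
    """
)


def precompute_containment(conn):
    """Fill anno_contain with every pair where the child span lies in the parent span."""
    conn.execute(
        """
        insert into anno_contain
        select p.anno_base_id, p.uima_type_id, c.anno_base_id, c.uima_type_id
        from anno_base p
        join anno_base c
          on c.document_id = p.document_id
         and c.anno_base_id <> p.anno_base_id
         and c.span_begin >= p.span_begin
         and c.span_end <= p.span_end
        """
    )


def contained(conn, parent_id, child_type_id):
    """Return ids of annotations of one type contained in the given parent."""
    rows = conn.execute(
        "select child_anno_base_id from anno_contain "
        "where parent_anno_base_id = ? and child_uima_type_id = ? "
        "order by child_anno_base_id",
        (parent_id, child_type_id),
    ).fetchall()
    return [r[0] for r in rows]


precompute_containment(conn)
links = conn.execute(
    "select * from anno_contain order by parent_anno_base_id, child_anno_base_id"
).fetchall()
print(links)
print(contained(conn, 11, 2))
```

With the rows precomputed, the lookup in contained() is a simple equality filter instead of a range comparison over every annotation in the document, which is the practical benefit described above. Under the rule assumed here, two annotations with identical spans would each be recorded as containing the other.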

Mapping Configuration

Mapping of annotations is driven purely by configuration. To map a new annotation, do the following:

Create a table in your database to store the annotation's attributes.
Tell YTEX to map the annotation class to your table (i.e. add a row to the ref_uima_type table).

To illustrate, suppose we want to map an annotation named Foo that has a float attribute named period. We would create a table for this annotation, e.g. for MySQL:

create table anno_foo (
  anno_base_id int not null primary key, /* foreign key to anno_base */
  period double
) engine=myisam;

Note: the column names must match the UIMA annotation's attribute names (case insensitive).

And we need to tell YTEX to map Foos to this table:

insert into ref_uima_type (uima_type_id, uima_type_name, table_name)
values (201, 'org.acme.Foo', 'anno_foo');
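Because column names must match attribute names, a quick sanity check before running the pipeline can be useful. The following sketch recreates the two steps above in an in-memory SQLite database and reports which attributes of a type have no matching column. The helper name unmapped_attributes is hypothetical, the attribute list is supplied by hand rather than read from the UIMA type system, and the pragma used to list columns is specific to SQLite; other databases expose column metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table ref_uima_type (
        uima_type_id integer primary key,
        uima_type_name text not null,
        table_name text
    );
    -- Step 1: table for the annotation's attributes.
    create table anno_foo (
        anno_base_id integer not null primary key,
        period double
    );
    -- Step 2: register the mapping.
    insert into ref_uima_type (uima_type_id, uima_type_name, table_name)
        values (201, 'org.acme.Foo', 'anno_foo');
    """
)


def unmapped_attributes(conn, uima_type_name, attribute_names):
    """Return the attributes that have no same-named column (case-insensitive)."""
    row = conn.execute(
        "select table_name from ref_uima_type where uima_type_name = ?",
        (uima_type_name,),
    ).fetchone()
    if row is None or row[0] is None:
        # Type not registered, or mapped to anno_base only: nothing is stored.
        return sorted(attribute_names)
    # SQLite-specific: each table_info row is (cid, name, type, notnull, default, pk).
    info = conn.execute(f'pragma table_info("{row[0]}")').fetchall()
    columns = {col[1].lower() for col in info}
    return sorted(a for a in attribute_names if a.lower() not in columns)


print(unmapped_attributes(conn, "org.acme.Foo", ["period"]))
print(unmapped_attributes(conn, "org.acme.Foo", ["Period", "phase"]))
```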

ref_uima_type

This table tells YTEX what annotations to map, and the tables to map them to:

uima_type_id: unique, manually assigned id
uima_type_name: the fully qualified class name for the annotation
table_name: the table to which the annotation should be mapped. If null, only a row in the anno_base table will be created for this annotation (many annotations do not have any additional properties).

config/desc/ytex/beans-uima-mapper.xml

This is a spring bean configuration file that allows more mapping customization, e.g. mapping attributes to columns with different names.

Data Model

Below is an entity-relationship diagram of the data model.

cTAKES 3.1.2 - YTEX DBConsumer