The DBConsumer is a UIMA Annotation Engine that stores CAS annotations and the XML CAS representation in the database.
The DBConsumer maps UIMA annotations to a relational database using one table per annotation class, and maps primitive annotation attributes directly to table columns. The mapping is deliberately 1-to-1: what you see in the database should correspond exactly to what you see in the UIMA CAS viewer.
To use this component, you must perform the additional YTEX installation tasks, which include setting up a database (MySQL, Oracle, or SQL Server).
DBConsumer Component Configuration
Add the DBConsumer to the end of your pipeline, or add it to your CPE descriptor; the annotator configuration file is
The DBConsumer UIMA Annotation Engine accepts the following configuration properties:
- Analysis Batch
Typically you will want to annotate different document collections, or annotate the same document collection with different pipelines or configurations. The analysis_batch identifies a document annotation run or document collection; it is stored in the document.analysis_batch column.
- Store CAS
Should the UIMA XML representation of document annotations be stored in the database? The gzipped UIMA XML is stored in the document.cas column. Set to false to speed up the DBConsumer; leave it true if you want to use the DBAnnotationViewer, which reads the CAS directly from the database.
- Store Doc Text
Should the document text be stored in the database? The document text is stored in the document.doc_text column. Set to false to speed up the DBConsumer.
- XMI Output Directory
Directory where the UIMA XML representations of document annotations should be stored; if empty, they will not be stored in the file system.
- Types to Ignore
UIMA annotations that should not be stored in the database. Add annotations you are not interested in to this list to speed up the DBConsumer. See the ref_uima_type table for a list of the types stored in the database; the class names should give you an idea of what each annotation represents.
- Insert Annotation Containment Links
Should anno_contain entries be created? Set to false to speed up the DBConsumer. See anno_contain (below) for information on what this is.
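Before filling in Types to Ignore, it can help to check which types account for most of the stored rows. A minimal sketch, assuming only the anno_base and ref_uima_type columns described on this page, with an in-memory SQLite database standing in for MySQL/Oracle/SQL Server; the type names and sample rows are invented for illustration:

```python
import sqlite3

# In-memory SQLite stand-in for the YTEX database; only the columns this
# query needs are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ref_uima_type (
    uima_type_id   INTEGER PRIMARY KEY,
    uima_type_name TEXT NOT NULL,
    table_name     TEXT
);
CREATE TABLE anno_base (
    anno_base_id INTEGER PRIMARY KEY,
    document_id  INTEGER NOT NULL,
    span_begin   INTEGER,
    span_end     INTEGER,
    uima_type_id INTEGER NOT NULL REFERENCES ref_uima_type (uima_type_id)
);
-- Invented sample data: three word tokens and one sentence.
INSERT INTO ref_uima_type VALUES
    (1, 'org.example.WordToken', NULL),
    (2, 'org.example.Sentence', NULL);
INSERT INTO anno_base (document_id, span_begin, span_end, uima_type_id) VALUES
    (1, 0, 6, 1), (1, 7, 12, 1), (1, 13, 17, 1), (1, 0, 18, 2);
""")

# Rows per UIMA type, largest first: high-volume types you never query
# are candidates for "Types to Ignore".
TYPE_COUNTS_SQL = """
SELECT t.uima_type_name, COUNT(*) AS n
FROM anno_base b
JOIN ref_uima_type t ON t.uima_type_id = b.uima_type_id
GROUP BY t.uima_type_name
ORDER BY n DESC, t.uima_type_name
"""
type_counts = conn.execute(TYPE_COUNTS_SQL).fetchall()
for name, n in type_counts:
    print(f"{name}: {n}")
```

The query itself is plain SQL and should port to the supported databases with little or no change; only the connection code is SQLite-specific.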
Using YTEX DBAnnotationViewer
For a graphical representation of document annotations, use the DBAnnotationViewer. This modified viewer retrieves the document CAS from the database (as opposed to the plain-vanilla AnnotationViewer which retrieves the CAS from the file system). To run, open a command prompt/shell, and run the following commands.
Windows (replace CTAKES_HOME with your cTAKES installation directory):
cd CTAKES_HOME
bin\setenv.bat
java -cp lib\*;desc;resources org.apache.ctakes.ytex.tools.DBAnnotationViewerMain
Linux/Unix:
cd CTAKES_HOME
. bin/ctakes.profile
java -cp "lib/*:desc:resources" org.apache.ctakes.ytex.tools.DBAnnotationViewerMain
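Outside the viewer, the stored CAS can also be read back programmatically. A minimal sketch in Python, assuming the document.cas column holds the gzipped XML as a binary value and that the XML was UTF-8 encoded; an in-memory SQLite table stands in for the real database, and `load_cas_xml` is a hypothetical helper, not part of YTEX:

```python
import gzip
import sqlite3
from typing import Optional


def load_cas_xml(conn: sqlite3.Connection, document_id: int) -> Optional[str]:
    """Return the decompressed CAS XML for a document, or None if absent."""
    row = conn.execute(
        "SELECT cas FROM document WHERE document_id = ?", (document_id,)
    ).fetchone()
    if row is None or row[0] is None:
        # Unknown document, or Store CAS was false when it was processed.
        return None
    # Assumption: the XML was UTF-8 encoded before it was gzipped.
    return gzip.decompress(row[0]).decode("utf-8")


# Self-contained demo against an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (document_id INTEGER PRIMARY KEY, cas BLOB)")
xml = '<?xml version="1.0" encoding="UTF-8"?><xmi:XMI/>'
conn.execute("INSERT INTO document VALUES (?, ?)", (1, gzip.compress(xml.encode("utf-8"))))
conn.execute("INSERT INTO document VALUES (?, ?)", (2, None))
print(load_cas_xml(conn, 1))
```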
The document table represents a single note/document. The columns are
- document_id - unique generated id
- instance_id - user-defined id, e.g. a numeric reference to the document in your system.
- instance_key - user-defined id, a string reference to a document in your system. When using a CollectionReader that retrieves files from the file system, this will correspond to the file name.
- analysis_batch - a user-defined 'document group'
- cas - the xml representation of the cas, gzipped
- doc_text - the text of the document
For each document processed, a document row is created.
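To illustrate how analysis_batch separates runs, here is a sketch of the document table in SQLite with the same logical columns (the column types are assumptions; the real DDL differs per database), where one note is processed in two hypothetical batches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE document (
    document_id    INTEGER PRIMARY KEY,  -- unique generated id
    instance_id    INTEGER,              -- user-defined numeric id
    instance_key   TEXT,                 -- user-defined string id, e.g. file name
    analysis_batch TEXT,                 -- user-defined document group
    cas            BLOB,                 -- gzipped CAS XML (NULL if not stored)
    doc_text       TEXT                  -- document text (NULL if not stored)
)""")
conn.executemany(
    "INSERT INTO document (instance_id, instance_key, analysis_batch, doc_text) "
    "VALUES (?, ?, ?, ?)",
    [
        (101, "note_101.txt", "pilot-run", "No acute distress."),
        (102, "note_102.txt", "pilot-run", "Denies chest pain."),
        # Same note, re-annotated under a different (hypothetical) configuration.
        (101, "note_101.txt", "rerun-new-config", "No acute distress."),
    ],
)


def documents_in_batch(conn, batch):
    """Return (document_id, instance_key) pairs for one analysis batch."""
    return conn.execute(
        "SELECT document_id, instance_key FROM document "
        "WHERE analysis_batch = ? ORDER BY document_id",
        (batch,),
    ).fetchall()


print(documents_in_batch(conn, "pilot-run"))
```

Each run gets its own document rows, so results from different pipelines over the same collection never mix as long as queries filter on analysis_batch.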
An anno_base record represents a UIMA Annotation; there is a one-to-many relationship between document and anno_base. The columns are:
- anno_base_id - unique generated id
- document_id - foreign key to document table
- span_begin - corresponds to Annotation.begin attribute
- span_end - Annotation.end attribute
- uima_type_id - foreign key to ref_uima_type, which contains the fully qualified class name of the Annotation to which this record is mapped.
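Because span_begin and span_end mirror Annotation.begin and Annotation.end, the covered text of any annotation can be recovered from document.doc_text when Store Doc Text is enabled. A sketch with invented sample data; the SQLite schema is a reduced stand-in for the real tables, and `annotations_with_text` is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (document_id INTEGER PRIMARY KEY, doc_text TEXT);
CREATE TABLE ref_uima_type (uima_type_id INTEGER PRIMARY KEY,
                            uima_type_name TEXT, table_name TEXT);
CREATE TABLE anno_base (
    anno_base_id INTEGER PRIMARY KEY,
    document_id  INTEGER REFERENCES document (document_id),
    span_begin   INTEGER,
    span_end     INTEGER,
    uima_type_id INTEGER REFERENCES ref_uima_type (uima_type_id)
);
INSERT INTO document VALUES (1, 'Denies chest pain.');
INSERT INTO ref_uima_type VALUES
    (1, 'org.example.Sentence', NULL), (2, 'org.example.WordToken', NULL);
INSERT INTO anno_base VALUES
    (10, 1, 0, 18, 1), (11, 1, 0, 6, 2), (12, 1, 7, 12, 2), (13, 1, 13, 17, 2);
""")


def annotations_with_text(conn, document_id):
    """Return (type name, begin, end, covered text) tuples ordered by span."""
    row = conn.execute(
        "SELECT doc_text FROM document WHERE document_id = ?", (document_id,)
    ).fetchone()
    if row is None or row[0] is None:
        return []  # unknown document, or Store Doc Text was false
    text = row[0]
    rows = conn.execute("""
        SELECT t.uima_type_name, b.span_begin, b.span_end
        FROM anno_base b
        JOIN ref_uima_type t ON t.uima_type_id = b.uima_type_id
        WHERE b.document_id = ?
        ORDER BY b.span_begin, b.span_end DESC""", (document_id,)).fetchall()
    # The end offset is exclusive, as in UIMA, so a plain slice is the covered text.
    return [(name, b, e, text[b:e]) for name, b, e in rows]


for item in annotations_with_text(conn, 1):
    print(item)
```

UIMA offsets are Java (UTF-16) character offsets, while Python indexes strings by code point; the two agree unless the text contains characters outside the Basic Multilingual Plane.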
Annotation subclasses may have additional attributes; these attributes are stored in additional tables prefixed with anno_. E.g. additional attributes of the Sentence annotation are stored in the anno_sentence table. The primary key of these annotation subclass tables corresponds to the primary key of the anno_base table (i.e. it is also a foreign key).
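Since a subclass table shares its primary key with anno_base, an inner join on anno_base_id combines the span with the subclass attributes. A sketch, assuming an anno_sentence table with a single sentenceNumber column (an assumption for illustration; check your schema for the actual columns), with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anno_base (
    anno_base_id INTEGER PRIMARY KEY,
    document_id  INTEGER,
    span_begin   INTEGER,
    span_end     INTEGER,
    uima_type_id INTEGER
);
-- Subclass table: its primary key is also a foreign key to anno_base.
-- sentenceNumber is an assumed attribute, used for illustration only.
CREATE TABLE anno_sentence (
    anno_base_id   INTEGER PRIMARY KEY REFERENCES anno_base (anno_base_id),
    sentenceNumber INTEGER
);
INSERT INTO anno_base VALUES (10, 1, 0, 18, 1), (20, 1, 19, 40, 1), (11, 1, 0, 6, 2);
INSERT INTO anno_sentence VALUES (10, 0), (20, 1);
""")

# The inner join pulls base span and subclass attributes together;
# annotations with no anno_sentence row (id 11 here) drop out.
SENTENCES_SQL = """
SELECT b.anno_base_id, b.span_begin, b.span_end, s.sentenceNumber
FROM anno_base b
JOIN anno_sentence s ON s.anno_base_id = b.anno_base_id
WHERE b.document_id = ?
ORDER BY s.sentenceNumber
"""
sentences = conn.execute(SENTENCES_SQL, (1,)).fetchall()
print(sentences)
```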
This is mapped to the edu.mayo.bmi.uima.core.type.NumToken, edu.mayo.bmi.uima.core.type.WordToken, and ytex.uima.types.WordToken annotations.
- anno_base_id - foreign key to anno_base, also primary key for this table
- tokenNumber - from BaseToken
- normalizedForm - from BaseToken
- partOfSpeech - from BaseToken
- coveredText - the text spanned by this token
- capitalization - 0 - no caps, 1 - 1 cap letter in word, 2 - 2 cap letters in word, 3 - 3 or more cap letters in word
- numPosition - position of the first number within the word
- canonicalForm - uninflected lower case word form, set by LVGAnnotator
- negated - 1 - word is negated, 0 - word is not negated (based on negex)
- possible - 1 - word is possible/uncertain, 0 - otherwise (based on negex)
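For example, to list negated words together with their canonical form. The table name anno_word_token is an assumption (the name of this table is not given above), only a subset of the columns is created, and the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anno_base (anno_base_id INTEGER PRIMARY KEY, document_id INTEGER,
                        span_begin INTEGER, span_end INTEGER, uima_type_id INTEGER);
-- Hypothetical table name; use whichever table ref_uima_type maps WordToken to.
CREATE TABLE anno_word_token (
    anno_base_id  INTEGER PRIMARY KEY REFERENCES anno_base (anno_base_id),
    coveredText   TEXT,
    canonicalForm TEXT,
    negated       INTEGER DEFAULT 0,
    possible      INTEGER DEFAULT 0
);
INSERT INTO anno_base VALUES (11, 1, 0, 6, 2), (12, 1, 7, 12, 2), (13, 1, 13, 17, 2);
INSERT INTO anno_word_token VALUES
    (11, 'Denies', 'deny',  0, 0),
    (12, 'chest',  'chest', 1, 0),
    (13, 'pain',   'pain',  1, 0);
""")

# Negated words per document, in text order.
NEGATED_SQL = """
SELECT b.document_id, w.coveredText, w.canonicalForm
FROM anno_word_token w
JOIN anno_base b ON b.anno_base_id = w.anno_base_id
WHERE w.negated = 1
ORDER BY b.document_id, b.span_begin
"""
negated_words = conn.execute(NEGATED_SQL).fetchall()
print(negated_words)
```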
This is mapped to the cTAKES Medicationevent annotation.
Feature Structure Tables
In addition to Annotations, UIMA defines FeatureStructures; these are typically not free-standing annotations but usually sit 'inside' an Annotation. For example, the Medicationevent and EntityMention annotations have arrays of OntologyConcepts. FeatureStructures are also mapped to anno_[subclass] tables; e.g. OntologyConcepts are mapped to the anno_ontology_concept table, and have a foreign key to the annotation within which they reside (a one-to-many relationship).
This is mapped to the cTAKES OntologyConceptArr of the Medicationevent or EntityMention annotation; these are the concepts (CUIs) of a Named Entity:
- anno_ontology_concept_id - unique system generated id
- anno_base_id - foreign key to the named entity annotation within which this concept resides
- code - CUI
- disambiguated - used by SenseDisambiguatorAnnotator. Set to 1 if this concept is the best sense or only sense for the given named entity. Set to 0 (default) otherwise, or if the annotator is not used.
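A sketch of listing the CUIs of one named entity with the disambiguated (best-sense) concept first. The CUI codes are placeholders, the rows are invented, and `concepts_for_entity` is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anno_ontology_concept (
    anno_ontology_concept_id INTEGER PRIMARY KEY,
    anno_base_id  INTEGER NOT NULL,  -- the named entity containing this concept
    code          TEXT,              -- CUI
    disambiguated INTEGER DEFAULT 0  -- 1 = best or only sense
);
INSERT INTO anno_ontology_concept VALUES
    (1, 100, 'C0000001', 0),
    (2, 100, 'C0000002', 1),
    (3, 200, 'C0000003', 0);
""")


def concepts_for_entity(conn, anno_base_id):
    """Return the CUIs of one named entity, disambiguated CUIs first."""
    rows = conn.execute(
        "SELECT code FROM anno_ontology_concept WHERE anno_base_id = ? "
        "ORDER BY disambiguated DESC, code",
        (anno_base_id,),
    ).fetchall()
    return [code for (code,) in rows]


print(concepts_for_entity(conn, 100))
```

If SenseDisambiguatorAnnotator was not run, every disambiguated value is 0 and the ordering simply falls back to the code.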
Metamap Candidate annotations are mapped to this table.
Annotation Relationship Modeling
UIMA annotations can also reference other UIMA annotations; e.g. the TreeBankNode annotation represents a node in a parse tree and references its parent and child TreeBankNode annotations. Each row in the anno_link table represents one such annotation link:
- anno_link_id: Synthetic primary key
- parent_anno_base_id: foreign key to the anno_base table. This is the parent (source) of the link
- child_anno_base_id: foreign key to the anno_base table. This is the child (target) of the link
- feature: the attribute on the parent object that corresponds to this link
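Because each link is one row, a whole parse subtree can be walked with a recursive query. A sketch in SQLite, assuming the TreeBankNode links are stored with feature values 'children' and 'parent' (an assumption; inspect the feature column in your own data); the node ids are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE anno_link (
    anno_link_id        INTEGER PRIMARY KEY,
    parent_anno_base_id INTEGER NOT NULL,
    child_anno_base_id  INTEGER NOT NULL,
    feature             TEXT
);
-- Hypothetical tree: node 1 has children 2 and 3; node 3 has child 4.
INSERT INTO anno_link (parent_anno_base_id, child_anno_base_id, feature) VALUES
    (1, 2, 'children'), (1, 3, 'children'), (3, 4, 'children'),
    (2, 1, 'parent'), (3, 1, 'parent'), (4, 3, 'parent');
""")

# Follow only 'children' links, recording how deep each descendant sits.
DESCENDANTS_SQL = """
WITH RECURSIVE descendants(anno_base_id, depth) AS (
    SELECT child_anno_base_id, 1 FROM anno_link
    WHERE parent_anno_base_id = ? AND feature = 'children'
    UNION ALL
    SELECT l.child_anno_base_id, d.depth + 1
    FROM anno_link l JOIN descendants d ON l.parent_anno_base_id = d.anno_base_id
    WHERE l.feature = 'children'
)
SELECT anno_base_id, depth FROM descendants ORDER BY depth, anno_base_id
"""


def descendants(conn, root_id):
    """Return (anno_base_id, depth) for every node below root_id."""
    return conn.execute(DESCENDANTS_SQL, (root_id,)).fetchall()


print(descendants(conn, 1))
```

`WITH RECURSIVE` is accepted by SQLite, MySQL 8.0 and PostgreSQL; SQL Server and Oracle use the same recursive CTE shape without the RECURSIVE keyword.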
This table represents containment relationships between annotations, e.g. that a word or named entity is contained in a sentence. It has no direct equivalent in any UIMA object: these relationships can be inferred from the begin/end offsets of UIMA annotations, but precomputing them has many practical applications; e.g. it simplifies writing queries of the sort 'give me all named entities in the Impression section'.
- parent_anno_base_id - foreign key to anno_base table. Represents the parent or containing annotation.
- parent_uima_type_id - foreign key to ref_uima_type, the class of the parent annotation.
- child_anno_base_id - foreign key to anno_base table. Represents the child or contained annotation.
- child_uima_type_id - foreign key to ref_uima_type, the class of the child annotation.
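The containment relation can be derived from spans alone: a parent contains a child when the child's [begin, end) range lies inside the parent's. A minimal sketch of that precomputation and of a query against the resulting rows; the type ids are hypothetical, and letting the caller choose parent and child types is not necessarily how YTEX decides which pairs to store:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Anno:
    """Minimal stand-in for an anno_base row."""
    anno_base_id: int
    uima_type_id: int
    span_begin: int
    span_end: int


def containment_rows(annos, parent_types, child_types):
    """Return anno_contain-style tuples (parent_id, parent_type, child_id,
    child_type) for every child whose span lies inside a parent's span.
    Quadratic in the number of annotations, which is fine for a sketch."""
    rows = []
    for p in annos:
        if p.uima_type_id not in parent_types:
            continue
        for c in annos:
            if (c.uima_type_id in child_types
                    and c.anno_base_id != p.anno_base_id
                    and p.span_begin <= c.span_begin
                    and c.span_end <= p.span_end):
                rows.append((p.anno_base_id, p.uima_type_id,
                             c.anno_base_id, c.uima_type_id))
    return rows


# Hypothetical type ids: 1 = sentence, 3 = section, 4 = named entity.
annos = [
    Anno(1, 3, 0, 50),   # section
    Anno(2, 4, 10, 20),  # named entity inside the section and the sentence
    Anno(3, 4, 60, 70),  # named entity outside both
    Anno(4, 1, 0, 30),   # sentence
]
rows = containment_rows(annos, parent_types={1, 3}, child_types={4})

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE anno_contain (
    parent_anno_base_id INTEGER, parent_uima_type_id INTEGER,
    child_anno_base_id  INTEGER, child_uima_type_id  INTEGER)""")
conn.executemany("INSERT INTO anno_contain VALUES (?, ?, ?, ?)", rows)


def contained_in(conn, parent_id, child_type_id):
    """Ids of annotations of one type contained in a given parent annotation."""
    return [cid for (cid,) in conn.execute(
        "SELECT child_anno_base_id FROM anno_contain "
        "WHERE parent_anno_base_id = ? AND child_uima_type_id = ? "
        "ORDER BY child_anno_base_id",
        (parent_id, child_type_id))]


print(contained_in(conn, 1, 4))  # named entities in the section
```

With the rows precomputed, "all named entities in this section" becomes a single indexed lookup instead of a span comparison at query time.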
Mapping annotations is purely configuration-driven. To map a new annotation:
- Create a table in your database to store the annotation's attributes.
- Tell YTEX to map the annotation class to your table (i.e. add a row to the ref_uima_type table).
For example, suppose you want to map an annotation named Foo that has the float attribute period. First create a table for this annotation, e.g. for MySQL:
create table anno_foo (
  anno_base_id int not null primary key, /* foreign key to anno_base */
  period float
);
Note: the column names must match the UIMA annotation's attribute names (case insensitive).
Then tell YTEX to map Foo annotations to this table:
insert into ref_uima_type (uima_type_id, uima_type_name, table_name)
values (201, 'org.acme.Foo', 'anno_foo');
The ref_uima_type table tells YTEX which annotations to map, and the tables to map them to:
- uima_type_id: unique, manually assigned id
- uima_type_name: the fully qualified class name for the annotation
- table_name: the table to which the annotation should be mapped. If null, only a row in the anno_base table will be created for this annotation (many annotations do not have any additional properties).
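The lookup that ref_uima_type enables can be sketched as follows: write the anno_base row, then, if table_name is set, insert the attributes whose names match the table's columns case-insensitively. This is a hypothetical illustration of configuration-driven mapping in Python/SQLite, not YTEX's actual (Java) implementation; `store_annotation` is an invented name, and org.acme.Bar is an invented type with no extra attributes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ref_uima_type (uima_type_id INTEGER PRIMARY KEY,
                            uima_type_name TEXT UNIQUE, table_name TEXT);
CREATE TABLE anno_base (anno_base_id INTEGER PRIMARY KEY, document_id INTEGER,
                        span_begin INTEGER, span_end INTEGER, uima_type_id INTEGER);
CREATE TABLE anno_foo (anno_base_id INTEGER PRIMARY KEY, period REAL);
INSERT INTO ref_uima_type VALUES
    (201, 'org.acme.Foo', 'anno_foo'),
    (202, 'org.acme.Bar', NULL);  -- NULL table_name: anno_base row only
""")


def store_annotation(conn, document_id, type_name, begin, end, attributes):
    """Hypothetical mapper: store one annotation, or return None if its
    type has no ref_uima_type row."""
    row = conn.execute(
        "SELECT uima_type_id, table_name FROM ref_uima_type WHERE uima_type_name = ?",
        (type_name,),
    ).fetchone()
    if row is None:
        return None
    type_id, table = row
    cur = conn.execute(
        "INSERT INTO anno_base (document_id, span_begin, span_end, uima_type_id) "
        "VALUES (?, ?, ?, ?)",
        (document_id, begin, end, type_id),
    )
    anno_base_id = cur.lastrowid
    if table is not None:
        # Map lower-cased column names to their declared spelling.
        columns = {r[1].lower(): r[1] for r in conn.execute(f"PRAGMA table_info({table})")}
        values = {columns[k.lower()]: v for k, v in attributes.items()
                  if k.lower() in columns}
        values[columns["anno_base_id"]] = anno_base_id  # shared primary key
        names = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})",
                     list(values.values()))
    return anno_base_id


foo_id = store_annotation(conn, 1, "org.acme.Foo", 0, 5, {"Period": 1.5, "unmapped": "x"})
bar_id = store_annotation(conn, 1, "org.acme.Bar", 6, 9, {})
print(conn.execute("SELECT * FROM anno_foo").fetchall())
```

Attributes with no matching column are silently skipped here. The table name is interpolated into the SQL because it comes from the trusted ref_uima_type configuration, not from user input.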
This is a spring bean configuration file that allows more mapping customization, e.g. mapping attributes to columns with different names.
Below is an entity-relationship diagram.