A place to collect ideas and observations about a rewrite of the UIMA internals to modernize them in several dimensions.

These include

A prototype in this direction, called Cas-obj, has been offered in UIMA-4329.

Ideas for the next major version of UIMA

This wiki page collects further ideas to consider for UIMA version 3.

Edge Cases affecting internal design

Merged Type Systems, running different type systems in 1 JVM, sharing JCas cover objects

A key aspect of UIMA is the type system merging (among all annotators in a pipeline) that occurs at the beginning of a "run". After the merge is complete, the type system is "locked down", and various optimizations are possible based on this.
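The merge-then-lock idea can be illustrated with a minimal sketch. This is not the UIMA API: `ToyTypeSystem` and all of its methods are hypothetical names introduced here, and the sketch ignores details such as supertypes and feature range compatibility. It only shows that merging unions the features declared by each annotator, and that locking makes the resulting layout safe to cache.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy type system: type name -> ordered set of feature names. Not the UIMA API. */
final class ToyTypeSystem {
    private final Map<String, Set<String>> featuresByType = new LinkedHashMap<>();
    private boolean locked = false;

    /** Adds a type if absent and unions in the given features; rejected once locked. */
    void addType(String typeName, String... features) {
        if (locked) {
            throw new IllegalStateException("type system is locked");
        }
        Set<String> merged = featuresByType.computeIfAbsent(typeName, k -> new LinkedHashSet<>());
        for (String f : features) {
            merged.add(f);
        }
    }

    /** Merges every type and feature declared in 'other' into this type system. */
    void mergeFrom(ToyTypeSystem other) {
        for (Map.Entry<String, Set<String>> e : other.featuresByType.entrySet()) {
            addType(e.getKey(), e.getValue().toArray(new String[0]));
        }
    }

    /** After this call the layout can no longer change, so offsets may be cached. */
    void lock() {
        locked = true;
    }

    boolean isLocked() {
        return locked;
    }

    /** Position of a feature within its type, or -1 if unknown; stable only after lock(). */
    int featureOffset(String typeName, String feature) {
        int i = 0;
        for (String f : featuresByType.getOrDefault(typeName, Collections.<String>emptySet())) {
            if (f.equals(feature)) {
                return i;
            }
            i++;
        }
        return -1;
    }
}
```

In this sketch, two annotators that both declare a `Token` type with different features end up with one `Token` holding the union of those features, and any attempt to add a type after `lock()` fails.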

The design should support this use case: one JVM running multiple different pipelines together. In particular, multiple TypeSystems can be in use at once.

The design should also support this use case: one JVM loads a single definition of the JCas cover objects but runs several different underlying type systems that share those cover objects. This implies that part of the instantiation of a JCas cover object varies with the type system, and that the location (or number) of the features in a type that a JCas cover object "covers" might differ between the underlying type systems.

These use cases give rise to a design with some "indirection" to support the multiplicity of values corresponding to multiple type systems in the same JVM (with possibly the same JCas cover objects).
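One way to picture this indirection is a cover class that does not hard-code where its features live, but resolves the slot through a per-type-system layout when the object is created. The sketch below is only an illustration under that assumption: `TypeLayout`, `TokenCover`, and the feature names are hypothetical and do not come from UIMA or from the Cas-obj prototype.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-type-system layout: feature name -> slot offset within the FS. */
final class TypeLayout {
    private final Map<String, Integer> slotByFeature = new HashMap<>();
    private final int slotCount;

    TypeLayout(String... featuresInOrder) {
        for (int i = 0; i < featuresInOrder.length; i++) {
            slotByFeature.put(featuresInOrder[i], i);
        }
        slotCount = featuresInOrder.length;
    }

    int slotOf(String feature) {
        Integer slot = slotByFeature.get(feature);
        if (slot == null) {
            throw new IllegalArgumentException("feature not in this type system: " + feature);
        }
        return slot;
    }

    int slotCount() {
        return slotCount;
    }
}

/** One shared cover class; the layout it binds to varies with the type system. */
final class TokenCover {
    private final Object[] slots;
    private final int posSlot; // resolved once per instance through the layout

    TokenCover(TypeLayout layout) {
        this.slots = new Object[layout.slotCount()];
        this.posSlot = layout.slotOf("pos");
    }

    void setPos(String value) {
        slots[posSlot] = value;
    }

    String getPos() {
        return (String) slots[posSlot];
    }

    int posSlot() {
        return posSlot;
    }
}
```

Here the same `TokenCover` class works against two layouts in one JVM: in a type system where `Token` has the features `begin, end, pos` the `pos` value lands in slot 2, while in one that also defines `lemma` before `pos` it lands in slot 3. The cost of the indirection is one extra lookup at instantiation time in this sketch; a real design would have to decide where that lookup is cached.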

PEAR classpath isolation

PEAR classpath isolation allows PEARs to run with different JCas cover objects for the same underlying shared UIMA type.

Runtime checks

Runtime checks can slow down normal operations, but the slowdown can be minimized by a design in which the checks reference only data that is likely to be in the L1 cache already.

Frequencies

area | sub-area | frequency | details
FS creation | AnnotationBase subtype not allowed in base View | |
FS slot setting | check for index corruption | |

  • see if the FS field being set is one that is used in 1 or more indexes, and if so,
  • see if this FS is in any index in any view
    • (currently an expensive operation; it could be made a lot cheaper with 1 boolean per FS per view, since the FS could be indexed in one view and not in another)
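The "1 boolean per FS per view" idea could look roughly like the following sketch, which keeps one bit per view on each feature structure so that the second check becomes a cheap field test instead of a search through the indexes. All names here (`IndexedFlagFs`, the integer view ids, the method names) are hypothetical, and the sketch assumes views can be numbered with small integers.

```java
import java.util.BitSet;

/** Hypothetical feature structure carrying one "is indexed" bit per view. */
final class IndexedFlagFs {
    // Bit n is set while this FS is in at least one index of view n.
    private final BitSet indexedInView = new BitSet();

    /** Called by the (hypothetical) index repository of a view when the FS is added. */
    void markAddedToIndexes(int viewId) {
        indexedInView.set(viewId);
    }

    /** Called when the FS has been removed from all indexes of a view. */
    void markRemovedFromIndexes(int viewId) {
        indexedInView.clear(viewId);
    }

    boolean isIndexedIn(int viewId) {
        return indexedInView.get(viewId);
    }

    boolean isIndexedInAnyView() {
        return !indexedInView.isEmpty();
    }

    /**
     * The two-step check: only a feature that is used by some index can
     * corrupt it, and only while this FS is actually indexed in some view.
     */
    boolean wouldCorruptIndex(boolean featureIsUsedInAnIndex) {
        return featureIsUsedInAnIndex && isIndexedInAnyView();
    }
}
```

With this layout, setting a feature that no index uses never triggers the second test, and the test itself touches only data stored on the FS being modified. The trade-off is a small amount of extra state per FS that the index add and remove paths must keep up to date.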