Feature Structures


Feature Structures are implemented as Java objects, one per FeatureStructure, so they can be garbage collected like any other Java object.

There is a generic Java class for these, plus (optional) specific classes for JCas style access. 

[Figure: v3 FeatureStructure organization diagram]

APIs for creating Feature Structures, and setting / getting Feature Values in them

There are several kinds of APIs for this.

 

  • Plain (Basic): this is the original API; it takes UIMA Feature and Type objects as arguments.
  • JCas: this API uses common Java idioms for creating, getting, and setting.
  • LowLevel: this was like Plain, but substituted an int-valued address for the Java Feature Structure object and, in general, avoided creating Java objects.
    • In V3, it is dangerous to create an FS using the low level API, because the resulting FS is identified only by an int. If the Java garbage collector runs before any reference to the newly created FS exists, the FS disappears. For this reason the low level APIs are deprecated in Version 3.

 

Plain

  • Description: uses UIMA Type and Feature instances. API: CAS
  • Create example:
    • casView.createFS(aType)
    • casView.createXXArray(size), where XX is the type
  • Get a value:
    • fs.getIntValue(aFeature)
    • fs.get(index), when fs is one of the built-in arrays
  • Set a value:
    • fs.setFloatValue(aFeature, value)
    • fs.set(index, value), when fs is one of the built-in arrays

JCas

  • Description: follows Java conventions; types and features must be known at compile time.
  • Create example:
    • new MyType()
    • can have additional constructors
  • Get a value:
    • fs.getMyFeature()
    • fs.getMyArrayFeature(index), when the value of myArrayFeature is one of the built-in arrays
    • fs.get(index), when fs is one of the built-in arrays
  • Set a value:
    • fs.setMyFeature(value)
    • fs.setMyArrayFeature(index, value)
    • fs.set(index, value)

Low Level

  • Description: in version 2 this allowed CAS access without making any Java objects; there was much less "checking", and it was meant for high-performance cases. Feature Structures were referred to by their int address in the internal heap. API: LowLevelCAS
  • Create example:
    • These had the same names as the Plain API, except prefixed with "ll_", e.g. casView.ll_createFS(aType).
    • Instead of returning a Java object representing the FS, these return ints.
  • Get a value:
    • lowLvlCas.ll_getIntValue(addr, feat), where addr and feat are both ints
  • Set a value:
    • lowLvlCas.ll_setFloatValue(addr, feat, value)
    • lowLvlCas.ll_setBooleanArrayValue(addr, index, value)

Getting and setting Feature values in V3

The JCas style of getting / setting feature values requires that the feature names be known at compile time, so you can write getXXXX where XXXX is the known-at-compile-time name of the feature.

The Plain style does not need this information; instead, only the range of the feature must be known. Calls are made like getIntValue(aFeature), where aFeature can be computed dynamically at run time.

Plain style APIs bypass any JCas getter or setter customization

The plain style APIs do not invoke the JCas style getters and setters, even if those are present and perhaps customized.  This is a design decision made to follow the V2 implementation, and also for performance reasons.  So, if you have customized a getter or a setter in JCas, you must use the JCas APIs to run the customizations.
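The effect can be illustrated with a small stand-alone model. This is only a sketch in plain Java, not UIMA code: SketchFS, SketchToken, and FEAT_LEMMA are hypothetical names, and the slot array merely stands in for however feature values are really stored.

```java
import java.util.Locale;

// Hypothetical stand-in for a generic feature structure: values live in slots
// addressed by feature offset, and the plain-style accessors touch them directly.
class SketchFS {
    protected final Object[] slots;

    SketchFS(int featureCount) {
        slots = new Object[featureCount];
    }

    // Plain style: the feature is chosen at run time; no JCas code is involved.
    String getStringValue(int featureOffset) {
        return (String) slots[featureOffset];
    }

    void setStringValue(int featureOffset, String value) {
        slots[featureOffset] = value;
    }
}

// Hypothetical JCas-style cover class with a customized setter.
class SketchToken extends SketchFS {
    static final int FEAT_LEMMA = 0;

    SketchToken() {
        super(1);
    }

    String getLemma() {
        return (String) slots[FEAT_LEMMA];
    }

    // Customization: normalize to lower case before storing.
    void setLemma(String value) {
        slots[FEAT_LEMMA] = value.toLowerCase(Locale.ROOT);
    }
}

class PlainBypassDemo {
    public static void main(String[] args) {
        SketchToken token = new SketchToken();
        token.setLemma("Running");                                // customization runs
        System.out.println(token.getLemma());                     // prints "running"
        token.setStringValue(SketchToken.FEAT_LEMMA, "Running");  // customization skipped
        System.out.println(token.getLemma());                     // prints "Running"
    }
}
```

In this model the plain setter writes the slot without ever calling setLemma, which is why code that depends on a customization has to go through the JCas-style accessor.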

xxx_Type JCas classes removed in V3

These are eliminated in v3.  They served 2 purposes:

  • save one slot per feature structure - instead of a casImpl ref and a typeImpl ref, there was just one ref to the _Type instance, which in turn held these two refs
  • provide a place for the low level accessors; these are accessors that take the "address" (now "id") of the FS as the way to designate which FS is being used.  There are 2 varieties of these low level accessors - those implemented in the CASImpl, and those implemented in the JCas _Type classes.  The latter have methods like "myShared_TypeInstance.setXXX(address, value)".  These are instance methods on the shared xxx_Type instance, and were intended to permit access without creating the Java cover object for the FS.

The performance reason for using the low level accessors is not present in V3; in fact, these, if implemented, would be slower than the other APIs.

JCas Class sharing

JCas classes are associated with a class loader.  Except for the built-in types which always have JCas Classes, other JCas classes are optional. Furthermore, JCas classes may define only a subset of the features of the fully merged type system. So, even when a JCas class is present, it may not have getters and setters for some features of the corresponding UIMA type. These features can be accessed of course using the plain APIs (see above). 

When a UIMA type is instantiated in V3, the Java class used is the most specific JCas class found for that type or one of its supertypes.  For example, suppose you have a type Foo with supertype Bar, which in turn is a subtype of Annotation, and no JCas classes are defined for Foo or Bar.  You must create an instance of Foo with the plain API, casView.createFS(fooType), because the JCas style of new Foo is unavailable without a JCas class for type Foo.  The instance is then created using Annotation as the implementing Java class.
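The lookup described here amounts to walking up the supertype chain until a type with a JCas class is found. The following is a minimal sketch of that idea in plain Java; JCasClassResolver and its methods are hypothetical, types are represented by simple names, and Annotation is shown directly under TOP as a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: which Java class implements an instance of a UIMA type?
class JCasClassResolver {
    // UIMA type name -> name of its supertype (null for the root type).
    private final Map<String, String> superTypeOf = new HashMap<>();
    // UIMA type name -> Java class name, only for types that have a JCas class.
    private final Map<String, String> jcasClassOf = new HashMap<>();

    void addType(String type, String superType) {
        superTypeOf.put(type, superType);
    }

    void addJCasClass(String type, String javaClass) {
        jcasClassOf.put(type, javaClass);
    }

    // Walk up the supertype chain; the first JCas class found is the most specific one.
    String implementingClass(String type) {
        for (String t = type; t != null; t = superTypeOf.get(t)) {
            String javaClass = jcasClassOf.get(t);
            if (javaClass != null) {
                return javaClass;
            }
        }
        throw new IllegalStateException("no JCas class found for " + type);
    }
}

class JCasClassResolverDemo {
    public static void main(String[] args) {
        JCasClassResolver resolver = new JCasClassResolver();
        resolver.addType("TOP", null);
        resolver.addType("Annotation", "TOP");
        resolver.addType("Bar", "Annotation");
        resolver.addType("Foo", "Bar");
        // Built-in types always have JCas classes.
        resolver.addJCasClass("TOP", "TOP");
        resolver.addJCasClass("Annotation", "Annotation");

        System.out.println(resolver.implementingClass("Foo")); // prints "Annotation"
        resolver.addJCasClass("Bar", "Bar");
        System.out.println(resolver.implementingClass("Foo")); // prints "Bar"
    }
}
```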

One set of JCas classes per class loader may be used (even simultaneously) for multiple different type systems.  This can occur sequentially, for example, when a sequence of CASs and their type systems is deserialized and worked on one after another; it can also occur when running multiple different pipelines under one class loader. When committing a type system, a check is made for each type to see if there is a corresponding JCas class, and if found, that any defined features have the proper range.

It is possible to run multiple pipelines with non-compatible type systems and JCas classes by running each one under its own class loader; in this scenario, each pipeline will load its own copy of JCas classes from its own classloader's classpath.

JCas Class and UIMA Type conformance

JCas Classes have static final fields computed at load time. Each type system commit loads corresponding JCas classes (the load only happens the first time, per class loader).

A particular type system instance is being committed when a JCas class is loaded.  At load time, these rules are checked:

  • Construct the supertype chain of the class being loaded.  It must be the case that, scanning upwards, there is a supertype that has a corresponding UIMA type.
    • It is OK if there are UIMA types between this and the found corresponding supertype - that just means there were no JCas types defined for those.
    • It is OK for the supertype chain to pass through supertypes which are not UIMA types, as long as the JCas supertypes are abstract (can't be instantiated)
  • For each feature
    • the feature offset assigned to the class's static final value must match the feature offset assigned by the type system
    • the feature's range must match
    • JCas-defined features which do not exist in the first type system that loads this JCas class have invalid getters and setters; any code that attempts to get or set such a feature is in error.
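The Java side of the supertype-chain rule can be sketched with reflection. This is an illustration under stated assumptions, not the UIMA implementation: SupertypeChainCheck and the Sk* classes are hypothetical, and the set of classes that have a corresponding UIMA type is simply passed in.

```java
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class hierarchy used only for this illustration.
class SkTop {}                                  // has a corresponding UIMA type
abstract class SkHelperBase extends SkTop {}    // no UIMA type, abstract: allowed
class SkMyType extends SkHelperBase {}          // the JCas class being loaded
class SkConcreteHelper extends SkTop {}         // no UIMA type, concrete: not allowed
class SkBadType extends SkConcreteHelper {}

class SupertypeChainCheck {
    // Scans upwards from the loaded class and returns the nearest superclass that has
    // a corresponding UIMA type. Superclasses passed on the way must be abstract.
    static Class<?> nearestUimaSupertype(Class<?> loaded, Set<Class<?>> classesWithUimaType) {
        for (Class<?> c = loaded.getSuperclass(); c != null; c = c.getSuperclass()) {
            if (classesWithUimaType.contains(c)) {
                return c;
            }
            if (!Modifier.isAbstract(c.getModifiers())) {
                throw new IllegalStateException(
                    c.getName() + " has no UIMA type and is not abstract");
            }
        }
        throw new IllegalStateException(
            "no supertype of " + loaded.getName() + " has a UIMA type");
    }

    public static void main(String[] args) {
        Set<Class<?>> known = new HashSet<>();
        known.add(SkTop.class);
        System.out.println(nearestUimaSupertype(SkMyType.class, known).getSimpleName());
        try {
            nearestUimaSupertype(SkBadType.class, known);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The sketch covers only the Java class chain; the per-feature offset and range checks are separate steps.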

How JCas feature offsets are computed or validated at type-system-commit time

The type system is walked in subsumption order, and offsets are assigned to all features.  Then the JCas classes are loaded.  If a class is being loaded for the first time, the offsets of the corresponding features are used to set its static final int offset values.  If it is already loaded, the existing values are checked to ensure that they match the values assigned by the type system.  A mismatch can occur only if multiple different type systems are in use, and it results in a fatal error.
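A minimal model of this assign-then-validate step is sketched below. Everything here is an assumption made for illustration: TypeDef and OffsetSketch are hypothetical, features are keyed as "Type:feature", and a map stands in for the static final offsets held by JCas classes already loaded under one class loader.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical description of one type in a type system.
final class TypeDef {
    final String name;
    final String superName;       // null for the root type
    final List<String> features;  // features introduced by this type

    TypeDef(String name, String superName, String... features) {
        this.name = name;
        this.superName = superName;
        this.features = Arrays.asList(features);
    }
}

class OffsetSketch {
    // Stands in for the static final offsets of JCas classes loaded so far.
    private final Map<String, Integer> loadedOffsets = new HashMap<>();

    // Commits a type system whose types are listed in subsumption order (supertypes first).
    Map<String, Integer> commit(List<TypeDef> typesInSubsumptionOrder) {
        Map<String, Integer> slotsUsedBy = new HashMap<>();
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (TypeDef t : typesInSubsumptionOrder) {
            // A subtype's own features start after all inherited ones.
            int offset = (t.superName == null) ? 0 : slotsUsedBy.get(t.superName);
            for (String feature : t.features) {
                assigned.put(t.name + ":" + feature, offset++);
            }
            slotsUsedBy.put(t.name, offset);
        }
        // Already-loaded classes: existing values must match, otherwise it is fatal.
        for (Map.Entry<String, Integer> e : assigned.entrySet()) {
            Integer existing = loadedOffsets.get(e.getKey());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException("offset mismatch for " + e.getKey()
                    + ": loaded " + existing + ", type system assigns " + e.getValue());
            }
        }
        // First load: the assigned offsets become the fixed values.
        for (Map.Entry<String, Integer> e : assigned.entrySet()) {
            loadedOffsets.putIfAbsent(e.getKey(), e.getValue());
        }
        return assigned;
    }
}
```

Committing the same definition twice passes the check; committing a second definition that adds a feature to a supertype shifts the subtype's offsets and triggers the mismatch error.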

Connecting Instances with Type and Feature information

Information about types and features is stored in TypeImpl and FeatureImpl instances.  These are unique per type system.  However, multiple type system instances created using the same (merged) definition, and therefore "equal", are recognized at type system commit time, and the existing type system implementation is reused in this case.  This is different from V2, and may require updating code which gets references to types and features prior to type system commit; that code needs to be updated to re-acquire those references after type system commit, because the Type and Feature instances may be replaced with a shared version if the type system is equal to one already committed.
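The consequence for user code can be shown with a toy model. This is a sketch only: SketchTypeSystem and SketchType are hypothetical, and two type systems are treated as "equal" when they define the same type names, which is a deliberate simplification of comparing full merged definitions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Type object.
final class SketchType {
    final String name;

    SketchType(String name) {
        this.name = name;
    }
}

final class SketchTypeSystem {
    // Committed type systems, keyed by their (simplified) definition.
    private static final Map<String, SketchTypeSystem> COMMITTED = new HashMap<>();

    private final Map<String, SketchType> types = new TreeMap<>();

    SketchType addType(String name) {
        return types.computeIfAbsent(name, SketchType::new);
    }

    SketchType getType(String name) {
        return types.get(name);
    }

    // Returns the type system to use from now on: this one, or an equal one
    // that was committed earlier.
    SketchTypeSystem commit() {
        String definition = String.join(",", types.keySet());
        SketchTypeSystem existing = COMMITTED.putIfAbsent(definition, this);
        return existing != null ? existing : this;
    }
}

class ReacquireDemo {
    public static void main(String[] args) {
        SketchTypeSystem first = new SketchTypeSystem();
        first.addType("Foo");
        first.commit();

        SketchTypeSystem second = new SketchTypeSystem();
        SketchType early = second.addType("Foo");   // reference taken before commit
        SketchTypeSystem committed = second.commit();

        System.out.println(committed == first);                   // prints "true"
        System.out.println(committed.getType("Foo") == early);    // prints "false"
    }
}
```

The reference taken before commit no longer identifies the type that is actually in use, so it has to be looked up again on the committed type system.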

Locating the corresponding UIMA Type when creating a JCas type using the "new" operator

When a JCas instance is created using the "new" operator, it locates the type using information in a JCasRegistry.  The type cannot be statically kept in the JCas class definition, since one JCas class might be used by multiple different type systems.  Instead, each JCas class, when it is loaded, is assigned a unique incrementing number; this number is kept with the static (one per class loader) information for TypeSystemImpl.

At instance creation time, a lookup is done, using the instance of the type system, to get the actual type associated with the registry number.  This mechanism is encapsulated within the JCasRegistry class.
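A rough model of this lookup follows. It is a plain-Java sketch, not the real JCasRegistry: SketchRegistry, SketchTS, SketchFoo, and SketchBar are hypothetical names, and UIMA types are represented by strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: hands out one incrementing number per loaded JCas class.
final class SketchRegistry {
    private static final List<String> LOADED = new ArrayList<>();

    static synchronized int register(String jcasClassName) {
        LOADED.add(jcasClassName);
        return LOADED.size() - 1;
    }
}

// Hypothetical type system: maps registry numbers to its own type objects.
final class SketchTS {
    private final Map<Integer, String> typeByRegistryIndex = new HashMap<>();

    void bind(int registryIndex, String uimaType) {
        typeByRegistryIndex.put(registryIndex, uimaType);
    }

    String typeFor(int registryIndex) {
        return typeByRegistryIndex.get(registryIndex);
    }
}

// Hypothetical JCas classes: the static field is set once, when the class is loaded.
class SketchFoo {
    static final int TYPE_INDEX = SketchRegistry.register("SketchFoo");
    final String uimaType;

    // The type is found through the type system in use, not stored in the class.
    SketchFoo(SketchTS typeSystem) {
        this.uimaType = typeSystem.typeFor(TYPE_INDEX);
    }
}

class SketchBar {
    static final int TYPE_INDEX = SketchRegistry.register("SketchBar");
}

class RegistryDemo {
    public static void main(String[] args) {
        SketchTS tsA = new SketchTS();
        SketchTS tsB = new SketchTS();
        tsA.bind(SketchFoo.TYPE_INDEX, "Foo in type system A");
        tsB.bind(SketchFoo.TYPE_INDEX, "Foo in type system B");

        System.out.println(new SketchFoo(tsA).uimaType); // prints "Foo in type system A"
        System.out.println(new SketchFoo(tsB).uimaType); // prints "Foo in type system B"
    }
}
```

Because the class holds only a number, the same class can serve several type systems under one class loader; each type system resolves the number to its own type.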

Locating the corresponding UIMA Feature when accessing a feature using JCas APIs

The generated getter or setter code for a JCas feature needs the stored-feature-offset-index information for the feature being accessed.  In the use-case of having multiple type systems for one JCas class set loaded under one class loader, each type system might assign a different offset to the same feature; a naive design would then make it necessary for all accesses to go through one level of indirection to get the particular type system's offset for a feature.

This is avoided using the following technique that assigns the offsets to match already assigned ones:

  • The first time a JCas class is loaded at type system commit time, it defines a final static int constant of the pre-computed offset.
  • The 2nd time a JCas class is accessed at type system commit time, the first value stored is read and is used for the offset.

This requires that no JCas class access is done prior to type system commit, since the static final value can only be assigned once at resolution time.  This is normally the case, since it would be invalid to do something with a JCas class before the pipeline is set up.

Collections

UIMA v2 supports specially-named arrays of primitives (+ string), e.g. BooleanArray. 

UIMA v2 supports arrays of Feature Structures, using FSArray (JCas) or ArrayFS (Generic).  

For v3, proposed support (not yet done, TBD):

  • new notation (arrays):  aligned with Java: TOP[] or Annotation[] or MyType[] or short[]
  • new notation (collections): aligned with Java generics: List<TOP> or ArrayList<Annotation> or HashSet<MyType>

Use Java fully qualified names as the UIMA type name. 

Extend idea of "component type" to include multiple generics.

  • limit (initially) generic spec to only simple type names, no support for extends, ?, etc.  Use TOP for "Object".

 
