

This page collects ideas for loading JCas classes including customization(s).

Each UIMA type has feature structure instances represented by instances of a corresponding (usually generated) Java class.  

  • The instances of these classes are the feature structures.
    • Except for some built-in values like arrays, UIMA features are stored in either a hidden int[] field or a hidden Object[] field.
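As a rough illustration of the storage layout described above, the following sketch keeps int-like feature values in a hidden int[] and reference values in a hidden Object[], with typed getters and setters layered on top. All class, field, and constant names here (SlotBackedFs, SketchToken, BEGIN, and so on) are hypothetical and are not taken from the UIMA code base.

```java
// Hypothetical sketch only; not the actual UIMA implementation.
class SlotBackedFs {
    // Hidden storage: one array for int-like values, one for references.
    private final int[] intSlots;
    private final Object[] refSlots;

    SlotBackedFs(int intSlotCount, int refSlotCount) {
        intSlots = new int[intSlotCount];
        refSlots = new Object[refSlotCount];
    }

    int getIntValue(int offset) { return intSlots[offset]; }
    void setIntValue(int offset, int value) { intSlots[offset] = value; }
    Object getRefValue(int offset) { return refSlots[offset]; }
    void setRefValue(int offset, Object value) { refSlots[offset] = value; }
}

// A cover class for one (assumed) type with two int features and one String feature.
class SketchToken extends SlotBackedFs {
    // Feature offsets; plain constants in this sketch.
    static final int BEGIN = 0;  // slot in the int array
    static final int END = 1;    // slot in the int array
    static final int LEMMA = 0;  // slot in the Object array

    SketchToken() { super(2, 1); }

    int getBegin() { return getIntValue(BEGIN); }
    void setBegin(int v) { setIntValue(BEGIN, v); }
    int getEnd() { return getIntValue(END); }
    void setEnd(int v) { setIntValue(END, v); }
    String getLemma() { return (String) getRefValue(LEMMA); }
    void setLemma(String v) { setRefValue(LEMMA, v); }
}
```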

Differences with V2

Built-in UIMA Types have fixed (built-in) JCas cover classes

In V2 it was possible (but probably not done) to redefine the built-in JCas cover types for built-in UIMA types.

JCas classes are loaded whether or not the JCas is activated

In V2, the JCas was activated by switching the CAS to the JCas form.  The first time that was done, the JCas classes were loaded.

In V3, because the Feature Structures are kept as instances of these classes, the JCas classes are eagerly loaded when the type system is committed.

Otherwise, a later load might be done for class Foo, but there could be instances of type Foo implemented using Class "TOP" (because class Foo wasn't yet loaded), and attempts to cast this instance to type Foo would fail.
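The failure mode described above can be reproduced in plain Java. In this sketch, TopFs and FooFs are hypothetical stand-ins for the generic class and the specific JCas class; an instance created as the generic class cannot later be cast to the specific one.

```java
// Hypothetical stand-ins for the generic class and a specific JCas class.
class TopFs { }
class FooFs extends TopFs { }

class EagerLoadingDemo {
    // Returns true if the instance can be cast to FooFs, false if the cast fails.
    static boolean canCastToFoo(TopFs fs) {
        try {
            FooFs foo = (FooFs) fs; // throws if fs was created as a plain TopFs
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```

An instance created with new TopFs() (the situation before the specific class was loaded) fails the cast, while one created with new FooFs() succeeds.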

When JCAS Classes are loaded

There are 3 load times for these classes:

  1. The Built-in classes are loaded and initialized at the first type system commit.
    1. These are loaded using the ClassLoader used to load the UIMA framework
    2. On subsequent type system commits, the built-in classes are not reloaded.
  2. Other non-Pear JCas classes are loaded at type system commit time.  
    1. The classes may have already been loaded (by other type systems)
    2. The classes are loaded with respect to that type system and a particular extension class loader (if present).  
  3. Pear JCas classes are loaded when a Pear's ClassLoader is inserted into the environment for the first time, for a given type system.
    1. Pear JCas classes are reloaded if the type system changes - because additional types may cause additional JCas classes to be loaded.
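The three load times listed above could be tracked with bookkeeping along the following lines. This is only a sketch under assumptions: the class LoadTimeSketch, its method names, and the use of identity-keyed maps are invented for illustration, and the actual loading work is replaced by log entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping; real class loading is replaced by log entries.
class LoadTimeSketch {
    private boolean builtInsLoaded = false;
    // type system -> class loaders for which JCas classes were already loaded
    private final Map<Object, Set<ClassLoader>> loaded = new IdentityHashMap<>();
    final List<String> log = new ArrayList<>();

    // Called at type system commit; extensionLoader may be null if none is present.
    void commitTypeSystem(Object typeSystem, ClassLoader extensionLoader) {
        if (!builtInsLoaded) {          // built-ins: first commit only
            builtInsLoaded = true;
            log.add("built-ins");
        }
        loadIfFirst(typeSystem, extensionLoader, "non-pear");
    }

    // Called when a PEAR's class loader is inserted into the environment.
    void enterPear(Object typeSystem, ClassLoader pearLoader) {
        loadIfFirst(typeSystem, pearLoader, "pear");
    }

    private void loadIfFirst(Object typeSystem, ClassLoader loader, String kind) {
        Set<ClassLoader> seen = loaded.computeIfAbsent(typeSystem,
            k -> Collections.newSetFromMap(new IdentityHashMap<ClassLoader, Boolean>()));
        if (seen.add(loader)) {         // first time for this (type system, loader) pair
            log.add(kind);
        }
    }
}
```

Because the key includes the type system, a PEAR entered under a different type system triggers a fresh load in this sketch, which mirrors the reload-on-type-system-change rule.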

Environment Possibilities - Type Systems, Class Loaders, UIMA Instances

Users use UIMA in many different use cases.  Two major classes of use cases are:

  1. A single type system shared by multiple sets of JCas classes for the same type (PEAR class path isolation, and multiple pipelines running with different extension class paths)
  2. A single set of JCas classes, used for multiple type systems (an application where a processing loop includes deserializing the next type system + CAS, and processing it in some way)

This discussion assumes the core UIMA framework was loaded under 1 class loader, and is managing potential multiples of type systems and JCas class sets.  

Of course, it is also possible to have multiple core UIMA frameworks loaded under different class loaders - for example, running multiple frameworks in a web server application.  These kinds of deployments typically isolate each running application from one another.

JCas classes, once loaded under some particular class loader, cannot be unloaded (unless no references exist to them and the class loader used to load it, itself, is GC'd).

Because of this, there are constraints on the use case having multiple type systems sharing one set of JCas classes (case 2 above).  If a need exists to get around these constraints, then an approach using additional isolating class loaders must be employed by the user's application.

When a set of JCas classes is loaded, it is required that some associated committed type system be present.  This type system may contain a subset or superset of the types defined in the JCas set.  The type system is used to provide the feature offset constants set as final static values in the JCas loaded class.

It is possible that the class has already been loaded, perhaps using a different associated committed type system.  A check is done to ensure that the type system's offset values for features match what is set as static final values in the loaded class; this is a constraint that must be met when switching type systems while using JCas classes.
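The compatibility check described above might look roughly like the following. The names (OffsetCheckSketch, LoadedFoo, OFFSET_BEGIN) are hypothetical, the type system is reduced to a map from feature name to offset, and tolerating features that are missing from the type system is an assumption of this sketch rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OffsetCheckSketch {
    // Stand-in for an already loaded JCas class: offsets fixed at load time.
    static final class LoadedFoo {
        static final int OFFSET_BEGIN = 0;
        static final int OFFSET_END = 1;
    }

    // Offsets baked into the loaded class, keyed by feature name.
    static Map<String, Integer> loadedOffsets() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("begin", LoadedFoo.OFFSET_BEGIN);
        m.put("end", LoadedFoo.OFFSET_END);
        return m;
    }

    // Returns normally if the type system agrees with the loaded class; throws otherwise.
    static void checkCompatible(Map<String, Integer> typeSystemOffsets) {
        for (Map.Entry<String, Integer> e : loadedOffsets().entrySet()) {
            Integer tsOffset = typeSystemOffsets.get(e.getKey());
            // Assumption: a feature absent from the type system is tolerated here.
            if (tsOffset != null && !tsOffset.equals(e.getValue())) {
                throw new IllegalStateException("Offset mismatch for feature '"
                    + e.getKey() + "': class has " + e.getValue()
                    + ", type system has " + tsOffset);
            }
        }
    }
}
```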

[Figure: classloadcontexts]

Design considerations (older thoughts)

  • Built-in types support - would be nice if these were not needlessly replicated - this is done by sharing the built-ins
  • Within a single JVM, there may be multiple UIMA type systems actively in use.  This is most likely supported via a separate class loader for each of these
    • This implies that the code which uses objects under the class loader be loaded from the same class loader; otherwise they can't "see" the loaded classes.
    • V2 UIMA may (optionally) make use of the UIMAClassLoader with a user-supplied class path, which doesn't delegate to the parent first - it checks its own classpath first.
      • This happens when the UIMA app specifies a UIMA extension class loader (via a not-well-documented method on a ResourceManager instance, setExtensionClassPath)  or a Data Path.
      • It's unlikely that most user code makes use of this.
      • The PEAR classpath isolation makes use of this. 
    • V3 augments the UIMAClassLoader with
      • the capability to recognize and generate JCas classes
        • This requires the loader have an association with one (or more?) particular committed type system instance, in order to do the proper generation.
  • A single type system may be used (via UIMA application APIs) for multiple different pipelines, and for multiple different sets of index definitions.
    • Having a single type system for multiple different sets of index definitions is unlikely
    • The index definition supplies the avoid-index-corruption information needed by the setters
      • If supplied at JCas generation time, then the test can be inserted only where needed, rather than run time testing if it is needed for many types. Test is currently a Bitset lookup.
        • Not done because JCas definitions used for multiple type systems, multiple pipeline definitions
  • Managing multiple type systems actively in use in a single JVM
    • It is possible to do this using Servlet-style complete class loading isolation
      • Done outside of the UIMA framework (but within the single JVM, e.g. Servlet-style class loading)
      • UIMA Impl classes and UIMA pipeline application classes loaded under multiple separate loaders (multiple copies)
      • Works for generated JCas classes, nothing special needed
    • UIMA framework managing multiple type systems
      • UseCase 1: a single pipeline being used, sequentially, for multiple type systems
      • UseCase 2: running multiple pipelines (in parallel) with different type systems, each of which might exhibit UseCase 1.
      • UseCase 1 in v2:  If the JCas isn't being used, then there's no class loading issue. This case can arise when deserialization is being used, and each CAS has its own (potentially different) type system.  The user code might be making use of what it a priori knows to be "common" types and features among the different deserializations.  (For example, all the built-in types are common).
        • The user code could reference non-common features, after using some other value to get the name of the feature.
        • If the JCas is in use, this is supported if the JCas generation was done with the specified feature, and that feature is present.  This requires that the JCas implementation check if the feature being accessed is defined in the current type system; an exception is thrown otherwise.
      • UseCase 1 in v3: 
        • For a given class loader, the generated JCas types cannot be updated.  (Well, there is a non-performant way, via another indirection, which some systems use to allow runtime redefinition of classes - see for instance http://zeroturnaround.com/software/jrebel/ or various other methods; search for "java runtime redefinition classes reload".)
          • Approaches: 
            • The generated types on the first deserialization need to have the Union of all expected types. This might require creation of the Union Type System, via an external utility, and an API for specifying that to UIMA.  This could also cause some inefficiencies, because the Union could grow large.
            • The classloader used needs to be "dropped" and a new one substituted - this will cause regeneration of JCas classes particular to the deserialized type system, but also cause reloading, re-JITting, etc. all the implementation classes of the pipeline.
            • Use "trampoline" FS
      • UseCase2 in v2: If the JCas isn't being used, then there's no class loading issue 
        • except for potential collisions due to same-named annotator / external resource classes with different implementations.
          • Can be avoided by user using UIMA Extension Class Loaders or UIMA DataPaths, per pipeline
        • If the JCas is in use, then
          • If a common classloader is being used, then the JCas definition must be for the Union of the used parts (via JCas) of the type systems.
          • If a separate classloader is being used, then there is no constraint on the JCas definition being used.  
      • UseCase2 in v3:
        • If a common classloader is being used, then the JCas definition must be for the Union of all types/features of the used type systems.
        • If a separate classloader is being used, then there is no constraint, as above. 
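The list above mentions that the UIMAClassLoader checks its own class path before delegating to its parent. The general technique can be sketched with the JDK's URLClassLoader as below. This is not the UIMAClassLoader source; ChildFirstClassLoader and ChildFirstDemo are hypothetical names, and details such as resource lookup order are left out.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Generic "own class path first" loader; a sketch, not the UIMAClassLoader itself.
class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);       // already defined by this loader?
            if (c == null) {
                try {
                    c = findClass(name);              // own class path first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, false); // then the usual parent delegation
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}

class ChildFirstDemo {
    // Loads a class through a child-first loader that has an empty class path.
    static Class<?> loadWithEmptyPath(String name) {
        try (ChildFirstClassLoader cl = new ChildFirstClassLoader(
                new URL[0], ChildFirstDemo.class.getClassLoader())) {
            return cl.loadClass(name);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With an empty class path, every lookup falls through to the parent, so loading java.lang.String yields the same Class object the rest of the JVM uses. Two such loaders given the same non-empty class path would each define their own copy of a class, which is the basis of the per-pipeline isolation discussed above.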

Things below are old ideas, not chosen, not done

Alternative to generated JCas classes to avoid classloading issues

We could have a design which has a level of indirection, similar to Nick Hill's submission, but slightly more generalized

  • an ArrayList, indexed by data in the type system, which held references, for values which were of that kind
  • an Int array-list - indexed by data in the type system, which held values of boolean, byte, short, int, long, float and double (long & double taking 2 slots).

The arrays would be adjustable, to accommodate different type systems (and perhaps, dynamically augmented type systems).
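A minimal sketch of this indirection idea follows, assuming that 64-bit values (long and double) occupy two consecutive int slots while float fits in one. The class IndirectSlots and its methods are hypothetical; slot numbers would come from the type system in a real design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical growable storage: ints for primitive values, a list for references.
class IndirectSlots {
    private int[] ints = new int[4];
    private final List<Object> refs = new ArrayList<>();

    // Grow the int array so that at least 'size' slots exist.
    private void ensureInts(int size) {
        if (size > ints.length) {
            ints = Arrays.copyOf(ints, Math.max(size, ints.length * 2));
        }
    }

    void setInt(int slot, int v) { ensureInts(slot + 1); ints[slot] = v; }
    int getInt(int slot) { return slot < ints.length ? ints[slot] : 0; }

    // A float fits in one slot via its raw bit pattern.
    void setFloat(int slot, float v) { setInt(slot, Float.floatToRawIntBits(v)); }
    float getFloat(int slot) { return Float.intBitsToFloat(getInt(slot)); }

    // A long takes two slots: high word first, then low word.
    void setLong(int slot, long v) {
        setInt(slot, (int) (v >>> 32));
        setInt(slot + 1, (int) v);
    }
    long getLong(int slot) {
        return ((long) getInt(slot) << 32) | (getInt(slot + 1) & 0xFFFFFFFFL);
    }

    // A double reuses the two-slot long encoding.
    void setDouble(int slot, double v) { setLong(slot, Double.doubleToRawLongBits(v)); }
    double getDouble(int slot) { return Double.longBitsToDouble(getLong(slot)); }

    void setRef(int slot, Object v) {
        while (refs.size() <= slot) refs.add(null); // pad up to the requested slot
        refs.set(slot, v);
    }
    Object getRef(int slot) { return slot < refs.size() ? refs.get(slot) : null; }
}
```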

What's in a JCas cover class?

There are two classes for each type.  

  • x.y.z.Foo - each instance represents one Feature Structure; in v3 these can be GC'd
  • x.y.z.Foo_Type - there is one instance per CAS (arbitrary view)

x.y.z.Foo 

Has 

  • a field for each feature
  • a reference to the _Type instance
    • only for backwards compatibility for low-level access model
    • Multiple instances per type system - one per CAS
    • has ref to TypeImpl
  • a reference to the CAS View (to support addToIndexes for the right view)
  • a ref to a type-system-wide Bitset for index corruption testing
  • Constructors
    • new Foo(Cas)
  • Methods
    • getter / setter for all fields
      • The setter methods may include index corruption checking code.
        • May be code which tests at runtime on each set, whether or not this 
    • indexed getter/setter for fields defined as arrays
    • (via inheritance) 
      • a collection of get/set methods, one per boolean/byte/short.../double/String/TOP/JavaObject and arrays of these, kinds of values.
        • The methods take an extra "offset" value, obtained from the Feature.
        • Used for backwards compatibility with non-JCas styles, and for serialization and other "generic" operations
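The setter-side index corruption check mentioned above could take roughly the following shape. This sketch assumes that the BitSet marks features used as index keys and that protection means removing the feature structure from the index, updating it, and adding it back; both points, and all names (IndexGuardSketch, GuardedFs, FEAT_BEGIN), are illustrative assumptions.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a type-system-wide BitSet consulted by setters.
class IndexGuardSketch {
    static final int FEAT_BEGIN = 0;
    static final int FEAT_LABEL = 1;

    // Bit i set: the feature with code i is used as an index key.
    final BitSet indexKeyFeatures = new BitSet();
    // Stand-in for the indexes of one CAS view.
    final Set<Object> index = new HashSet<>();
    // Counts how often a setter had to protect the index.
    int protectedUpdates = 0;

    final class GuardedFs {
        private int begin;
        private String label;

        // Runs the assignment, removing and re-adding this FS if the feature is an index key.
        private void update(int featureCode, Runnable assignment) {
            boolean guard = indexKeyFeatures.get(featureCode) && index.contains(this);
            if (guard) {
                index.remove(this);
                protectedUpdates++;
            }
            assignment.run();
            if (guard) {
                index.add(this);
            }
        }

        int getBegin() { return begin; }
        String getLabel() { return label; }
        void setBegin(int v) { update(FEAT_BEGIN, () -> begin = v); }
        void setLabel(String v) { update(FEAT_LABEL, () -> label = v); }
    }
}
```

The BitSet lookup happens on every set in this form. Generating the check only into setters for features known to be index keys would avoid that, but ties the generated class to one set of index definitions.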

x.y.z.Foo_Type 

An instance is created lazily, when a new x.y.z.Foo(some-cas) is first done.

has

  • a ref to the TypeImpl
  • a ref to the CAS (an arbitrary view, sometimes updated in v2), used for low level access patterns

Instances are accessed per CAS via

  • a Map (kept per CAS) from the x.y.z.Foo Class to the corresponding x.y.z.Foo_Type instance.  
    • The key is a x.y.z.Foo Class object, so instances loaded under different class loaders may have the same class name.  This used to happen for PEARs (but not in v3).
      • This happens when different generated x.y.z.Foo (due to different merged type systems) are running in the same JVM.
      • This used to happen within one pipeline with PEAR switching, where the PEAR might have a different customization of a JCas class.  In v3, that doesn't happen; all versions of a customization must be merged.
    • If the Map has no entry, 
      • Load the _Type class itself, if not loaded (Map in TypeSystemImpl instance, key = name string, value = _Type Class). 
      • Make instance of it, populate map in CAS.svd.

Generating and Loading JCas cover class / merging with Customization

[Figure: loading-jcas-classes]
