Ideas for UIMAJ v3

A place to collect ideas for the next version of UIMA Java core.

Big changes

More use of Java compiler (ecj) and decompiling

A portable Java compiler from Eclipse (ecj) and decompiling capabilities (e.g. Procyon) are appropriately licensed and could be bundled and used as part of UIMA startup.
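A minimal sketch of compiling generated source at startup and loading the result, entirely in memory. As far as we know, ecj also offers an implementation of the standard javax.tools.JavaCompiler interface, so the sketch is written against that interface. For simplicity it asks the running JDK for its system compiler (null on a bare JRE) instead of loading ecj. InMemoryCompiler and compileAndLoad are hypothetical names, not an existing UIMA API.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical helper: compile Java source held in a String and load the class, all in memory.
class InMemoryCompiler {

    // A source "file" backed by a String.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // A class "file" that captures the compiled bytecode instead of writing to disk.
    static final class ByteClass extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteClass(String className) {
            super(URI.create("bytes:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    static Class<?> compileAndLoad(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // ecj could be plugged in here instead
        if (compiler == null) throw new IllegalStateException("no Java compiler available");
        Map<String, ByteClass> outputs = new HashMap<>();
        StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null);
        // Redirect class output into memory; everything else goes to the standard manager.
        JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
            @Override
            public JavaFileObject getJavaFileForOutput(Location location, String name,
                    JavaFileObject.Kind kind, FileObject sibling) {
                ByteClass out = new ByteClass(name);
                outputs.put(name, out);
                return out;
            }
        };
        List<JavaFileObject> units = Collections.singletonList(new StringSource(className, source));
        Boolean ok = compiler.getTask(null, fm, null, null, null, units).call();
        if (!ok) throw new IllegalStateException("compilation failed");
        // Define the captured classes; the parent loader resolves everything else.
        ClassLoader loader = new ClassLoader(InMemoryCompiler.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteClass bc = outputs.get(name);
                if (bc == null) throw new ClassNotFoundException(name);
                byte[] b = bc.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `InMemoryCompiler.compileAndLoad("GeneratedCover", "public class GeneratedCover { public int begin; public int end; }")` returns a class defined by a fresh class loader, which is the kind of isolation a per-type-system cover class might need.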

JCasGen could be "automatic" for merged type systems, and merged instances of JCasGen'd user classes?
- Users still would need a generated version for their code to compile against.
Pear definitions for JCas cover classes could be merged?
Could generate one kind of Java cover class for all types (lazily, loaded on demand).
- eliminate / reduce use of TypeImpl in runtime.
- generate for all merged types (except custom built ins)
  - (as opposed to current impl, where no JCas cover class is generated if it doesn't exist - the "standard" one is used instead)
Use class loader technology to support multiple type systems:
- Having same-named types, sharing the JCas cover types for those, but (after merging) having different sets of features.
- This would only be used for UIMA (merged) Types that have same name but have different feature sets.
- Current design uses the same JCas cover class for differing type systems (e.g., ones that have a different # of features for a type). In this case, the JCas cover type is used only to set/read the slots it knows about; other facilities might be used to read/set additional slots.

Feature Structure == an instance of its Java Cover class

Only one representation of an FS; the static fields of the class hold the TypeImpl info.

Features represented directly as fields.

To get around "reflection" slowness:
- Support set/get by int <- class <- feature-name-string
- Support set/get (bulk) ? <ordering among fields significant?>
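A minimal sketch of "FS == an instance of its cover class", with features as plain fields, per-type info in static fields, and reflection-free set/get by int. Everything here is hypothetical and illustrative: the class Token, its features, and featureIndex/get/set are not an existing UIMA API. The feature-name string is resolved to an int once, through a static map on the class, and the int then drives a switch instead of java.lang.reflect access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cover class: the feature structure *is* an instance of this class.
class Token {
    // Static, per-class type information (standing in for the TypeImpl data).
    static final String TYPE_NAME = "example.Token";
    static final String[] FEATURE_NAMES = { "begin", "end", "pos" };
    static final int BEGIN = 0, END = 1, POS = 2;
    private static final Map<String, Integer> INDEX_OF = new HashMap<>();
    static {
        for (int i = 0; i < FEATURE_NAMES.length; i++) INDEX_OF.put(FEATURE_NAMES[i], i);
    }

    // Features represented directly as fields.
    int begin;
    int end;
    String pos;

    // feature-name string -> int, looked up via the class; callers can cache the result.
    static int featureIndex(String featureName) {
        Integer i = INDEX_OF.get(featureName);
        if (i == null) throw new IllegalArgumentException("no feature " + featureName + " in " + TYPE_NAME);
        return i;
    }

    // get by int: a switch over field slots, no reflection.
    Object get(int featureIndex) {
        switch (featureIndex) {
            case BEGIN: return begin;
            case END:   return end;
            case POS:   return pos;
            default: throw new IndexOutOfBoundsException(String.valueOf(featureIndex));
        }
    }

    // set by int: same dispatch, with a cast to the field's type.
    void set(int featureIndex, Object value) {
        switch (featureIndex) {
            case BEGIN: begin = (Integer) value; break;
            case END:   end = (Integer) value; break;
            case POS:   pos = (String) value; break;
            default: throw new IndexOutOfBoundsException(String.valueOf(featureIndex));
        }
    }
}
```

In this sketch the int indexes simply follow the declaration order of FEATURE_NAMES; whether a bulk set/get should depend on such an ordering is the open question noted above.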

User customization of Java cover classes, and PEAR classpath isolation issues

Currently users may customize their JCas cover classes. PEAR classpath isolation allows the use case where different customizations are present in one pipeline. The current implementation supports this, and switches the set of JCas cover classes as PEAR boundaries are crossed. The idea of a Feature Structure being an instance of its cover class breaks down when multiple definitions of that class exist. Some ideas for fixing this:

Do some kind of "merge" operation among all definitions of JCas cover classes including those in contained PEARs, and use that one merged definition everywhere.
- Advantages: is most similar to what we have now
- Disadvantages: it's not always possible to find a merge that preserves all the original implementations. It might be very difficult to construct an appropriate merge algorithm, given the arbitrariness of the custom code.
Split apart the system-generated (from the merged type system) JCas cover class and the user customization into different classes. The user customization class would wrap the system-generated one, and creating an instance of the customization would also create the wrapped system-generated instance; all value setting/getting would be via forwarding methods.
- Advantages:
  - No merging logic is needed; it would allow dropping the merge facility (which is old, doesn't support Java 1.5 or later, etc.)
  - The system could generate, from the merged type system, a cover class that supported all the fields, making full use of the type hierarchy. There would be no need for external processes or procedures to ensure that the generated cover class reflected the fully merged type system; this would be automatic. Projects could run JCasGen to get prototype cover classes; these would not be loaded at run time, but would give user code something to compile against.
  - Better management of customization vs system code due to their separation.
- Disadvantages:
  - This approach seems to break the type inheritance model (a custom class wrapping the system-generated one would not be part of the generated classes' Java type hierarchy). The normal way around this is to have interfaces in the hierarchy. However, interfaces cannot be instantiated with "new ...". We could change things to not rely on "new" (e.g., have a create() operator), but that would be a big change.
  - It would require some kind of migration utility, because this differs from how users currently customize the generated classes.
  - It would add one more level of indirection for get/set operations on customized classes (due to the wrapper). If no customization was needed, the generated class would carry the official name and serve all uses of it.
- Nesting: An "outermost" pipeline can nest one or more PEARs, which, in turn, may nest one or more inner PEARs, etc. (Type merging is applied to all the type definitions, including those in the PEARs.) Each inner PEAR's JCas customization would be a 1-level wrapper of the system-generated class from the outermost pipeline (not of its container's class, even if that container is a PEAR).
- Naming: JCas cover classes are named to match their UIMA type name. This enables users to write "new MyType()" where MyType is the UIMA type name.
  - If a JCas cover class is not customized (anywhere in the pipeline, including inside PEAR files), we have the system generated class, and its expected name as it is now.
  - If it is customized, the custom "wrapper" would carry the official name, so users would use it, and the system-generated class would need a new name (e.g., xxxx_UIMA_JCas_Generated), which would "hide" it from normal access. A complete analysis of the pipeline, running as an application in one JVM, would be needed to find which UIMA types are customized anywhere (including inside PEARs, even if in just one PEAR). Those types would need the alternate naming protocol.
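A minimal sketch of the "wrap" split for one customized type, following the renaming protocol in the Naming item. MyType, MyType_UIMA_JCas_Generated, the begin/end features and length() are hypothetical examples, not generated UIMA code. Note that MyType is not a Java subtype of the generated class; this is the inheritance problem listed under Disadvantages.

```java
// System-generated from the merged type system; renamed because a customization exists.
class MyType_UIMA_JCas_Generated {
    private int begin;
    private int end;
    int getBegin() { return begin; }
    void setBegin(int v) { begin = v; }
    int getEnd() { return end; }
    void setEnd(int v) { end = v; }
}

// User customization: carries the official name and creates the instance it wraps.
class MyType {
    private final MyType_UIMA_JCas_Generated gen = new MyType_UIMA_JCas_Generated();

    // Forwarding methods: one extra call per get/set compared to using the generated class directly.
    int getBegin() { return gen.getBegin(); }
    void setBegin(int v) { gen.setBegin(v); }
    int getEnd() { return gen.getEnd(); }
    void setEnd(int v) { gen.setEnd(v); }

    // Custom code added by the user, kept separate from the generated code.
    int length() { return getEnd() - getBegin(); }
}
```

User code keeps writing `new MyType()`; only the system needs to know about the hidden generated class.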
Another way to split apart the system-generated from the customization: have the customization "extend" rather than wrap the system-generated one.
- Advantages
  - Type hierarchy / inheritance works (sort of). The "sort of" part: if you extend a class to customize it, and some classes in its parent chain are also extended (customized), your extended class misses those customizations. (This is avoided if you instead "merge".)
- Disadvantages
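The "sort of" caveat of the extend approach, sketched with hypothetical class names (*_Gen for system-generated classes, *_Custom for user customizations):

```java
// System-generated classes (from the merged type system).
class Annotation_Gen {
    String describe() { return "annotation"; }
}

class Token_Gen extends Annotation_Gen {
}

// User customization of the supertype.
class Annotation_Custom extends Annotation_Gen {
    @Override
    String describe() { return "custom annotation"; }
}

// User customization of the subtype. Its parent chain is Token_Gen -> Annotation_Gen,
// so it does not see the override in Annotation_Custom.
class Token_Custom extends Token_Gen {
}
```

Because Java has single inheritance, Token_Custom cannot extend both Token_Gen and Annotation_Custom; a merged Token class would not have this gap.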

More concurrency

Support parallel running of pipeline components.

Requires a careful trade-off against slowdowns from synchronization and cache-line interference. The key is to keep the things being updated separate.
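One way to keep updates separate, sketched with plain java.util.concurrent rather than any UIMA API. Each component (modeled here as a hypothetical Function from document text to a list of results) runs on its own thread, reads shared input that nobody mutates, and writes only to a list it owns. The lists are merged on the calling thread afterwards, so no locking is needed during the parallel phase. This removes contention on a shared structure; it does not by itself rule out cache-line interference between the per-task objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: run independent components in parallel over read-only input.
class ParallelComponents {
    static List<String> run(String text, List<Function<String, List<String>>> components) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, components.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Function<String, List<String>> c : components) {
                // Each task returns its own list; no shared mutable state between tasks.
                futures.add(pool.submit(() -> c.apply(text)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get()); // single-threaded merge, in component order
            }
            return merged;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```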

Consider special index support for this

Supporting Java 8 streams

Iterating over FSs: as an alternative, have a generator of FSs and process them with the stream APIs.
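A sketch of processing FSs with the Java 8 stream APIs, assuming FSs can already be produced by an ordinary Iterator (e.g., from an index). Span and FsStreams are hypothetical stand-ins; StreamSupport and Spliterators are standard JDK classes. The lambdas passed to filter and map are the kind of "functions" that might become managed components.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical minimal feature structure: a typed span.
class Span {
    final String type;
    final int begin, end;
    Span(String type, int begin, int end) { this.type = type; this.begin = begin; this.end = end; }
}

class FsStreams {
    // Wrap an existing FS iterator as a lazy, sequential Stream.
    static <T> Stream<T> stream(Iterator<T> it) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL),
            false);
    }

    // Example pipeline: lengths of all spans of the given type, in iteration order.
    static List<Integer> lengthsOf(Iterator<Span> fsIterator, String type) {
        return stream(fsIterator)
            .filter(s -> s.type.equals(type))
            .map(s -> s.end - s.begin)
            .collect(Collectors.toList());
    }
}
```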

Possibly having a new kind of managed component, being either:
- The "functions" that the standard stream operations use
- New standard operations on streams (unlikely, I think)
- I think this might be deferred until we have some more experience

(Unlikely) Making the element of the "stream" be a new CAS - replacement for CAS Multipliers. Seems like the wrong granularity... Maybe best to let Java evolve this for a few more releases.
