This page collects the value propositions of UIMA for version 3 (or 4 or...), with a focus on describing them from a data-centric point of view.
Keep UIMA worthy of having people invest in it
When users spend time/energy creating reusable components, these components should have long and useful lives. If migration to newer things is needed, migration tools that preserve this investment are valuable.
Driven by the environment in 2015 for big data analytics:
- easy interoperability among languages
- simpler for simple things, while still supporting the complexity that more complex things need
The OASIS spec is based on XMI serialization, which is not popular in today's world; JSON-like representations are more widely used.
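As a rough illustration of what a JSON-like representation could look like, the sketch below stores the text once and lets annotations point into it by character offsets. The field names (`text`, `annotations`, `begin`, `end`, `head`) and the `example.*` type names are assumptions made for this example; this is not UIMA's actual JSON format.

```python
import json

# Hypothetical layout: the text is stored once, and annotations refer to it
# by character offsets instead of being embedded in it.
document = {
    "text": "UIMA analyzes unstructured text.",
    "annotations": [
        {"id": 1, "type": "example.Token", "begin": 0, "end": 4},
        {"id": 2, "type": "example.Token", "begin": 5, "end": 13},
        # A reference to another annotation is expressed by id, so a graph
        # can be carried by a tree-shaped format such as JSON.
        {"id": 3, "type": "example.Subject", "begin": 0, "end": 4, "head": 1},
    ],
}


def covered_text(doc, annotation):
    """Return the span of text an annotation points at."""
    return doc["text"][annotation["begin"]:annotation["end"]]


# Round-trip through JSON text, as a stand-in for sending the data to a
# component written in another language.
serialized = json.dumps(document, indent=2)
restored = json.loads(serialized)
```

Any language with a JSON parser can read `serialized` back, which is the interoperability argument being made here.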
What UIMA is / is-not trying to do
It can't be all things to all people, without diluting the value it has.
More focus on:
- component reuse, component combinations
- type system merging and various lenient (de)serializations, which allow interoperability among different type systems (currently limited to adding or removing types and/or features)
- large scale - includes support for namespaces for type names, and various scale-out capabilities
- significantly complex data representations - inputs and outputs
- single-inheritance type system
- stand-off style of annotations
- references (ability to construct graphs)
- collections (arrays, lists, maybe others)
- standards for data representation in transit and in permanent storage (DBs, etc.), to allow different systems/languages to interoperate
- interoperability with other frameworks / systems: apps (e.g. Lucene), scale-out frameworks (e.g. Spark), databases
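The type system merging and lenient deserialization items above can be sketched as follows. This is a simplified model under stated assumptions, not UIMA's implementation: `TypeDef`, `merge_type_systems`, `all_features` and `lenient_load` are hypothetical names, the merge rule here is stricter than a real framework needs to be, and the type system is assumed to contain no inheritance cycles.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    name: str        # namespaced type name, e.g. "example.Token"
    supertype: str   # exactly one parent: single inheritance
    features: dict = field(default_factory=dict)  # feature name -> range type


def merge_type_systems(a, b):
    """Union two type systems (dicts of name -> TypeDef) without mutating them.

    Assumption for this sketch: a type defined in both must have the same
    supertype, and a feature defined in both must have the same range type.
    """
    merged = {n: TypeDef(t.name, t.supertype, dict(t.features)) for n, t in a.items()}
    for name, t in b.items():
        if name not in merged:
            merged[name] = TypeDef(t.name, t.supertype, dict(t.features))
            continue
        existing = merged[name]
        if existing.supertype != t.supertype:
            raise ValueError(f"conflicting supertypes for {name}")
        for feat, rng in t.features.items():
            if existing.features.setdefault(feat, rng) != rng:
                raise ValueError(f"conflicting range for {name}:{feat}")
    return merged


def all_features(type_system, name):
    """Collect features of a type, including those inherited from ancestors."""
    feats = {}
    while name in type_system:
        tdef = type_system[name]
        for feat, rng in tdef.features.items():
            feats.setdefault(feat, rng)
        name = tdef.supertype
    return feats


def lenient_load(records, type_system):
    """Keep only the types and features that the receiving type system knows."""
    loaded = []
    for rec in records:
        if rec["type"] not in type_system:
            continue  # unknown type: skip the whole record
        known = all_features(type_system, rec["type"])
        loaded.append({k: v for k, v in rec.items() if k == "type" or k in known})
    return loaded


tokenizer_ts = {
    "example.Annotation": TypeDef("example.Annotation", "example.TOP",
                                  {"begin": "int", "end": "int"}),
    "example.Token": TypeDef("example.Token", "example.Annotation", {"pos": "string"}),
}
parser_ts = {
    "example.Token": TypeDef("example.Token", "example.Annotation", {"lemma": "string"}),
    "example.Sentence": TypeDef("example.Sentence", "example.Annotation"),
}

merged = merge_type_systems(tokenizer_ts, parser_ts)

records = [
    {"type": "example.Token", "begin": 0, "end": 4, "pos": "NNP", "lemma": "UIMA"},
    {"type": "example.Sentence", "begin": 0, "end": 32},
]
# A receiver that only knows the tokenizer's types drops "lemma" and Sentence.
loaded = lenient_load(records, tokenizer_ts)
```

The two halves cover the add/remove cases mentioned in the list: merging handles added types and features on the producing side, while lenient loading handles types and features that the consuming side does not have.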
Less suited for:
- simple computations
- little reuse
- trivial data representations
Making better use of the web for data typing
Type systems could be instantiated as web objects, and an ecosystem built around these.
- Imagine a Maven-style repository of type systems, with versioning
- visiting the web site could return HTML docs for the type system
- querying the site through a REST API could return the type system metadata
- Could be a strong element for enabling data reuse
Existing type systems could be used, e.g. schema.org.
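The repository idea can be sketched with an in-memory stand-in for the web service. Everything here is hypothetical: the `/typesystems/<group>/<artifact>[/<version>]` path layout, the `REPOSITORY` contents, and the function names are assumptions chosen only to show versioned lookup of type system metadata.

```python
# Hypothetical repository contents, keyed Maven-style by (group, artifact, version).
REPOSITORY = {
    ("org.example", "core-types", "1.0.0"): {
        "types": ["org.example.Token"],
    },
    ("org.example", "core-types", "1.2.0"): {
        "types": ["org.example.Token", "org.example.Sentence"],
    },
    ("org.example", "core-types", "1.10.0"): {
        "types": ["org.example.Token", "org.example.Sentence", "org.example.Entity"],
    },
}


def version_key(version):
    """Compare versions numerically, so that 1.10.0 sorts after 1.2.0."""
    return tuple(int(part) for part in version.split("."))


def resolve(group, artifact, version=None):
    """Return metadata for a type system; default to the newest version."""
    candidates = [v for (g, a, v) in REPOSITORY if g == group and a == artifact]
    if not candidates:
        raise KeyError(f"no type system {group}:{artifact}")
    if version is None:
        version = max(candidates, key=version_key)
    elif version not in candidates:
        raise KeyError(f"no version {version} of {group}:{artifact}")
    metadata = REPOSITORY[(group, artifact, version)]
    return {"group": group, "artifact": artifact, "version": version, **metadata}


def handle_get(path):
    """Map a REST-style path to type system metadata (what a GET might return)."""
    parts = path.strip("/").split("/")
    if len(parts) not in (3, 4) or parts[0] != "typesystems":
        raise ValueError(f"unsupported path: {path}")
    return resolve(*parts[1:])


latest = handle_get("/typesystems/org.example/core-types")
pinned = handle_get("/typesystems/org.example/core-types/1.0.0")
```

A real service would sit behind HTTP and could serve HTML documentation from the same coordinates; the lookup logic is the only part modeled here.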