This page collects value propositions for UIMA version 3 (or 4 or...), described from a data-centric point of view.

Keep UIMA worth investing in

When users spend time and energy creating reusable components, those components should have long and useful lives. If migration to something newer is needed, migration tools that preserve this investment are valuable.

Data focus

This focus is driven by the 2015 environment for big data analytics:

  • multiple languages, including Scala, Python, Go, Ruby, JavaScript
  • a desire for easy interoperability among languages
  • simpler for simple things, yet able to support more complexity for more complex things

The OASIS spec is based on XMI serialization, which is not popular in today's world; JSON-like representations are more popular.
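To make the JSON-like alternative concrete, here is a minimal sketch of how a document with stand-off annotations might round-trip through JSON. It is not an official UIMA JSON format: the field names (`sofa`, `annotations`, `id`, `subject`, `predicate`) and the `example.*` type names are assumptions chosen for illustration.

```python
import json

# Hypothetical document: the text under analysis plus stand-off annotations.
# All field and type names here are illustrative, not a UIMA standard.
text = "UIMA analyzes unstructured text."

doc = {
    "sofa": text,
    "annotations": [
        {"id": 1, "type": "example.Token", "begin": 0, "end": 4},
        {"id": 2, "type": "example.Token", "begin": 5, "end": 13},
        # Features may refer to other annotations by id, which forms a graph.
        {"id": 3, "type": "example.Relation", "begin": 0, "end": 13,
         "subject": 1, "predicate": 2},
    ],
}

# Round trip through plain JSON, which any language can read.
serialized = json.dumps(doc, indent=2)
restored = json.loads(serialized)


def covered_text(document, annotation):
    """Return the span of text that a stand-off annotation points at."""
    return document["sofa"][annotation["begin"]:annotation["end"]]


# Resolve id references after loading.
by_id = {a["id"]: a for a in restored["annotations"]}
relation = by_id[3]
subject_text = covered_text(restored, by_id[relation["subject"]])
predicate_text = covered_text(restored, by_id[relation["predicate"]])
```

Because references are plain integer ids rather than language-specific object pointers, a consumer in Scala, Go, or JavaScript could rebuild the same graph from the same bytes.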

What UIMA is / is-not trying to do

UIMA can't be all things to all people without diluting the value it has.

More focus on:

  • component reuse, component combinations
    • type system merging and various lenient (de)serializations allow interoperability among different type systems (currently limited to adding or removing types and/or features)
  • large scale - includes support for namespaces for type names, and various scale-out capabilities
  • significantly complex data representations - inputs and outputs
    • single-inheritance type system
    • stand-off style of annotations
    • references (ability to construct graphs)
    • collections (arrays, lists, maybe others)
  • standards for data representation in transit and in permanent storage (DBs, etc.), so that different systems and languages can interoperate
  • interoperability with other frameworks and systems: applications (Lucene), scale-out frameworks (Spark), databases
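The type system merging and lenient deserialization mentioned in the list above can be sketched as follows. This is a simplified model under stated assumptions, not UIMA's implementation: a type system is a plain dict mapping a type name to its single parent and its own features, and the function names (`all_features`, `merge_type_systems`, `lenient_load`) and `example.*` types are hypothetical.

```python
def all_features(type_system, name):
    """Collect features along the single-inheritance chain of a type."""
    features = set()
    while name is not None:
        features |= type_system[name]["features"]
        name = type_system[name]["parent"]
    return features


def merge_type_systems(a, b):
    """Union two type systems; shared types get the union of their features."""
    merged = {}
    for name in sorted(set(a) | set(b)):
        ta, tb = a.get(name), b.get(name)
        if ta is not None and tb is not None:
            if ta["parent"] != tb["parent"]:
                raise ValueError(f"conflicting parents for {name}")
            merged[name] = {"parent": ta["parent"],
                            "features": ta["features"] | tb["features"]}
        else:
            src = ta if ta is not None else tb
            merged[name] = {"parent": src["parent"],
                            "features": set(src["features"])}
    return merged


def lenient_load(records, type_system):
    """Drop records of unknown types and strip unknown features."""
    kept = []
    for rec in records:
        if rec["type"] not in type_system:
            continue  # the receiver does not know this type: skip it
        allowed = all_features(type_system, rec["type"])
        kept.append({k: v for k, v in rec.items()
                     if k == "type" or k in allowed})
    return kept


producer_ts = {
    "example.Annotation": {"parent": None, "features": {"begin", "end"}},
    "example.Token": {"parent": "example.Annotation", "features": {"pos"}},
    "example.Sentiment": {"parent": "example.Annotation", "features": {"score"}},
}
consumer_ts = {
    "example.Annotation": {"parent": None, "features": {"begin", "end"}},
    "example.Token": {"parent": "example.Annotation", "features": {"lemma"}},
}
records = [
    {"type": "example.Token", "begin": 0, "end": 4, "pos": "NNP"},
    {"type": "example.Sentiment", "begin": 0, "end": 32, "score": 0.5},
]

loaded = lenient_load(records, consumer_ts)
merged = merge_type_systems(producer_ts, consumer_ts)
```

Here the consumer keeps the `example.Token` record without its unknown `pos` feature and silently drops the `example.Sentiment` record, while the merged type system knows both `pos` and `lemma` on `example.Token`.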

Less suited for:

  • simple computations
  • little reuse
  • trivial data representations

Making better use of the web for data typing

Type systems could be instantiated as web objects, and an ecosystem built around these.

  • Imagine a maven-style repository of type systems, with versioning
  • Visiting the website could return HTML docs for the type system
  • Calling the website through a REST API could return the type system metadata
  • Could be a strong element for enabling data reuse
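The repository idea above can be sketched as a small in-memory model. Everything here is hypothetical: the `TypeSystemRegistry` class, the `/group/name/version` path layout, the `latest` alias, and the use of an `accept` argument to choose between HTML docs and JSON metadata are assumptions for illustration, with no network or real service involved.

```python
import json


def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class TypeSystemRegistry:
    """Hypothetical maven-style store of versioned type systems."""

    def __init__(self):
        self._store = {}  # (group, name) -> {version tuple: definition}

    def publish(self, group, name, version, definition):
        versions = self._store.setdefault((group, name), {})
        parsed = parse_version(version)
        if parsed in versions:
            raise ValueError("published versions are immutable")
        versions[parsed] = definition

    def resolve(self, group, name, version=None):
        versions = self._store[(group, name)]
        parsed = max(versions) if version is None else parse_version(version)
        return ".".join(map(str, parsed)), versions[parsed]

    def get(self, path, accept="application/json"):
        """Serve '/group/name/version'; 'latest' picks the highest version."""
        group, name, version = path.strip("/").split("/")
        resolved, definition = self.resolve(
            group, name, None if version == "latest" else version)
        if accept == "text/html":
            # Human-readable docs for a browser.
            items = "".join(f"<li>{t}</li>" for t in sorted(definition["types"]))
            return f"<h1>{group}:{name}:{resolved}</h1><ul>{items}</ul>"
        # Machine-readable metadata for a REST client.
        return json.dumps({"group": group, "name": name,
                           "version": resolved, **definition}, sort_keys=True)


registry = TypeSystemRegistry()
registry.publish("org.example", "core-types", "1.2.0",
                 {"types": {"example.Token": {"features": ["begin", "end"]}}})
registry.publish("org.example", "core-types", "1.10.0",
                 {"types": {"example.Token": {"features": ["begin", "end", "pos"]}}})

metadata = json.loads(registry.get("/org.example/core-types/latest"))
html_docs = registry.get("/org.example/core-types/1.2.0", accept="text/html")
```

Treating published versions as immutable mirrors how maven-style repositories let consumers pin a type system version and get the same definition every time.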

Existing Type Systems could be used, e.g.

