UIMA Requirements

This page was created to gather UIMA requirements from users. Feel free to add your topics here.

Deployment support for UIMA-AS services and pipelines over clusters for processing large amounts of work

Although we have a deployment descriptor, setting it up and tuning it for potentially varying workloads, while optimizing for various targets (throughput, latency, recoverability, etc.), is a manual and difficult process.

Improving transparency of UIMA pipeline operations

Currently a lot of statistical information on the operation of UIMA is available, but it is difficult to access. This could be fixed by developing a "console" application, perhaps a web site, with just-in-time tutorials, an overview, and drill-down capabilities that would make the operations, bottlenecks, trade-offs, etc. more obvious to interested parties.

UIMA Class Loading Extension

This page discusses a suggestion for adding classpath information to a descriptor.
An alternative might be to use another standard, widely adopted approach for this; I'm thinking that OSGi provides this capability, along with the ability to specify "versions" and to use repositories.

General API improvements

Improvements of FSList/FSArray management

  • make it easier to add elements
  • make it easier to iterate FSList
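
To make the two wishes above concrete, here is a minimal sketch of the kind of convenience being asked for. It does not use the UIMA API: `Node`, `iterate`, and `append` are hypothetical names, and the head/tail cell only stands in for a linked list in the style of FSList, with `None` playing the role of the empty list.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Node:
    """One cell of a head/tail linked list (hypothetical stand-in, not a UIMA class)."""
    head: object
    tail: Optional["Node"]  # None plays the role of the empty list


def iterate(lst: Optional[Node]) -> Iterator[object]:
    """Yield each element in order so callers can use a plain for-loop."""
    while lst is not None:
        yield lst.head
        lst = lst.tail


def append(lst: Optional[Node], value: object) -> Node:
    """Return a new list with value added at the end; the input list is unchanged."""
    items = list(iterate(lst))
    items.append(value)
    result: Optional[Node] = None
    # Rebuild the chain back to front so the first item ends up at the head.
    for item in reversed(items):
        result = Node(item, result)
    return result
```

With helpers like these, building and walking a list becomes `lst = append(lst, x)` and `for x in iterate(lst): ...`, instead of hand-written head/tail traversal at every call site.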

More support for collections of CASes

An additional class for a collection of CASes, called "CCAS"

  • A CCAS would have a common index across all of its CASes. Faster techniques exist for
    regular-expression-based annotation over a collection of documents using an inverted
    index, and these could be applied to a CCAS.
  • A CCAS could integrate with the Hadoop Distributed File System (HDFS) so that it
    is easier to write MapReduce tasks in Hadoop. This could be a step toward integrating
    UIMA with Hadoop.
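
The inverted-index idea in the first bullet could look roughly like the sketch below. It is an illustration only, not a proposed CCAS design: `build_index` and `annotate` are hypothetical names, documents are plain strings keyed by id, and the caller is assumed to supply a word that every match of the pattern must contain.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def annotate(docs, index, required_word, pattern):
    """Run the regex only on documents the index says contain required_word.

    Assumes required_word occurs in every possible match of pattern, so
    skipping the other documents cannot lose a match.
    Returns {doc_id: [(begin, end), ...]} for documents with at least one match.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    spans = {}
    for doc_id in sorted(index.get(required_word.lower(), ())):
        found = [m.span() for m in regex.finditer(docs[doc_id])]
        if found:
            spans[doc_id] = found
    return spans
```

The saving comes from the lookup step: documents that do not contain the required word are never scanned by the regular expression at all.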

Supporting more modularity / interoperability

Conforming to widely adopted standards (e.g., OSGi, Maven)

Versioning of Annotators, TypeSystems

Dependency specifications (including versioning)

Packaging of classpath dependencies (already in PEAR; extensions to non-PEAR environments?)

Using repositories of artifacts

  • e.g., Maven or P2 repositories
  • If an artifact is referenced via its "name" and "version", be able to retrieve it from a repository if it is not available locally
  • Use a Maven or Maven-like local cache
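
As a rough sketch of the lookup described in the bullets above (check the local cache first, and go to the repository only on a miss), assuming a simplified `name/version/` directory layout rather than Maven's actual one. The function `resolve`, its `fetch` callback, and the `.jar` file name are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def resolve(name: str, version: str, cache_dir: Path,
            fetch: Callable[[str, str], bytes]) -> Path:
    """Return the local path of an artifact, fetching it only on a cache miss.

    fetch(name, version) stands in for a download from a remote repository.
    """
    target = cache_dir / name / version / f"{name}-{version}.jar"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(name, version))
    return target
```

A second call with the same name and version returns the cached file without calling `fetch` again. Signature verification, mentioned under security, would fit naturally between the download and the write.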

Security: signing of artifacts

Efficient CAS persistent store and loading

Currently we can serialize/deserialize CASes in XMI, XCAS (old), or binary formats.

  • Need to search collections of CASes with various kinds of searches
  • It may be good to persist CASes in a relational database or in RDF-style tables
  • Need to load a subset of a CAS efficiently (e.g., a small subset of a large CAS)
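
To illustrate the relational option, the sketch below stores annotations as rows in SQLite so that one type from one CAS can be loaded without reading the rest, and so that CASes can be searched by annotation content. The schema and the names `open_store`, `save`, `load_subset`, and `find_cas_ids` are assumptions made for this example; a real CAS carries far more structure (type system, feature values, multiple views) than these flat tuples.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; one row per annotation."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        " cas_id TEXT, type TEXT, begin_pos INTEGER, end_pos INTEGER, covered TEXT)"
    )
    # The index lets a (cas_id, type) lookup avoid scanning the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_cas_type ON annotation (cas_id, type)"
    )
    return conn


def save(conn, cas_id, annotations):
    """annotations: iterable of (type, begin, end, covered_text) tuples."""
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
        [(cas_id, t, b, e, c) for t, b, e, c in annotations],
    )
    conn.commit()


def load_subset(conn, cas_id, type_name):
    """Load only the annotations of one type from one CAS, in text order."""
    rows = conn.execute(
        "SELECT begin_pos, end_pos, covered FROM annotation"
        " WHERE cas_id = ? AND type = ? ORDER BY begin_pos",
        (cas_id, type_name),
    )
    return rows.fetchall()


def find_cas_ids(conn, type_name, covered):
    """Search across all stored CASes for an annotation type with given text."""
    rows = conn.execute(
        "SELECT DISTINCT cas_id FROM annotation"
        " WHERE type = ? AND covered = ? ORDER BY cas_id",
        (type_name, covered),
    )
    return [row[0] for row in rows]
```

For example, after saving two documents, `load_subset(conn, "doc1", "Token")` returns only the token rows of `doc1`, and `find_cas_ids(conn, "Token", "Hello")` lists every CAS containing that token.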