This page was created to gather UIMA requirements from users. Feel free to add your topics here.
Deployment support for UIMA-AS services and pipelines over clusters for processing large amounts of work
Although UIMA-AS has a deployment descriptor, setting it up and tuning it for potentially varying workloads while optimizing for various targets (throughput, latency, recoverability, etc.) is a manual and difficult process.
Improving transparency of UIMA pipeline operations
A lot of statistical information about the operation of UIMA is currently available, but it is difficult to access. This could be fixed by developing a "console" kind of application, perhaps a web site, with just-in-time tutorials, overviews, and drill-down capabilities that would make the operations, bottlenecks, tradeoffs, etc. more obvious to interested parties.
This page discusses a suggestion for adding classpath information to a descriptor.
An alternative might be to use other standard and widely adopted approaches for this; I'm thinking that OSGi provides this capability, along with specifying "versions" and enabling the use of repositories.
General API improvements
Improvements of FSList/FSArray management
- make it easier to add elements
- make it easier to iterate FSList
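As a rough illustration of the two wishes above, here is a minimal sketch of a head/tail list that supports for-each iteration and simple element addition. `ConsList`, `push`, and `append` are hypothetical names invented for this sketch; this is not the UIMA FSList class or its API, only a stand-in showing the kind of convenience being asked for.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a head/tail list; NOT the UIMA FSList class. */
final class ConsList<T> implements Iterable<T> {
    // Single shared sentinel that marks the end of every list.
    private static final ConsList<?> EMPTY = new ConsList<>(null, null);

    private final T head;
    private final ConsList<T> tail;

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() {
        return (ConsList<T>) EMPTY;
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    /** Returns a new list with the element in front; O(1). */
    ConsList<T> push(T element) {
        return new ConsList<>(element, this);
    }

    /** Returns a new list with the element at the end; O(n) because the nodes are copied. */
    ConsList<T> append(T element) {
        List<T> items = new ArrayList<>();
        for (T item : this) {
            items.add(item);
        }
        ConsList<T> result = ConsList.<T>empty().push(element);
        for (int i = items.size() - 1; i >= 0; i--) {
            result = result.push(items.get(i));
        }
        return result;
    }

    /** Walks the head/tail chain so callers can use a plain for-each loop. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private ConsList<T> current = ConsList.this;

            @Override
            public boolean hasNext() {
                return !current.isEmpty();
            }

            @Override
            public T next() {
                if (current.isEmpty()) {
                    throw new NoSuchElementException();
                }
                T value = current.head;
                current = current.tail;
                return value;
            }
        };
    }
}

class ConsListDemo {
    public static void main(String[] args) {
        ConsList<String> tokens =
                ConsList.<String>empty().append("alpha").append("beta").push("start");
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
```

The sketch keeps the list immutable, so `append` copies the chain. Whether an in-place append would suit the CAS better is an open design question that this example does not try to settle.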
More support for collections of CASs
An additional class for a collection of CASs, called "CCAS"
- CCAS will have a common index for all CASs. There are faster techniques for regular-expression-based annotation on a collection of documents that use an inverted index, and these can be applied to CCAS.
- CCAS can have some kind of integration with the Hadoop Distributed File System so that it is easier to write Map-Reduce tasks in Hadoop. It can be a way towards integrating UIMA
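The inverted-index idea for a collection of documents could look roughly like the sketch below. It assumes the caller supplies a token that every match of the pattern must contain as a whole word (for example by using `\b` word boundaries), so that the index can rule out documents before the regular expression runs. `CollectionIndex` and its methods are hypothetical names for illustration and are not part of UIMA. The documents here are plain strings rather than CASs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical shared index over a collection of documents; not a UIMA class. */
final class CollectionIndex {
    private final List<String> documents = new ArrayList<>();
    // token (lower-cased) -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Adds a document, indexes its tokens, and returns its id. */
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, key -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Ids of the documents that contain the given token as a whole word. */
    Set<Integer> candidates(String requiredToken) {
        return postings.getOrDefault(
                requiredToken.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /**
     * Runs the pattern only on documents that contain requiredToken.
     * Assumption: every match of the pattern contains requiredToken as a
     * whole word; otherwise matching documents would be skipped.
     */
    List<Integer> matching(Pattern pattern, String requiredToken) {
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates(requiredToken)) {
            if (pattern.matcher(documents.get(id)).find()) {
                hits.add(id);
            }
        }
        return hits;
    }
}

class CollectionIndexDemo {
    public static void main(String[] args) {
        CollectionIndex index = new CollectionIndex();
        index.add("UIMA pipelines scale out");
        index.add("Hadoop stores files");
        index.add("Deploying UIMA services on a cluster");

        Pattern pattern = Pattern.compile("\\bUIMA\\s+(pipelines|services)\\b");
        // Only the two documents containing "uima" are scanned by the regex.
        System.out.println(index.matching(pattern, "UIMA"));
    }
}
```

The saving comes from skipping the regular expression entirely for documents that lack the required token. How much that helps depends on how selective the token is across the collection.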
Supporting more modularity / interoperability
Conforming to widely adopted standards (e.g., OSGi, Maven)
Versioning of Annotators, TypeSystems
Dependency specifications (including versioning)
Packaging of classpath dependencies (already in PEAR; extensions to non-PEAR environments?)
Using repositories of artifacts
- e.g. Maven or P2 repositories
- If an artifact is referenced via its "name" and "version", be able to retrieve it from a repository if it is not available locally
- use a Maven or Maven-like local cache
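The lookup described in the list above (check the local cache first, fall back to a repository) might be sketched as follows. `ArtifactRepository` and `CachingResolver` are hypothetical names. The directory layout `name/version/name-version.jar` is a simplification chosen for this example and is not the actual Maven or P2 layout, and the repository is left as an interface rather than a real Maven or P2 client.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical repository abstraction; a real one might wrap Maven or P2 access. */
interface ArtifactRepository {
    Optional<byte[]> fetch(String name, String version) throws IOException;
}

/** Resolves artifacts by name and version, consulting a local cache first. */
final class CachingResolver {
    private final Path cacheRoot;
    private final ArtifactRepository remote;

    CachingResolver(Path cacheRoot, ArtifactRepository remote) {
        this.cacheRoot = cacheRoot;
        this.remote = remote;
    }

    /** Returns the cached file, fetching and storing it on a cache miss. */
    Path resolve(String name, String version) throws IOException {
        // Simplified layout (an assumption): <cache>/<name>/<version>/<name>-<version>.jar
        Path cached = cacheRoot.resolve(name).resolve(version)
                .resolve(name + "-" + version + ".jar");
        if (Files.exists(cached)) {
            return cached;
        }
        byte[] content = remote.fetch(name, version).orElseThrow(
                () -> new IOException("Artifact not found: " + name + ":" + version));
        Files.createDirectories(cached.getParent());
        Files.write(cached, content);
        return cached;
    }
}

class CachingResolverDemo {
    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempDirectory("artifact-cache");
        // Stand-in repository that always returns the same few bytes.
        ArtifactRepository repo = (name, version) -> Optional.of(new byte[] {1, 2, 3});
        CachingResolver resolver = new CachingResolver(cache, repo);

        // First call fetches from the repository; second call is served from the cache.
        System.out.println(resolver.resolve("demo-annotator", "1.0.0"));
        System.out.println(resolver.resolve("demo-annotator", "1.0.0"));
    }
}
```

Signature verification, raised in the security item, would presumably sit between the fetch and the write to the cache. It is left out here.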
Security: signing of artifacts
Efficient CAS persistent store and loading
Currently we can serialize/deserialize CASes in XMI, XCAS (old), or binary formats.
- need to search collections of CASes with various kinds of searches
- maybe good to persist in relational database or RDF style tables
- need to load subset of CAS, efficiently (for small subset of large CAS)