To improve the ability to debug and maintain UIMA components, we propose to add the ability to log the updates to the CAS and the Index Repository as follows:
The collected information can be classified as follows:
The development of this Provenance Tracking of UIMA CAS Content as described in the Wiki is composed of two parts:
The development will be done as a tooling project of Apache UIMA with participation from the community. Since much of the code in the CAS Viewer (submitted as a contribution to Apache UIMA) can be reused, we propose to develop the visualization of the CAS/Index Journal as an extension to the CAS Viewer.
In the following proposed GUI mockup, we use deploy/as/MeetingFinderAggregate.xml from uimaj-example of the UIMA AS package with some modification to its behavior to illustrate the design. This aggregate AE has the following structure:
We assume that, after running MeetingFinderAggregate with an input document, the following basic information is produced:
(1) A list of calls to AEs
(2) Within the call to each AE, a list of new FSs created by this AE and a list of modified FSs
(3) Within the call to each AE, a list of add/delete/modify FS operations on the index repository
The question is how to present these three kinds of basic information to developers.
Note that, for the initial implementation, we propose to preserve only the final value of an FS (intermediate values are not kept).
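The three kinds of basic information listed above could be held in a small in-memory model. The following is a minimal sketch only: every type and field name (JournalModelSketch, AeCall, FsChange, IndexOp, fsId) and the type name example.Meeting are hypothetical and are not part of the UIMA API.

```java
import java.util.List;

// Hypothetical journal model; none of these types exist in UIMA.
final class JournalModelSketch {

    enum ChangeKind { ADDED, DELETED, MODIFIED }

    // One FS created or modified in the CAS by an AE.
    // fsId is an assumed stable identifier for the FS.
    record FsChange(int fsId, String typeName, ChangeKind kind) {}

    // One add/delete/modify operation on the index repository.
    record IndexOp(int fsId, String typeName, ChangeKind kind) {}

    // One call to an AE, identified by its key in the aggregate descriptor.
    record AeCall(String aeKey, List<FsChange> fsChanges, List<IndexOp> indexOps) {}

    // Builds a tiny journal with a single AE call, for illustration only.
    static List<AeCall> sampleJournal() {
        AeCall meeting = new AeCall(
            "Meeting",
            List.of(new FsChange(1, "example.Meeting", ChangeKind.ADDED),
                    new FsChange(2, "example.Meeting", ChangeKind.MODIFIED)),
            List.of(new IndexOp(1, "example.Meeting", ChangeKind.ADDED)));
        return List.of(meeting);
    }

    public static void main(String[] args) {
        for (AeCall call : sampleJournal()) {
            System.out.println(call.aeKey() + ": "
                + call.fsChanges().size() + " FS changes, "
                + call.indexOps().size() + " index operations");
        }
    }
}
```

Keeping FS changes and index operations in separate lists mirrors the split between the FS Journal and the Index Journal; only FS changes would exclude the DELETED kind.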
Based on the above example and assumptions, the following shows some screen-shots of the proposed GUI used to visualize the journal information.
The information about CAS changes is visualized by the FS Journal tab as shown in Figure 3.1.
The sequence of AE calls is shown in the top section of the tab and is organized as a hierarchy (the key string defined in the aggregate descriptor is used to identify the AE). The number next to the AE's name is the total number of FSs added or modified by that AE. For example, Meeting AE (2 FSs) means that two FSs were added or modified by the Meeting AE.
Figure 3.1. For viewing changes to FS in the CAS
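As a rough illustration of how the node label could be derived, the sketch below counts the FSs touched by one AE and formats the label. The class and method names are hypothetical, and counting an FS that is both added and modified by the same AE only once is an assumption, not something the proposal specifies.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for building AE node labels in the FS Journal tree.
final class AeLabelSketch {

    // Counts distinct FS ids that were added or modified by one AE.
    // Assumption: an FS both added and modified by the same AE counts once.
    static int countChangedFs(List<Integer> addedFsIds, List<Integer> modifiedFsIds) {
        Set<Integer> ids = new HashSet<>(addedFsIds);
        ids.addAll(modifiedFsIds);
        return ids.size();
    }

    // Formats a label in the style used by the mockup, e.g. "Meeting AE (2 FSs)".
    static String label(String aeName, int changedFsCount) {
        return aeName + " (" + changedFsCount + " FSs)";
    }

    public static void main(String[] args) {
        int count = countChangedFs(List.of(10), List.of(11));
        System.out.println(label("Meeting AE", count)); // prints "Meeting AE (2 FSs)"
    }
}
```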
Since the list of FSs can be long (e.g., a few thousand Token annotations), the FSs are collapsed under a type name node, and the number at the end of the name indicates the total number of added/modified FSs of that type, as shown in Figure 3.2.a.
When the type name nodes are expanded (by clicking on the + sign), the added and modified FSs are revealed as shown in Figure 3.2.b.
We use the (+), (-), and (~) signs to represent added FS, deleted FS, and modified FS, respectively. Note that the FS Journal never contains deleted FSs.
Figure 3.2.a. List of type nodes containing changed FSs
Figure 3.2.b. List of Added/Modified FSs Grouped by Type
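The grouping behind Figures 3.2.a and 3.2.b can be sketched as a map from type name to the changed FSs of that type. This is an illustration under assumptions: the names TypeGroupingSketch, Change, and fsId, the type name example.Token, and the exact label format "name (count)" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical grouping of changed FSs under their type name nodes.
final class TypeGroupingSketch {

    enum Kind { ADDED, MODIFIED }

    // One changed FS; fsId is an assumed identifier.
    record Change(String typeName, int fsId, Kind kind) {}

    // Groups changes by type name; TreeMap keeps the type nodes sorted by name.
    static Map<String, List<Change>> groupByType(List<Change> changes) {
        Map<String, List<Change>> byType = new TreeMap<>();
        for (Change c : changes) {
            byType.computeIfAbsent(c.typeName(), k -> new ArrayList<>()).add(c);
        }
        return byType;
    }

    // Label for a collapsed type node: the type name followed by the FS count.
    static String collapsedLabel(String typeName, List<Change> group) {
        return typeName + " (" + group.size() + ")";
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
            new Change("example.Token", 1, Kind.ADDED),
            new Change("example.Token", 2, Kind.MODIFIED),
            new Change("uima.tt.Lemma", 3, Kind.ADDED));
        groupByType(changes).forEach(
            (type, group) -> System.out.println(collapsedLabel(type, group)));
    }
}
```

Building the child nodes lazily, only when a type node is expanded, would keep the tree responsive when a type holds thousands of FSs.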
Checking or unchecking the boxes controls which kinds of FS changes (added or modified) are displayed (see Figure 3.3).
Figure 3.3. Selecting Kinds of FS Changes
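The effect of the two check boxes can be sketched as a filter over the list of changes. The names used here (KindFilterSketch, Change, visible, showAdded, showModified) are hypothetical and only illustrate the idea.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical filter mirroring the "added" and "modified" check boxes.
final class KindFilterSketch {

    enum Kind { ADDED, MODIFIED }

    // One changed FS; fsId is an assumed identifier.
    record Change(int fsId, Kind kind) {}

    // Returns only the changes whose kind is currently checked.
    static List<Change> visible(List<Change> changes, boolean showAdded, boolean showModified) {
        Set<Kind> selected = EnumSet.noneOf(Kind.class);
        if (showAdded) {
            selected.add(Kind.ADDED);
        }
        if (showModified) {
            selected.add(Kind.MODIFIED);
        }
        return changes.stream()
            .filter(c -> selected.contains(c.kind()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
            new Change(1, Kind.ADDED),
            new Change(2, Kind.MODIFIED));
        // Only the "modified" box is checked.
        System.out.println(visible(changes, false, true));
    }
}
```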
Checking or unchecking the boxes in the tree triggers the display of the corresponding annotations in the input document section. Operations on the input document section behave the same way as when viewing a normal XMI CAS, as described in the CAS Viewer's user guide.
Note that it is possible to have FSs that are not a sub-type of Annotation, as shown in Figure 3.4 (the uima.tt.Lemma type is defined as a sub-type of uima.cas.TOP).
Figure 3.4. Non-Annotation Feature Structures
To view the sequence of changes to an FS, select the FS in the tree. The list of changes (added or modified) made by the AEs is displayed in the "Change History of FS value" section as shown in Figure 3.5. The displayed information consists of two parts: the last value of the FS and the sequence of changes. The highlighted element in the sequence corresponds to the FS selected in the tree.
Figure 3.5. Change History of Selected FS - Shows the history of a particular FS: the component in which it was created and the subsequent components by which it was modified.
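A possible shape for the per-FS history is sketched below: each change appends an event, while the stored value is simply overwritten, so only the final value survives. All names (FsHistorySketch, FsHistory, Event, log, render), the AE key "Other", and the sample values are hypothetical; the "> " marker only stands in for the highlighting in the GUI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-FS change history that keeps only the final value.
final class FsHistorySketch {

    enum Kind { ADDED, MODIFIED }

    // One entry in the change sequence: which AE did what.
    record Event(String aeKey, Kind kind) {}

    static final class FsHistory {
        private String lastValue;
        private final List<Event> events = new ArrayList<>();

        // Appends an event and overwrites the stored value,
        // so intermediate values are not kept.
        void log(String aeKey, Kind kind, String value) {
            events.add(new Event(aeKey, kind));
            lastValue = value;
        }

        String lastValue() {
            return lastValue;
        }

        List<Event> events() {
            return List.copyOf(events);
        }
    }

    // Renders the change sequence, marking the selected entry with "> ".
    static List<String> render(FsHistory history, int selectedIndex) {
        List<String> lines = new ArrayList<>();
        List<Event> events = history.events();
        for (int i = 0; i < events.size(); i++) {
            String marker = (i == selectedIndex) ? "> " : "  ";
            lines.add(marker + events.get(i).kind() + " by " + events.get(i).aeKey());
        }
        return lines;
    }

    public static void main(String[] args) {
        FsHistory history = new FsHistory();
        history.log("Other", Kind.ADDED, "begin=0, end=5");
        history.log("Meeting", Kind.MODIFIED, "begin=0, end=9");
        System.out.println("Last value: " + history.lastValue());
        render(history, 1).forEach(System.out::println);
    }
}
```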
The information about changes to the index repository is very similar to the information about changes to the CAS, and it is displayed in the Index Journal tab as shown in Figure 3.6. There are two main differences:
Otherwise, the operation of the Index Journal tab is identical to the FS Journal tab.
Figure 3.6. For viewing changes to Index Repository