This extension to the CAS interface provides APIs to enable
tracking of CAS operations by component and APIs to access
the logged information.

When journaling is enabled, the following information related to CAS
operations will be logged:
1) Calling sequence of component analysis engines (AEs).
2) For each AE, a list of newly created feature structures (FSs) and a list of changes to pre-existing FSs.
3) For each AE, a list of added, deleted, and reindexed FSs in each index repository.

The Journal class obtained from the CAS provides access to the above information.
Although this initial proposal is based on the requirements for provenance tracking
as described here, the APIs are intended to be general and support any application that wants
to visualize or track CAS operations.

The proposed extension to the CAS interface and the new Journal interface are shown below:

Issues that need discussion:

1) Use of FS ids as handles to FeatureStructure objects.
The proposed APIs return arrays of FS ids. Currently, only the LowLevelCAS APIs can
get a FeatureStructure object from an FS id.

2) Should journaling be enabled via a global setting?

CAS Interface extension

  /**
   * Gets an array of <code>Journal</code> objects, one for each component that processed this
   * CAS.
   * Each <code>Journal</code> object provides access to the CAS and IndexRepository
   * updates made by a specific analysis component.
   *
   * @return an array of Journal objects, one per component
   */
  Journal[] getJournal();

  /**
   * Enable or disable journaling.
   *
   * Note: this method may only be called from an application. Calling it from an annotator
   * will trigger a runtime exception.
   *
   * @param enable <code>true</code> to enable journaling, <code>false</code> to disable it
   */
  void enableJournaling(boolean enable);

Journal Interface

/**
 * An instance of this class provides access to CAS and IndexRepository updates
 * made by a specific analysis component.
 */
public interface Journal {

  /**
   * Gets the delegate key name assigned to the component. If this component is
   * not a delegate, the component name will be "TopLevel".
   * Note that when this info is returned from a service, TopLevel could be
   * replaced by the delegate name of this component in the client aggregate.
   * @return the delegate key name of the component, or "TopLevel"
   */
  String getComponentName();

  /**
   * Gets the call path to the component. This is the fully-qualified name obtained by calling
   * <code>UimaContextAdmin.getQualifiedContextName()</code> for this component. This is a
   * slash-separated name consisting of each containing context name back to the root.
   * For example, the context name for an annotator nested within two AnalysisEngines might look like:
   * <code>/MyTopLevelAnalysisEngine/MyComponentAnalysisEngine/MyAnnotator/</code>.
   * @return the slash-separated, fully-qualified context name of the component
   */
  String getComponentPath();

  /**
   * Returns an array of ids representing new FSs added by this component.
   * @return ids of the FSs created by this component
   */
  int[] getNewFS();

  /**
   * Returns an array of ids representing FSs whose feature values were modified
   * by this component.
   * @return ids of the pre-existing FSs modified by this component
   */
  int[] getModifiedFS();

  /**
   * Returns an array of ids representing FSs added to the specified view.
   * If viewname is null, returns ids of FSs added to the default view.
   * @return ids of the FSs added to the indexes of the view
   */
  int[] getIndexAdd(String viewname);

  /**
   * Returns an array of ids representing FSs that were reindexed in the specified view.
   * If viewname is null, returns ids of FSs reindexed in the default view.
   * @return ids of the FSs reindexed in the view
   */
  int[] getIndexModify(String viewname);

  /**
   * Returns an array of ids representing FSs that were removed from the specified view.
   * If viewname is null, returns ids of FSs removed from the default view.
   * @return ids of the FSs removed from the indexes of the view
   */
  int[] getIndexDelete(String viewname);

}
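
To make the intended usage concrete, here is a minimal sketch of how an application might walk the per-component journals and summarize what each component did. It is only an illustration under assumptions: the `Journal` interface in the block is a trimmed local copy of the proposal so the example compiles on its own, and `FixedJournal` and `JournalReport` are hypothetical names that are not part of the proposal. In a real application the array would come from the proposed `CAS.getJournal()`.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed local copy of the proposed interface, so the sketch compiles alone.
interface Journal {
  String getComponentPath();
  int[] getNewFS();
  int[] getIndexAdd(String viewname);
}

// Hypothetical in-memory journal; stands in for what the CAS would return.
final class FixedJournal implements Journal {
  private final String path;
  private final int[] newFs;
  private final int[] indexAdds;

  FixedJournal(String path, int[] newFs, int[] indexAdds) {
    this.path = path;
    this.newFs = newFs.clone();
    this.indexAdds = indexAdds.clone();
  }

  public String getComponentPath() { return path; }
  public int[] getNewFS() { return newFs.clone(); }
  // This stand-in ignores the view name and reports a single (default) view.
  public int[] getIndexAdd(String viewname) { return indexAdds.clone(); }
}

final class JournalReport {
  // Builds one summary line per component, in the order the journals are given.
  static List<String> summarize(Journal[] journals) {
    List<String> lines = new ArrayList<>();
    for (Journal j : journals) {
      lines.add(j.getComponentPath()
          + " new=" + j.getNewFS().length
          + " indexed=" + j.getIndexAdd(null).length); // null selects the default view
    }
    return lines;
  }

  public static void main(String[] args) {
    Journal[] journals = {
      new FixedJournal("/Agg/Tokenizer/", new int[] {1, 2, 3}, new int[] {1, 2}),
      new FixedJournal("/Agg/Tagger/", new int[] {4}, new int[] {})
    };
    summarize(journals).forEach(System.out::println);
  }
}
```

Because the proposed methods return plain FS ids, the summary above only counts them; turning an id back into a FeatureStructure is the open issue 1 listed earlier.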

1 Comment

  1. Rather than having the journal tied so closely to components, I'd like to be able to create arbitrary marks in the journal and then get changes between two marks (where the beginning and end of the journal are special, always-available marks). This could be in a super class of what you propose here.

    Rather than getting the changes in groups as you have proposed, I would want to get a time-ordered sequence of changes. Each change in the sequence would be a particular type (new FS, modified FS, new index, modified index or deleted index). Again, this could be in a super class of what you have proposed.

    It would be nice if the proposal also recognized the need to produce deltas where deltas are also sequences of changes but with ineffectual changes removed. (E.g., a delta could remove any "modified FS" changes that are overwritten by later changes.)

    Efficiency issues of the journal are similar to efficiency issues in a transactional DB. In the latter case, there is the ability to commit transactions and clear out the transaction log. Similarly here, having methods to "commit" to some mark and collapse the relevant change sequence would be useful in letting app developers participate in making the journal efficient.
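
The ideas in this comment (named marks, a time-ordered change sequence, deltas, and committing to a mark) could be sketched roughly as follows. This is only one possible reading, not a proposed API: `ChangeKind`, `Change`, `MarkedJournal`, and all of their methods are hypothetical names, and the delta rule shown is just the one example given above (a "modified FS" change is dropped when a later change overwrites the same feature of the same FS).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change types, following the list in the comment above.
enum ChangeKind { NEW_FS, MODIFIED_FS, INDEX_ADD, INDEX_MODIFY, INDEX_DELETE }

final class Change {
  final ChangeKind kind;
  final int fsId;
  final String feature; // only meaningful for MODIFIED_FS, otherwise null

  Change(ChangeKind kind, int fsId, String feature) {
    this.kind = kind;
    this.fsId = fsId;
    this.feature = feature;
  }
}

final class MarkedJournal {
  private final List<Change> log = new ArrayList<>();        // time-ordered
  private final Map<String, Integer> marks = new HashMap<>(); // mark -> log position

  void record(Change c) { log.add(c); }

  // A mark remembers the current end of the log.
  void mark(String name) { marks.put(name, log.size()); }

  private int position(String name, int fallback) {
    if (name == null) return fallback; // null means beginning or end of the journal
    Integer pos = marks.get(name);
    if (pos == null) throw new IllegalArgumentException("unknown mark: " + name);
    return pos;
  }

  // Time-ordered changes between two marks; null selects the beginning or end.
  List<Change> between(String from, String to) {
    return new ArrayList<>(log.subList(position(from, 0), position(to, log.size())));
  }

  // Delta: keep only the last MODIFIED_FS change per (FS id, feature) pair.
  static List<Change> delta(List<Change> changes) {
    Map<String, Integer> lastWrite = new HashMap<>();
    for (int i = 0; i < changes.size(); i++) {
      Change c = changes.get(i);
      if (c.kind == ChangeKind.MODIFIED_FS) lastWrite.put(c.fsId + "#" + c.feature, i);
    }
    List<Change> out = new ArrayList<>();
    for (int i = 0; i < changes.size(); i++) {
      Change c = changes.get(i);
      if (c.kind != ChangeKind.MODIFIED_FS
          || lastWrite.get(c.fsId + "#" + c.feature) == i) {
        out.add(c);
      }
    }
    return out;
  }

  // Commit: collapse everything before the mark into its delta.
  // Marks that pointed before the committed mark are discarded.
  void commit(String name) {
    int cut = position(name, 0);
    List<Change> head = delta(log.subList(0, cut));
    List<Change> tail = new ArrayList<>(log.subList(cut, log.size()));
    int shift = cut - head.size();
    log.clear();
    log.addAll(head);
    log.addAll(tail);
    marks.entrySet().removeIf(e -> e.getValue() < cut);
    marks.replaceAll((k, v) -> v - shift);
  }
}
```

A component-oriented journal like the one proposed on this page could then be expressed as a view over such a sequence, with a mark placed before and after each component runs.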