Introduction

This document describes design choices made regarding the ChangeLogService.

Glossary

Before describing this service, we need to define some terms precisely to avoid any semantic confusion.

  • Transaction : <to be done>
  • Revision : <to be done>
  • Event Log : <to be done>
  • Transaction Log : <to be done>
  • Journal : <to be done>

Protocol Changes

  1. Add the revisions attribute to the apache schema and enable adding this attribute to entries
  2. Add the Proxied Authorization Control (needed for replay)
  3. Add an extended operation to take a snapshot
  4. Add an extended operation to roll back to a previous revision
  5. Add the capability to request specific versions of an attribute using the LDAP rev tag

Representing Changes (deltas)

Every change made to the LDAP server can be logged and tracked using LDIF. This is especially true if the server's configuration is also moved into the DIT.

A log of LDIF entries can track each new revision of the server. This can be used to audit changes on the DIT, on subtrees of the DIT, their entries and even on attributes. Each entry can track revisions in the change log using the revisions operational attribute. The revisions represent primary keys into the change log and act as a reference which can be used to query for information about changes from the server.

To enable operations to be reverted, a change log must capture complete information to revert the change. The best way to do this is by calculating the reverse LDIF from the change and storing it along with the forward LDIF. We can calculate these reverse LDIFs fairly easily using the following reverse operations on forward changes:

  Forward Change    Reverse Operation
  --------------    -----------------
  Add               Delete
  Delete            Add
  ModifyRdn         ModifyRdn
  ModifyDn          ModifyDn
  Modify.Add        Modify.Remove
  Modify.Remove     Modify.Add
  Modify.Replace    Modify.Replace

To revert a sequence of forward (F) LDIFs { F0, F1, F2, F3 }, the reverse (R) LDIF operations are applied in the opposite order { R3, R2, R1, R0 }. Some optimizations can be inferred from a sequence to make reverting faster. For example, if F0 represents an Add operation and the other changes represent Modify operations on the same entry, then R0 alone can be applied instead of the whole R3->R0 sequence of reverse LDIFs, since deleting the entry discards the later modifications anyway. For the time being we just note that these optimizations are possible for implementing reversion capabilities.
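The reverse-operation table and the revert ordering above can be sketched roughly as follows. This is only an illustration: the ChangeType enum and the RevertPlan class are hypothetical names, not part of the server, and the sketch deals with operation kinds only. A real reverse LDIF also needs the previous values (for instance the old attribute values for a Modify.Replace).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical change kinds, mirroring the forward/reverse table. */
enum ChangeType {
    ADD, DELETE, MODIFY_RDN, MODIFY_DN, MODIFY_ADD, MODIFY_REMOVE, MODIFY_REPLACE;

    /** Returns the kind of operation that undoes this one. */
    ChangeType reverse() {
        switch (this) {
            case ADD:           return DELETE;
            case DELETE:        return ADD;
            case MODIFY_ADD:    return MODIFY_REMOVE;
            case MODIFY_REMOVE: return MODIFY_ADD;
            default:            return this; // ModifyRdn, ModifyDn, Modify.Replace keep their kind
        }
    }
}

/** Hypothetical helper that orders reverse operations for a revert. */
final class RevertPlan {
    /** Given forward changes F0..Fn, returns the reverse kinds in the order Rn..R0. */
    static List<ChangeType> reverseOrder(List<ChangeType> forward) {
        List<ChangeType> reversed = new ArrayList<>();
        for (ChangeType f : forward) {
            reversed.add(f.reverse());
        }
        Collections.reverse(reversed); // last change is undone first
        return reversed;
    }
}
```

The optimization mentioned above (applying only R0 when F0 is an Add) is deliberately left out of this sketch.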

The whole point of this exercise is to define clearly what we need to track for a change. We need a unique, auto-incrementing, synchronized sequence from which to pull new revision numbers. This will be used for the revision of a change. Each revision has a change associated with it. Here's a list of what needs to be tracked for each change:

  • Revision Number: number assigned to the new state of the server once the change is applied
  • Forward LDIF: the LDIF applied to switch from S0->S1 (rev0->rev1)
  • Reverse LDIF: the LDIF applied to revert from S1->S0 (rev1->rev0)
  • Change Timestamp: the time the change occurred (GMT)
  • Principal: the distinguished name of the authorized user that made the change
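As a rough illustration of the list above, a change could be held in a small value class, with revision numbers pulled from an atomic counter. The names ChangeRecord and RevisionSequence are assumptions made for this sketch only, and the LDIFs are kept as plain strings to keep the example self-contained.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for everything tracked per change. */
final class ChangeRecord {
    final long revision;       // state of the server once the change is applied
    final String forwardLdif;  // LDIF applied to go from S0 to S1
    final String reverseLdif;  // LDIF applied to go back from S1 to S0
    final Instant timestamp;   // time of the change (Instant is always UTC)
    final String principalDn;  // DN of the authorized user that made the change

    ChangeRecord(long revision, String forwardLdif, String reverseLdif,
                 Instant timestamp, String principalDn) {
        this.revision = revision;
        this.forwardLdif = forwardLdif;
        this.reverseLdif = reverseLdif;
        this.timestamp = timestamp;
        this.principalDn = principalDn;
    }
}

/** Hypothetical thread-safe, auto-incrementing revision source. */
final class RevisionSequence {
    private final AtomicLong current = new AtomicLong(0); // revision 0 is the start state

    /** Returns the next unused revision number. */
    long next() {
        return current.incrementAndGet();
    }

    /** Returns the most recently issued revision number. */
    long current() {
        return current.get();
    }
}
```

An in-memory counter like this would of course have to be restored from the persisted log on restart.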

The following page presents the way reverse LDIFs are generated: Generating reverse LDIF

Change Stores

There are different levels to which we can implement this feature. I think we should enable different pluggable implementations for the change log service to expose different levels of functionality. To enable this we have to design a few different interfaces for the service and its subcomponents. The following levels of functionality should be possible:

  • Basic Change Log: logs changes only (no snapshots)
  • Searchable Change Log: logs changes and allows searching on changes
  • Taggable Change Log: allows tagging for snapshots
  • Searchable and Taggable Change Log: allows tagging and taking snapshots with search capabilities

The same change log logic can be used to swap out different components of a log store to provide varying capabilities and still apply tags and changes to the store interface. Let's take a look at some of the store interfaces which really act as an SPI for this subsystem.

There are 4 kinds of stores represented:

  • ChangeLogStore: the simplest kind of change log store that can be implemented (just for logging)
    • Primary method is
      long log( Principal dn, Entry forward, Entry reverse )
    • Another method exists to get the current revision number (should be published in RootDSE perhaps)
  • TaggableChangeLogStore: this store allows for tagging for snapshots
    • Two tag() methods exist: one generates a tag on the current revision, the other on a revision in the past
  • SearchableChangeLogStore: a simple store which exposes access to a search engine over change log events
  • SearchableTaggableLogStore: a taggable log store which enables searching over changes and tags
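To make the first two store kinds a bit more concrete, here is a minimal sketch. Only the log() signature and the existence of two tag() methods come from the description above. The getCurrentRevision() name, the return type of tag(), the placeholder Principal and Entry classes and the in-memory implementation are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

/** Placeholder for the server's principal type (assumption for this sketch). */
final class Principal {
    final String dn;
    Principal(String dn) { this.dn = dn; }
}

/** Placeholder for an LDIF entry (assumption for this sketch). */
final class Entry {
    final String ldif;
    Entry(String ldif) { this.ldif = ldif; }
}

interface ChangeLogStore {
    /** Logs a change and returns the revision assigned to it. */
    long log(Principal dn, Entry forward, Entry reverse);

    /** Hypothetical accessor for the current revision number. */
    long getCurrentRevision();
}

interface TaggableChangeLogStore extends ChangeLogStore {
    /** Tags the current revision; returns the tagged revision (assumed return type). */
    long tag();

    /** Tags a revision in the past; returns the tagged revision (assumed return type). */
    long tag(long revision);
}

/** Toy in-memory implementation, only here to exercise the interfaces. */
final class InMemoryTaggableStore implements TaggableChangeLogStore {
    private long revision = 0;
    private final Set<Long> tags = new TreeSet<>();

    public synchronized long log(Principal dn, Entry forward, Entry reverse) {
        // A real store would persist dn, forward and reverse here.
        return ++revision;
    }

    public synchronized long getCurrentRevision() {
        return revision;
    }

    public synchronized long tag() {
        return tag(revision);
    }

    public synchronized long tag(long rev) {
        if (rev < 0 || rev > revision) {
            throw new IllegalArgumentException("unknown revision " + rev);
        }
        tags.add(rev);
        return rev;
    }

    synchronized boolean isTagged(long rev) {
        return tags.contains(rev);
    }
}
```

The searchable variants are left out because the search interface is not described here.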

Simple TaggableChangeLogStore Implementation

The first very basic functionality is to implement a taggable change log without all the search capabilities. It has already been added without tracking of reverse LDIFs.

An existing page started by Ersin Er describes the existing ChangeLog interceptor: Logging Subsystem

This implementation should keep things simple. Two files, forward.ldif and reverse.ldif, should be written to as changes occur. As one might suspect, forward.ldif stores the forward changes in LDIF format, with each new change appended at the tail of the file. The reverse.ldif file stores the reverse LDIF needed to revert each forward change, with each new entry inserted at the head of the file. This way a set of changes can be applied and reverted using these two files. Additional information such as the revision number, timestamp and the principalDn can be encoded within comments before the LDIF entry using implementation-specific keys like the following:

# REVISION: 1234
# PRINCIPAL: uid=admin,ou=system
# TIMESTAMP: 200706202342343Z
dn: cn=jane doe,ou=users,ou=system
objectClass: top
objectClass: person
cn: jane doe
sn: doe

The revisions and the relative position from tail or head tie together the forward and reverse LDIFs. This can be used to selectively extract the changes needed to revert the server to some state other than the start state where revision = 0. This is clearly not the highest-performance implementation one can have, but we're not going to be performing any complex search operations over this data.
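The two-file layout described above could be written along these lines. This is a naive sketch under a few assumptions: the class and method names are made up, the comment keys follow the example above, and inserting at the head of reverse.ldif is done by rewriting the whole file, which is simple but clearly not efficient for large logs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for the forward.ldif / reverse.ldif pair. */
final class LdifLogFiles {

    /** Builds one log record: comment header, the LDIF, then a blank separator line. */
    static String formatRecord(long revision, String principalDn, String timestamp, String ldif) {
        return "# REVISION: " + revision + "\n"
             + "# PRINCIPAL: " + principalDn + "\n"
             + "# TIMESTAMP: " + timestamp + "\n"
             + ldif.trim() + "\n\n";
    }

    /** Appends a forward record at the tail of the file, creating the file if needed. */
    static void appendToTail(Path forwardFile, String record) {
        try {
            Files.write(forwardFile, record.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Inserts a reverse record at the head of the file by rewriting the whole file. */
    static void insertAtHead(Path reverseFile, String record) {
        try {
            String existing = Files.exists(reverseFile)
                    ? new String(Files.readAllBytes(reverseFile), StandardCharsets.UTF_8)
                    : "";
            Files.write(reverseFile, (record + existing).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, reading reverse.ldif from the top yields the reverse changes in the order they have to be applied, newest first.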

Another thing this ChangeLogStore can do is back up the log files after they reach a certain size. I don't recommend doing this since it does not really serve a purpose. All the records are needed for history, so you don't want to delete the files anyway. Keeping backups of smaller files just creates the problem of figuring out application order with separate files. It can be done with prefixes listing the start and stop revision numbers in the file name, but this is not worth the hassle. Also note that if the history is cleared you should be able to start from zero again, presuming the state you're in to be the start state: this might be useful for staging changes which do not require the entire history.

The best feature one can add to such an implementation is to store the log files in zipped form and insert new entries into them without having to expand the entire file. This, however, is the only feature worth adding to such a simple implementation.

A separate file can be used to track snapshots. A simple properties file can be used for this where the key is the revision number for the snapshot tag and the value of the key is the description for the snapshot. This probably will not grow very large at all. Another file can be used to persist the current revision or a pointer could be kept on the head or tail to quickly read the REVISION information in the comments of the last entry added to either the forward or reverse LDIF file.
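The snapshot properties file could be handled with java.util.Properties, as in the sketch below. The SnapshotTags class and its method names are hypothetical. The key is the revision number and the value is the snapshot description, as described above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Properties;

/** Hypothetical wrapper around a properties file of snapshot tags. */
final class SnapshotTags {
    private final Properties tags = new Properties();

    /** Records a snapshot tag: key = revision number, value = description. */
    void tag(long revision, String description) {
        tags.setProperty(Long.toString(revision), description);
    }

    /** Returns the description for a tagged revision, or null if it is not tagged. */
    String describe(long revision) {
        return tags.getProperty(Long.toString(revision));
    }

    /** Writes all tags in properties format. */
    void save(Writer out) throws IOException {
        tags.store(out, "snapshot tags");
    }

    /** Reads tags previously written by save(). */
    void load(Reader in) throws IOException {
        tags.load(in);
    }
}
```

In practice the Writer and Reader would be opened on a file inside the change log directory.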

The change log should be a simple interceptor for the time being and can be configured via Spring or programmatically to be added to the interceptor chain. By default it should be disabled. Users can enable server versioning if they would like by uncommenting the interceptor.

Configuration information may be needed for the following possible settings:

  • changeLogDirectory [path url] - where to put the changelog files
  • compressChangeLog [boolean] - whether or not to keep the change log ldif files compressed