Work in progress

This page is in the process of being reviewed and updated.

Introduction

The idea is simple yet extremely powerful: enable an LDAP server to version itself by tracking changes in a change log, and allow specific revisions to be tagged so that snapshots of the server state can be taken and later reverted to. Thanks to the design of ApacheDS, this is not especially difficult to implement.

Use Cases and Potential Value

There are several ways in which this feature can be leveraged by our users and us while developing ApacheDS:

  • Interesting impact on replication implementation ideas
  • Could be used for a transaction journal
  • Record a series of changes and play them back: great for testing complex operation sequences
  • Revert test servers after using them without a costly reinstall and restart
  • To audit changes to answer simple questions:
    • who made the change
    • what was changed (entries, attributes, values)
    • when was the change made
    • how was the change made (add, delete, modify, moddn, modrdn)
  • To inquire about the change history at several levels
    • entire server
    • by a user
    • on a region of the tree
    • by dates
    • by change types
    • per entry
    • per attribute
  • Quickly assess change patterns in the DIT
  • Tagging revisions for specific states
  • Rolling back/forward server state
  • Requesting attributes and entries by revision
  • Searching the server for entries by revision

These usage scenarios can arise in many settings, but nothing characterizes the need better than managing environment configurations within an infrastructure. The configuration of several LDAP-enabled applications may need to be rolled back to an earlier state, or auditors may require thorough reports on who changed what, and when.
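
To make the audit and history scenarios above concrete, here is a minimal in-memory sketch of a change log. It is not the ApacheDS implementation: the names ChangeEvent, ChangeLog, record, tag and history are hypothetical, the DNs are made up, and DN matching is done by naive string comparison rather than proper DN normalization.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    revision: int        # server-wide revision number
    principal: str       # who made the change
    change_type: str     # how: add, delete, modify, moddn, modrdn
    dn: str              # what entry was changed
    timestamp: datetime  # when the change was made


class ChangeLog:
    """Hypothetical in-memory change log; a real one would be persistent."""

    def __init__(self):
        self.events = []
        self.tags = {}  # snapshot tag name -> revision

    def record(self, principal, change_type, dn, timestamp=None):
        # Each recorded change gets the next revision number.
        revision = len(self.events) + 1
        when = timestamp or datetime.now(timezone.utc)
        self.events.append(ChangeEvent(revision, principal, change_type, dn, when))
        return revision

    def tag(self, name):
        # Associate a snapshot tag with the latest revision.
        self.tags[name] = len(self.events)
        return self.tags[name]

    def history(self, principal=None, change_type=None, base_dn=None):
        # Filter by user, by change type, and by region of the tree.
        # Naive string suffix match; real DN comparison needs normalization.
        def matches(event):
            return ((principal is None or event.principal == principal)
                    and (change_type is None or event.change_type == change_type)
                    and (base_dn is None
                         or event.dn == base_dn
                         or event.dn.endswith("," + base_dn)))
        return [event for event in self.events if matches(event)]


log = ChangeLog()
log.record("uid=admin,ou=system", "add", "ou=apps,dc=example,dc=com")
log.record("uid=jdoe,ou=users,dc=example,dc=com", "modify",
           "cn=mail,ou=apps,dc=example,dc=com")
log.tag("before-upgrade")
log.record("uid=admin,ou=system", "delete", "cn=mail,ou=apps,dc=example,dc=com")

by_admin = log.history(principal="uid=admin,ou=system")
under_apps = log.history(base_dn="ou=apps,dc=example,dc=com")
```

Filtering by date range or by attribute would follow the same pattern, provided each event also records the affected attributes.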

Can't transactions and replication help?

Some people may ask: "Why do I need this versioning and snapshotting feature if LDAP transactions can be used?" Transactions are great, but they don't stop mistakes; they just make sure the mistakes occur transactionally. Users can still delete or change things in the server incorrectly, and once a transaction has been committed it can no longer be rolled back when the error is discovered. So no, transactions don't save you in this case. Replication does not save you either: a mistaken change replicates throughout the cluster, so you still need versioning as a way to roll back to older revisions.

History

A couple of years ago I came across the idea of versioning changes inside ApacheDS. At first the idea seemed amazing, yet too hard to implement with pre-1.0 versions of the server. The best we could do at the time was to test and demo the idea with a simple change log interceptor example in a presentation: Embedding ApacheDS. Ersin Er was interested in getting involved with writing interceptors at the time, so he and I discussed the feature. He took a crack at implementing the log as a text-based LDIF log file and did a great job. We can still use much of this code today for a simple logger.

Just recently the idea came up again while Emmanuel Lecharny, Chris Custine and I were discussing how to speed up our integration tests, which were taking a lot of time because for each test we were effectively installing the server, starting it, stopping it and removing it. Emmanuel recommended instead cleaning the server without restarts to bring it back to its original state. Reverting the state using a change log would help fix this problem while also being useful as a feature in several other situations. See the IRC Log of our conversation that kicked off this effort yet again.

Protocol Considerations

As far as LDAP, the protocol, is concerned we have some advantages and can have good synergy with it.

Operational Attributes: revisions

We can use operational attributes to track the revision numbers of changes that took place on an entry. This allows us to use the directory itself to request version specific information. Let's for now presume we created a special revisions operational attribute that can only be modified by the directory. The attribute would be multivalued and would be of the INTEGER syntax. Presume revisions are just numbers representing states of the entire directory tree like in Subversion.

With the proper matchingRules we can ask the server about which entries were changed, present, or deleted using filters on this attribute as well as others.

attributetype ( 1.3.6.1.4.1.18060.0.4.1.2.TBD NAME 'revisions'
    DESC 'Revisions which have altered an entry'
    EQUALITY integerMatch
    ORDERING integerOrderingMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.27
    NO-USER-MODIFICATION
    USAGE dSAOperation X-SCHEMA 'apache' )
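
With integerMatch and integerOrderingMatch on this attribute, filters such as (revisions=23) or (revisions>=30) become possible. The sketch below only emulates how those two filter items would evaluate against a multivalued revisions attribute, using plain dictionaries instead of a real directory. The entry DNs, revision values and helper names are made up for illustration.

```python
def revisions_eq(entry, rev):
    """Emulates (revisions=rev): true if any value equals rev."""
    return rev in entry.get("revisions", [])


def revisions_ge(entry, rev):
    """Emulates (revisions>=rev): true if any value is >= rev."""
    return any(value >= rev for value in entry.get("revisions", []))


# Hypothetical entries, each listing the revisions that altered it.
entries = {
    "cn=mail,ou=apps,dc=example,dc=com": {"revisions": [3, 17, 23]},
    "cn=web,ou=apps,dc=example,dc=com": {"revisions": [5]},
    "cn=ftp,ou=apps,dc=example,dc=com": {"revisions": [23, 40]},
}

# Which entries were altered by revision 23?
changed_in_23 = sorted(dn for dn, e in entries.items() if revisions_eq(e, 23))

# Which entries were altered at or after revision 30?
changed_since_30 = sorted(dn for dn, e in entries.items() if revisions_ge(e, 30))
```

As with any multivalued attribute, a filter item matches an entry as soon as one of its values satisfies the assertion.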

Using LDAP Revision Tags

We must qualify what we mean by a tag, since LDAP has its own notion of tags and the concept of a tag also exists in versioning with respect to snapshots. These two concepts are not related, so when discussing tags on our mailing list or in documentation, please qualify them as either an LDAP Revision Tag or a ChangeLog Snapshot Tag.

LDAP has an interesting attribute tagging feature. It allows one to ask for a value of an attribute in a specific context. Language tags, for example, are used to request different values of an attribute: I can ask the server to return only the English version of an attribute using the identifier commonName;lang-en. Emmanuel can ask for the French version: commonName;lang-fr. Our Stefans can ask for the German version: commonName;lang-de.

The LDAP tagging mechanism is generalized but this language tag is a specific usage example. We could add an LDAP tag for revisions to be used with attribute identifiers like so:

commonName;rev-23
commonName;rev-9374
commonName;rev-231

This way, using the standard protocol, users can request the values of an attribute associated with a specific revision and the server should comply. How the value is calculated would be implementation specific, but it can still be made to perform extremely well.
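
One possible way to resolve such a request is sketched below, assuming the server keeps a per-attribute history of (revision, values) pairs sorted by revision. The rev- option itself is only the proposal described above, and parse_attribute_description and value_at are hypothetical helpers, not ApacheDS APIs.

```python
from bisect import bisect_right


def parse_attribute_description(description):
    """Split 'commonName;rev-23' into ('commonName', 23).

    The revision is None when no rev- option is present.
    """
    name, *options = description.split(";")
    revision = None
    for option in options:
        if option.lower().startswith("rev-"):
            revision = int(option[4:])
    return name, revision


def value_at(history, revision):
    """Return the values in effect at the given revision.

    history is a list of (revision, values) pairs sorted by revision.
    Returns None if the attribute did not exist yet at that revision.
    """
    revisions = [rev for rev, _ in history]
    index = bisect_right(revisions, revision)
    if index == 0:
        return None
    return history[index - 1][1]


# Hypothetical history of commonName on one entry.
history = [(3, ["Mail"]), (17, ["Mail Server"]), (40, ["Mail Gateway"])]

name, revision = parse_attribute_description("commonName;rev-23")
values = value_at(history, revision)
```

The binary search keeps the lookup logarithmic in the number of recorded changes to the attribute, which is one way the calculation could stay cheap.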

LDAP Request/Session Controls

We could also ask for entire entries to be returned on search, or for attribute comparisons, based on some revision, date, or range of revisions. Controls can allow us to affect the behavior of an LDAP operation so that it incorporates versioning semantics. It would be interesting to explore the possibilities with each operation.

LDAP Extended Operations

Taking snapshots is easy and can be done via extended operations as well, but what about reverting the server back to its original state? Tools might be able to do this if they can access the changelog (where would it be?), especially with reverse LDIFs to revert changes. However, issues arise, such as managing operational attributes like modifiersName. When applying a change log, the tool cannot easily and securely apply it with the credentials of each user who made a modification in order to guarantee that the server's operational attribute housekeeping does its thing. You could get around this if the server supported the LDAP Proxied Authorization Control. Even presuming you had the tools, access to the logs, and the proxied authorization control available, the tool could still fail while applying the changes, or the server could go down, leaving the server's state inconsistent with respect to your application data. If transactions actually make it into the specifications, we could solve this problem too with tooling.

The best bet is to create an extended operation to roll back the server state to a specific revision. If change log information is not available in the server (say the server sends it to some remote log), then the anti-changes can optionally be supplied within the operation's payload. Regardless of how the changes to revert are made available, the server can wait until all current operations complete, then reject new requests with a busy response code and start applying the changes in a local transaction, with the ability to roll back if the revert fails. Furthermore, because the revert runs inside the server, no client connection is needed to complete it, so a network failure cannot leave a partial rollback behind.
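
The core of such a revert can be sketched as follows, assuming each logged change carries enough state to compute its reverse. The directory is a plain dictionary keyed by DN, the change record layout is invented for this example, and a deep copy of the directory stands in for the local transaction. Waiting for running operations and answering with a busy response code are left out.

```python
import copy


def reverse_of(change):
    """Compute the anti-change for a logged change (hypothetical record layout)."""
    kind = change["type"]
    if kind == "add":
        return {"type": "delete", "dn": change["dn"], "entry": change["entry"]}
    if kind == "delete":
        return {"type": "add", "dn": change["dn"], "entry": change["entry"]}
    if kind == "modify":
        return {"type": "modify", "dn": change["dn"],
                "old": change["new"], "new": change["old"]}
    raise ValueError("unsupported change type: " + kind)


def apply_change(directory, change):
    """Apply one change to the directory, raising if it cannot be applied."""
    kind, dn = change["type"], change["dn"]
    if kind == "add":
        if dn in directory:
            raise KeyError("entry already exists: " + dn)
        directory[dn] = copy.deepcopy(change["entry"])
    elif kind == "delete":
        del directory[dn]  # raises KeyError if the entry is missing
    elif kind == "modify":
        if dn not in directory:
            raise KeyError("no such entry: " + dn)
        directory[dn] = copy.deepcopy(change["new"])
    else:
        raise ValueError("unsupported change type: " + kind)


def revert_to(directory, log, target_revision):
    """Revert to target_revision, where log[i] produced revision i + 1.

    Anti-changes are applied to a working copy, newest first. The live
    directory is replaced only if every anti-change succeeds, so a failed
    revert leaves it untouched.
    """
    working = copy.deepcopy(directory)
    for change in reversed(log[target_revision:]):
        apply_change(working, reverse_of(change))
    directory.clear()
    directory.update(working)


directory = {}
log = [
    {"type": "add", "dn": "cn=mail", "entry": {"cn": ["mail"]}},
    {"type": "modify", "dn": "cn=mail",
     "old": {"cn": ["mail"]}, "new": {"cn": ["mail"], "port": ["25"]}},
    {"type": "delete", "dn": "cn=mail",
     "entry": {"cn": ["mail"], "port": ["25"]}},
]
for change in log:
    apply_change(directory, change)

revert_to(directory, log, 1)  # back to the state right after revision 1
```

Supplying the anti-changes in the operation's payload would amount to passing the already reversed list instead of computing it from a local log.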

Exposing the ChangeLog via LDAP

Change events and the log can be exposed via LDAP to enable sophisticated searches. Furthermore doing this would enable tooling support to better manage server snapshotting and change reversion.

This can easily be implemented in ApacheDS as a custom partition exposed by the ChangeLog service to view the log information. Of course implementation and backing stores will dictate the quality of the search experience but this is up to implementations to remedy.

You might want to restrict updates to this partition to the creation of snapshot tags only. This way you don't need an extended operation to take a snapshot.

Design

Since this takes up a lot of space, we're doing this on a child page here: ChangeLog Design
