Purpose

This document is a functional specification for a new schema subsystem design. The new design will enable dynamic yet persistent updates to schema elements within the server. Furthermore, entire collections of schema elements, referred to as "a schema", will be loadable and unloadable on the fly. This new mechanism will also expose a persistent partition attached to the namespace at the ou=schema naming context. It will contain a well-structured, entry-based view of schema objects managed by the server, with better search and administration capabilities. This is all in addition to the subschemaSubentry demanded by the LDAP protocol, with its attribute-based descriptions of all schema elements.

Such a new schema subsystem will make it easier to use the server and manage the schema maintained within it. Furthermore it will greatly facilitate replication since schema elements will simply be entries within the server.

Document TODO

Provide in-server execution flow diagrams for schema loading, schema discovery, schema information reading, etc.
Consider warnings, information attached by Ersin and merge them into the document.

Present Day Schema Subsystem

The schema subsystem today is extremely primitive and only allows for a read-only schema. The schema subsystem uses a system of schema object producers to generate schema objects from Java class files. These class files are generated from schema files written in OpenLDAP syntax. For a change to be made and take effect, recompilation and a restart are required.

The grammar currently implemented to parse schema files is based on OpenLDAP syntax, but the one we should implement must be based on RFC 4512. It is summarized in Schema Loading.

Why such a primitive, inflexible solution? This design was originally intended as a simple bootstrapping mechanism to get a small set of schema elements into memory so they could be used to initialize a partition. We hoped that this partition would later store schema information while allowing for persistent updates to schema objects. However, we never found enough time to implement this subsystem in its entirety, and we fell back to using this bootstrap mechanism to store all schema elements for the server.

Two Views of Schema Information

In the new design, two separate views will be provided for describing and interacting with schema information stored within the server. One view is virtual and uses a single entry, the subschemaSubentry (SSSE), as required by the protocol to describe schema elements within attribute values. The other view is non-virtual and not specified by the LDAP protocol.

RFC4512, 4.2. Subschema Subentries

Subschema (sub)entries are used for administering information about
   the directory schema.  A single subschema (sub)entry contains all
   schema definitions (see Section 4.1) used by entries in a particular
   part of the directory tree.

   Servers that follow X.500(93) models SHOULD implement subschema using
   the X.500 subschema mechanisms (as detailed in Section 12 of
   [X.501]), so these are not ordinary object entries but subentries
   (see Section 3.2).  LDAP clients SHOULD NOT assume that servers
   implement any of the other aspects of X.500 subschema.

   Servers MAY allow subschema modification.  Procedures for subschema
   modification are discussed in Section 14.5 of [X.501].

   A server that masters entries and permits clients to modify these
   entries SHALL implement and provide access to these subschema
   (sub)entries including providing a 'subschemaSubentry' attribute in
   each modifiable entry.  This is so clients may discover the
   attributes and object classes that are permitted to be present.  It
   is strongly RECOMMENDED that all other servers implement this as
   well.

   The value of the 'subschemaSubentry' attribute is the name of the
   subschema (sub)entry holding the subschema controlling the entry.

      ( 2.5.18.10 NAME 'subschemaSubentry'
        EQUALITY distinguishedNameMatch
        SYNTAX 1.3.6.1.4.1.1466.115.121.1.12
        SINGLE-VALUE NO-USER-MODIFICATION
        USAGE directoryOperation )

   The 'distinguishedNameMatch' matching rule and the DistinguishedName
   (1.3.6.1.4.1.1466.115.121.1.12) syntax are defined in [RFC4517].

   Subschema is held in (sub)entries belonging to the subschema
   auxiliary object class.

      ( 2.5.20.1 NAME 'subschema' AUXILIARY
        MAY ( dITStructureRules $ nameForms $ ditContentRules $
          objectClasses $ attributeTypes $ matchingRules $
          matchingRuleUse ) )

   The 'ldapSyntaxes' operational attribute may also be present in
   subschema entries.

   Servers MAY provide additional attributes (described in other
   documents) in subschema (sub)entries.

   Servers SHOULD provide the attributes 'createTimestamp' and
   'modifyTimestamp' in subschema (sub)entries, in order to allow
   clients to maintain their caches of schema information.

   The following subsections provide attribute type definitions for each
   of schema definition attribute types.

Schema Entries in the ou=schema Partition

A special partition will be designed using an LDIFPartition implementation.

This partition will contain entries describing individual schema elements and groups of these elements, which we call schemas. The structure will be rather simple. Here's a quick look at what it might look like to the right.

New objectClasses for Schema Entities

Special objectClasses will need to be defined for schema specific entities to be modeled as entries instead of as attribute values in the subschemaSubentry. Some objectClasses which will need to be defined are:

attributeType
objectClass
ditStructure
ditContent
nameForm
syntax
matchingRule
matchingRuleUse
syntaxChecker
normalizer
comparator

These objectClasses will model all of the LDAP schema elements and more. There are 3 additional elements that are ApacheDS specific and are listed last above: syntaxCheckers, normalizers, and comparators. ApacheDS uses these low level constructs to build syntaxes and matchingRules that are used throughout the system.

The SyntaxChecker describes the syntax that an attributeValue must respect.

The Comparators are also implementations, used by MatchingRules to compare values.

Normalizers are a little bit different: they transform an attribute value according to the rules given in various RFCs for each AttributeType (for instance, names should not be case sensitive, multiple consecutive white spaces should be replaced by a single white space, etc.). There is no formal description of a normalizer with an associated OID.

Normalizers are used by indices and other processes to generate canonical representations of attributeType values so they can be compared. Normalizers are used to normalize values in entries as well as in filter expressions.

Comparators compare values, and these constructs implement the java.util.Comparator interface. They are used to sort values for insertion into indices and to evaluate greater-than and less-than expressions where attributeType values must be compared.

SyntaxCheckers are used to constrain values so the correct values are used by entries for attributeTypes of that syntax.
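
To make the three ApacheDS-specific constructs a bit more concrete, here is a minimal sketch of how a normalizer, a comparator, and a syntax checker could relate to each other. The Normalizer and SyntaxChecker interfaces and the class name are hypothetical and only serve this illustration; they are not the actual ApacheDS API.

```java
import java.util.Comparator;
import java.util.Locale;

// Hypothetical interfaces for illustration only; not the actual ApacheDS API.
interface Normalizer {
    String normalize(String value);
}

interface SyntaxChecker {
    boolean isValid(String value);
}

final class SchemaConstructsSketch {
    // Trims the value, collapses runs of white space, and lower-cases it.
    static final Normalizer CASE_IGNORE = value ->
        value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);

    // Orders two values by their normalized (canonical) forms.
    static final Comparator<String> CASE_IGNORE_COMPARATOR =
        Comparator.comparing(CASE_IGNORE::normalize);

    // Toy syntax: accepts only non-empty strings of decimal digits.
    static final SyntaxChecker DIGITS_ONLY = value ->
        value != null && value.matches("[0-9]+");

    public static void main(String[] args) {
        System.out.println(CASE_IGNORE.normalize("  John   SMITH "));
        System.out.println(CASE_IGNORE_COMPARATOR.compare("John  Smith", "john smith"));
        System.out.println(DIGITS_ONLY.isValid("12a"));
    }
}
```

In this sketch the comparator delegates to the normalizer, so an index and a filter expression that both normalize first would agree on whether two values are equal.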

The ObjectClass should also contain the description and the Obsolete flag if set to true.

All those ObjectClasses are defined in the MetaSchema page.

New attributeTypes for Schema Attributes

Several new attributeTypes will need to be defined to make all this work. We can easily contrive a list to do this by transposing the current attributes used in various schema descriptions. We've already done this in the image above. For example we would need the following attributeType descriptions to properly describe an objectClassDescription:

oid
desc
name (MV; MV stands for Multi Valued)
sup (MV)
must (MV)
may (MV)
obsolete

It is obvious from the way these attributes are used that for example the must attributeType, the MAY list, the SUP list and the NAME are all multivalued. The others are single valued. This derivation needs to be continued for all the various attributes used to describe all the schema entities in LDAP.

So far, and according to RFC 4512, SUP and NAME are multi-valued.

All those AttributeTypes are defined in the MetaSchema page.
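
The transposition between an entry-based objectClass and its attribute-value description can be sketched as follows. The ObjectClassEntry class is hypothetical; its fields simply mirror the attribute list above (single-valued oid, desc, obsolete; multi-valued name, sup, must, may), and the rendering follows the element order of an RFC 4512 objectClassDescription while leaving out the kind keyword for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of an objectClass stored as an entry; not an ApacheDS class.
final class ObjectClassEntry {
    final String oid;          // single-valued
    final List<String> names;  // multi-valued
    final String desc;         // single-valued, may be null
    final List<String> sup;    // multi-valued
    final List<String> must;   // multi-valued
    final List<String> may;    // multi-valued
    final boolean obsolete;    // single-valued

    ObjectClassEntry(String oid, List<String> names, String desc, List<String> sup,
                     List<String> must, List<String> may, boolean obsolete) {
        this.oid = oid;
        this.names = names;
        this.desc = desc;
        this.sup = sup;
        this.must = must;
        this.may = may;
        this.obsolete = obsolete;
    }

    // Renders a bare identifier for one value, "( a $ b )" for several.
    private static String oids(List<String> values) {
        return values.size() == 1 ? values.get(0)
            : "( " + String.join(" $ ", values) + " )";
    }

    // Renders a quoted name for one value, "( 'a' 'b' )" for several.
    private static String qdescrs(List<String> values) {
        String quoted = values.stream().map(v -> "'" + v + "'").collect(Collectors.joining(" "));
        return values.size() == 1 ? quoted : "( " + quoted + " )";
    }

    // Builds the attribute-value form used in the subschema subentry.
    String toDescription() {
        StringBuilder sb = new StringBuilder("( ").append(oid);
        if (!names.isEmpty()) sb.append(" NAME ").append(qdescrs(names));
        if (desc != null) sb.append(" DESC '").append(desc).append("'");
        if (obsolete) sb.append(" OBSOLETE");
        if (!sup.isEmpty()) sb.append(" SUP ").append(oids(sup));
        if (!must.isEmpty()) sb.append(" MUST ").append(oids(must));
        if (!may.isEmpty()) sb.append(" MAY ").append(oids(may));
        return sb.append(" )").toString();
    }
}
```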

Schema Subsystem Startup

The schema subsystem startup is pretty simple. If we already have an existing schemaPartition, we read all the schema entities stored in this partition and initialize the Registries accordingly. At this point, the schema must be valid, otherwise the server won't start. If we don't yet have a schemaPartition on disk, we extract all the base schema entities from a jar, boot an LdifPartition instance, and load all the schema entities into the Registries.

The schema subsystem is now ready.
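
The startup decision described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, a Map of file name to LDIF text stands in for the contents of the jar, "loading into the Registries" is reduced to collecting file names, and the validity check is reduced to refusing to start on an empty partition.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch of the schema subsystem startup; not ApacheDS code.
final class SchemaBootstrapSketch {
    // Returns the names of the LDIF files that were "loaded" into the registries.
    static List<String> start(Path schemaDir, Map<String, String> bundledLdif) throws IOException {
        if (!Files.isDirectory(schemaDir)) {
            // No schema partition on disk yet: unpack the bundled base schema first.
            Files.createDirectories(schemaDir);
            for (Map.Entry<String, String> e : bundledLdif.entrySet()) {
                Files.write(schemaDir.resolve(e.getKey()),
                            e.getValue().getBytes(StandardCharsets.UTF_8));
            }
        }
        // From here on, the partition on disk is the only source of schema data.
        List<String> loaded = new ArrayList<>();
        try (Stream<Path> files = Files.list(schemaDir)) {
            files.filter(p -> p.getFileName().toString().endsWith(".ldif"))
                 .sorted()
                 .forEach(p -> loaded.add(p.getFileName().toString()));
        }
        if (loaded.isEmpty()) {
            // Stand-in for "the schema must be valid, otherwise the server won't start".
            throw new IllegalStateException("schema partition is empty; refusing to start");
        }
        return loaded;
    }
}
```

Note that once the partition exists on disk the bundled data is ignored, which is what lets schema changes survive restarts.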

Using the X.500 Administrative Model

X.500 provides a powerful model for administering schema information within a DIT directly, so different authoritative areas can exist with different schemas in effect. Doing so is simple. At the apex of an SAA (Schema Authoritative Area) a subentry is inserted. This subentry contains a simple subtree specification, {}, which covers the whole area underneath the apex (which is called the administration point or AP).

What is called "Schema Authoritative Area" above is called "Subschema Administrative Area" in the standard (X.501-2005, 11.5.2).
What is called "administration point" above is called "Administrative Point" in the standard (X.501-2005, 11.5.4).

This subentry will have a cn attribute for its name, which we will always default to schema. It will be used as the RDN of the subentry as well. Besides the cn and subtreeSpecification attributes, this entry usually contains what we are familiar with in the LDAP world, namely attributes such as:

attributeTypes
syntaxes
objectClasses
matchingRules
ditContentRules
ditStructureRules
nameForms
matchingRuleUses

These are the attributes that go into the subschemaSubentry. The problem with LDAP is that it never realized that there can be more than one schema subentry in the portion of the DIT served by a DSA. Every entry within a DSA contains a subschemaSubentry attribute pointing to the schema subentry containing schema information governing that entry. This includes the Root DSE.

The big problem here is the subschemaSubentry attribute is single valued and that makes sense. Only one set of schema rules can govern the structure of an entry at one time. The problem however is the fact that most LDAP browsers read the subschemaSubentry in the RootDSE to find the schema in effect for the whole DIT mastered by the DSA. This presumes there is one schema in effect for the entire DIT and there are no SAAs. Browsers will just presume this.

RFC4512, 4.4. Subschema Discovery

To discover the DN of the subschema (sub)entry holding the subschema
   controlling a particular entry, a client reads that entry's
   'subschemaSubentry' operational attribute.  To read schema attributes
   from the subschema (sub)entry, clients MUST issue a Search operation
   [RFC4511] where baseObject is the DN of the subschema (sub)entry,
   scope is baseObject, filter is "(objectClass=subschema)" [RFC4515],
   and the attributes field lists the names of the desired schema
   attributes (as they are operational).  Note: the
   "(objectClass=subschema)" filter allows LDAP servers that gateway to
   X.500 to detect that subentry information is being requested.

   Clients SHOULD NOT assume that a published subschema is complete,
   that the server supports all of the schema elements it publishes, or
   that the server does not support an unpublished element.
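
The discovery procedure quoted above can be sketched with the standard JNDI API. This is a sketch rather than a recommended client: the class and method names are hypothetical, and it assumes the DirContext is bound to the server root (for example an InitialDirContext created for a URL without a base DN), so that DNs can be passed as absolute names.

```java
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical helper illustrating RFC 4512 section 4.4; not part of ApacheDS.
final class SubschemaDiscoverySketch {
    static final String FILTER = "(objectClass=subschema)";

    // Base-object scope, listing the schema attributes explicitly (they are operational).
    static SearchControls baseObjectControls(String... schemaAttributes) {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.OBJECT_SCOPE);
        controls.setReturningAttributes(schemaAttributes);
        return controls;
    }

    // Reads the entry's subschemaSubentry, then fetches the requested schema attributes.
    static Attributes readSubschema(DirContext ctx, String entryDn, String... schemaAttributes)
            throws NamingException {
        Attributes operational = ctx.getAttributes(entryDn, new String[] { "subschemaSubentry" });
        Attribute reference = operational.get("subschemaSubentry");
        if (reference == null) {
            throw new NamingException("no subschemaSubentry attribute on " + entryDn);
        }
        String subschemaDn = (String) reference.get();
        NamingEnumeration<SearchResult> results =
            ctx.search(subschemaDn, FILTER, baseObjectControls(schemaAttributes));
        try {
            if (!results.hasMore()) {
                throw new NamingException("subschema subentry not found: " + subschemaDn);
            }
            return results.next().getAttributes();
        } finally {
            results.close();
        }
    }
}
```

A client that only reads the subschemaSubentry of the RootDSE (entryDn "") would see the single global schema, which is exactly the browser behaviour discussed in this section.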

In X.500 multiple SAAs can exist because of a powerful administrative model. We can find a way to merge these worlds together. Essentially, the subentry referenced by the RootDSE will point to the global schema knowledge that has been enabled within the server. That includes every schema under ou=schema which has a schemaEnabled (to be defined) attribute set to TRUE. All schema objects under these enabled schemas are loaded into the global registries. The subentry referred to by the RootDSE will hence expose all the schema elements within the global registries as one entry.

Note that this subentry will be completely virtual. Browsers will hence see all schema elements enabled in the server, yet, as we'll see, different schemas will be enforced in different areas of the DIT served by the DSA.

Hence this is the other view of the schema information which we were referring to. This view is the one that LDAP clients are used to. It is also the one that has been the most cumbersome. This view will be constructed by the schema interceptor whenever there are requests to read this global schema subentry.

If no SAAs are defined then the global schema takes effect throughout the entire DIT served by the DSA. If however an SAA is defined, then a new administrative point is specified with a schema subentry. As an example, let's use dc=example,dc=com as the AP of the SAA. This AP will have a schema subentry which contains a cn used as its RDN. Its value will always be "schema". This subentry will contain a subtreeSpecification attribute which will always be set to the value "{}". It can then contain the following attributes:

schema
destinationSchema (not readable)
defaultDestinationSchema

The first attribute specifies the schemas that are in effect for this SAA. By referencing the commonName of the schemas defined under ou=schema, the schema interceptor will inject all the attributes needed from the registries into that schema subentry. An example makes this clearer. Say you have a schema attribute value set to 'samba' in the subentry cn=schema,dc=example,dc=com. When a client reads the schema subentry, it sees all the attributeTypes, objectClasses, syntaxes, etc. defined in the samba schema under cn=samba,ou=schema. The interceptor injects these additional synthetic attributes into the subentry when it is returned from the server. Furthermore, when schema checks are enforced on entries in that SAA, the schema values referenced in that subentry are used to determine the effective schema to use.

So we see we can use the X.500 administrative model and define different SAAs to handle schema differently in different regions of the DIT while maintaining a global schema.
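
The injection performed by the schema interceptor for an SAA subentry could look roughly like this. The class is a hypothetical, in-memory stand-in for the registries; the element descriptions used with it are placeholders and carry no real OIDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for per-schema registries; not the ApacheDS interceptor.
final class SaaSchemaViewSketch {
    // schema name (cn under ou=schema) -> subentry attribute -> description values
    private final Map<String, Map<String, List<String>>> registries = new LinkedHashMap<>();

    void register(String schema, String attribute, String description) {
        registries.computeIfAbsent(schema, k -> new LinkedHashMap<>())
                  .computeIfAbsent(attribute, k -> new ArrayList<>())
                  .add(description);
    }

    // Builds the synthetic attributes for a subentry whose 'schema' attribute
    // holds the given values; schemas that are not referenced stay invisible.
    Map<String, List<String>> inject(List<String> schemaValues) {
        Map<String, List<String>> view = new LinkedHashMap<>();
        for (String schema : schemaValues) {
            Map<String, List<String>> elements =
                registries.getOrDefault(schema, Collections.emptyMap());
            elements.forEach((attribute, values) ->
                view.computeIfAbsent(attribute, k -> new ArrayList<>()).addAll(values));
        }
        return view;
    }
}
```

The same map of referenced schemas would also drive schema checking for entries inside the SAA, so the view a client reads and the rules the server enforces come from one source.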

Handling Schema Add Modifications

The destinationSchema and defaultDestinationSchema attributes factor in when new schema objects are added using modify operations via SAA subentries. On such operations we have to add new schema elements somewhere under ou=schema, but under which schema becomes the question. Administrators can specify which schema to add the new schema entity to using the destinationSchema attribute, which is SINGLE-VALUED. The destinationSchema attribute cannot be read and can be used only for schema updates. If its value is not set during a schema update, the new schema elements will be added to the schema specified by the defaultDestinationSchema attribute, which is also SINGLE-VALUED.

So if destinationSchema is set to the value 'samba' (or it has not been set but defaultDestinationSchema is set to 'samba'), the new entry is created under the samba area (cn=samba,ou=schema) in the respective position for the type of schema element created. The interesting thing is that adding this new entity in this SAA would automatically add the entity to the global schema, and hence to the global schema subentry referenced by the RootDSE. Another side effect is that the entity would also appear in the subentry of any SAA that references the samba schema using the schema attribute in its schema subentry. If a completely different schema private to the SAA is desired, a novel name can be given and ApacheDS should create the new schema entry under ou=schema to contain those new elements.

Now schema changes can also be performed on the schema subentry referenced by the RootDSE. Let's call this the global schema subentry. When add mods are performed here we don't have a destinationSchema available unlike subentries in SAAs. In this case ApacheDS can use a schema called 'other' which includes all objects that have not been classified yet.
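
The choice of the target schema for an add can be summed up in a small function. The names are hypothetical, and one point is an assumption of this sketch rather than something stated above: an SAA subentry that has neither attribute set is treated like the global schema subentry and falls back to 'other'.

```java
// Hypothetical helper; not ApacheDS code.
final class DestinationSchemaSketch {
    static final String FALLBACK = "other";

    // destinationSchema wins, then defaultDestinationSchema, then 'other'.
    // For the global schema subentry both arguments are simply null.
    static String resolve(String destinationSchema, String defaultDestinationSchema) {
        if (destinationSchema != null && !destinationSchema.isEmpty()) {
            return destinationSchema;
        }
        if (defaultDestinationSchema != null && !defaultDestinationSchema.isEmpty()) {
            return defaultDestinationSchema;
        }
        return FALLBACK;
    }

    // DN of the schema entry under which the new element is created.
    static String schemaDn(String schemaName) {
        return "cn=" + schemaName + ",ou=schema";
    }

    public static void main(String[] args) {
        System.out.println(schemaDn(resolve(null, "samba")));
        System.out.println(schemaDn(resolve(null, null)));
    }
}
```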

An alternative way to using destinationSchema attribute: Standard Schema Extensions

Netscape family LDAP servers support a schema element description extension called 'X-ORIGIN'. For example, the following objectClassDescription, as read from the objectClasses attribute, says that the origin of the 'person' object class is RFC 2252:

objectclasses: ( 2.5.6.6 NAME 'person' DESC 'Standard Person Object Class'
  SUP top MUST ( objectclass $ sn $ cn )
  MAY ( description $ seealso $ telephoneNumber $ userPassword )
  X-ORIGIN 'RFC 2252' )

We want to use this mechanism in a better way to handle destination/source schema information for schema elements. We propose the following extension:

X-SCHEMA-NAME

This extension is useful when you need source schema information on read operations over cn=schema. However, for bulk updates the destinationSchema attribute is more useful. We can still support only destinationSchema for schema modifications and provide X-SCHEMA-NAME as a virtual attribute on schema reads. We can even support both mechanisms. So we have the following implementation possibilities:

Provide support for destinationSchema on schema modify operations and do not support X-SCHEMA-NAME extension.
Provide support for X-SCHEMA-NAME extension on both schema read and modify operations and do not support destinationSchema.
Provide support for destinationSchema on schema modify operations and provide support for virtual X-SCHEMA-NAME extension on schema read operations.
Provide support for destinationSchema on schema modify operations and provide support for X-SCHEMA-NAME extension on both schema read and modify operations. (This may cause some conflicts.)
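
For the options that involve X-SCHEMA-NAME, the server would have to read the extension from incoming descriptions and append it to outgoing ones. A minimal sketch, assuming the extension always carries a single quoted value; the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the proposed X-SCHEMA-NAME extension; not ApacheDS code.
final class SchemaExtensionSketch {
    // Matches: X-SCHEMA-NAME 'value' (single quoted value only)
    private static final Pattern X_SCHEMA_NAME =
        Pattern.compile("\\bX-SCHEMA-NAME\\s+'([^']*)'");

    // On modify: extracts the schema named by the client, if any.
    static Optional<String> schemaNameOf(String description) {
        Matcher m = X_SCHEMA_NAME.matcher(description);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // On read: appends the extension just before the closing parenthesis.
    static String withSchemaName(String description, String schemaName) {
        int close = description.lastIndexOf(')');
        if (close < 0) {
            throw new IllegalArgumentException("not a schema description: " + description);
        }
        return description.substring(0, close).trim()
            + " X-SCHEMA-NAME '" + schemaName + "' )";
    }
}
```

With the last option in the list, a modify that carries both a destinationSchema value and an X-SCHEMA-NAME extension naming a different schema is one place where the conflicts mentioned above could show up.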

Handling Schema Delete Modifications

When schema elements are deleted we know which schemas they belong to and can appropriately remove them from under the ou=schema area. Again this has side effects: the global schema subentry shows the delete, and some SAAs may also show the delete if they reference the schema from which the schema entity was deleted. These are natural implications.

The replace modifications don't even deserve a section here, since a replace is just a matter of performing several adds and several deletes.

Review of Server Startup

schema partition starts up
schema subsystem initialized to load all entries in schema partition as registry objects
server starts up the nexus and the other partitions; the schema partition is added to the nexus
interceptors are assembled into chain including the new schema interceptor

At this point the steady state is reached. The server is ready to service requests including updates to schema.

Pre-packaging Schema Data

We still have a bit of a chicken and egg problem remaining. We presume the schema partition under ou=schema is pre-populated with all the schemas we desire to toggle as enabled etc. This however presumes we ship with this pre-populated data. This is not too far-fetched an idea to use.

Remember the schemaEnabled boolean flag. If a schema object under ou=schema has this flag enabled then the schema is visible in the global registry and referencable within the schema subentry of any SAA. If this flag is toggled off then the schema elements associated with that schema immediately vanish from the global registry as well as any schema subentry of SAAs that reference that schema.

The main problem we must watch out for is handling both embedded and standalone configurations of the server. To cope with both situations the best option is to create a special apacheds-schema maven module. This module would use a special plugin to fire up the schema partition and load entries into it from OpenLDAP schema files. After loading the db files with schemas it would assemble them into a schema jar along with some classes that could be used to unpack them into some directory structure. This jar is also a good place to put some hard coded schema elements needed to start up the schema partition. I guess the schema partition can also be packaged into this jar. It would be nice to package it all together if possible.

The schema partition on start up would then check if the right files were created on disk. If so then those files would be used otherwise the new partition files would be unpackaged from the jar and placed into position on disk. This would then be used to fire up the schema partition and begin the initialization process.

Now this solves the problem for both embedded and standalone versions of ApacheDS.

Loading OpenLDAP Schemas

What happens, you may ask, to the process of loading an OpenLDAP schema file? With this great dynamic system which preserves changes to schema over restarts, we now lose the nice ability to load a schema using OpenLDAP schemas.

Really we wanted to avoid having a problem where we had two copies of this data by building in the ability of the server to load schema from both the schema partition and OpenLDAP schema files. The best approach is to have just one authoritative copy even if you have multiple views on that same data.

This does not however mean that we have to abandon support for these OpenLDAP formatted schema files. It's a good thing to use those files interchangeably, right? The best thing we can do is add a tool to the ApacheDS tools module to load an OpenLDAP schema file into ApacheDS for you.

The same commandline tool that loads the schema directly into the server should also be able to generate 2 different kinds of LDIF files to be manually applied to the directory if that's desired. An LDIF file can be generated to apply the LDIF to the ou=schema area as add operations or to apply it to the global schema subentry as a modify operation with attribute additions. The choice should be yours.
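
The two kinds of LDIF output could look something like what the following sketch produces for a single attributeType. Everything about the entry layout here is an assumption made for illustration: the DN structure under ou=schema, the use of oid as the RDN attribute, and the objectClass and attribute names (taken from the lists earlier in this document) are not a settled design.

```java
// Hypothetical LDIF generator for one attributeType; not the ApacheDS tool.
final class SchemaLdifSketch {
    // Variant 1: an add operation creating an entry under ou=schema.
    static String asAddLdif(String schema, String oid, String name) {
        return String.join("\n",
            "dn: oid=" + oid + ",ou=attributeTypes,cn=" + schema + ",ou=schema",
            "changetype: add",
            "objectClass: top",
            "objectClass: attributeType",
            "oid: " + oid,
            "name: " + name,
            "");
    }

    // Variant 2: a modify operation adding a value to the global schema subentry.
    static String asModifyLdif(String oid, String name) {
        return String.join("\n",
            "dn: cn=schema",
            "changetype: modify",
            "add: attributeTypes",
            "attributeTypes: ( " + oid + " NAME '" + name + "' )",
            "-",
            "");
    }

    public static void main(String[] args) {
        // The OID and name below are placeholders, not registered values.
        System.out.print(asAddLdif("samba", "1.1.1", "exampleAttr"));
        System.out.println();
        System.out.print(asModifyLdif("1.1.1", "exampleAttr"));
    }
}
```

Both variants end up changing the same data, since the global schema subentry is only a view over the entries under ou=schema.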

Additional Features

Extension for LDAP Syntaxes

We will use X-IS-BINARY extension for LDAP Syntax descriptions. This will help clients to determine whether an attribute is binary or not.

Points I Forgot to Make

The schema interceptor makes sure the global registries are always in sync with new additions, deletions or modifications that occur to schema entries in the schema partition. This is one of its responsibilities. It may also keep other SAA specific registries in sync if we decide we need to maintain separate registries for SAAs.

Update On Progress

Currently (as of Feb 26th 2007) in the 1.5 branch we've implemented the dynamic schema subsystem as outlined in this document except for the separate SAAs. There is one global schema in effect for now and until this additional feature is requested or someone has an interest in implementing it we're going to keep it that way.

Handling Modification Attributes on Schema Subentry

From an email posted to the mailing list:

Presently the schemaSubentry located at cn=schema is completely virtual (generated on the fly from the schema registries in the server) and it contains attributes which store the schema entity descriptions for the server. The problem we have is to accurately publish the following attributes to reflect schema changes:

creatorsName
createTimestamp
modifiersName
modifyTimestamp

The first two are really easy. The creatorsName will always be the administrator's DN: uid=admin,ou=system. The createTimestamp should be the createTimestamp of the ou=schema entry. The rationale is that this virtual entry is valid from the point the schema subsystem was created. This timestamp will reflect the time when the server was last built, as it should, since this is when the default schema is created. It's natural to use the admin user for the creatorsName attribute.

The modifiersName and modifyTimestamp are not that easy. Any time there is a change under ou=schema these fields need to be modified and persisted. So storing them in the virtual entry is not an option, since these values must persist across server restarts. I'm thinking the best way to store this information would be to use a special entry under the ou=schema namingContext to store the following attributes:

schemaModifiersName
schemaModifyTimestamp
subschemaSubentryName

It's tempting to store more information in this entry, like the schema entity or the schema that is modified. However, a schema operation may modify more than one schema entity, perhaps in multiple schemas, and a modify operation may perform different kinds of operations on each of the modified schema entities. This is far too much to track in a single entry. So it's not worthwhile tracking this information here; it belongs in a change log implemented for this purpose at a later date.

So let's keep it simple and do just what we have to do with this special entry. The entry can have its own objectClass and a simple cn for its RDN attribute. Here's what I propose for the schema of this entry:

attributetype ( TBD NAME 'schemaModifyTimestamp'
    DESC 'time which schema was modified'
    SUP modifyTimestamp )

attributetype ( TBD NAME 'schemaModifiersName'
    DESC 'the DN of the modifier of the schema'
    SUP modifiersName )

attributetype ( TBD NAME 'subschemaSubentryName'
    DESC 'the DN of the schema subentry this modification information corresponds to'
    EQUALITY distinguishedNameMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.12 )

objectclass ( TBD NAME 'schemaModificationAttributes'
    DESC 'a special entry tracking schema modification attributes'
    SUP top STRUCTURAL
    MUST ( cn $ subschemaSubentryName $ schemaModifyTimestamp $ schemaModifiersName ) )

The reason why I use attributes other than modifiersName and modifyTimestamp is to prevent collisions with the attributes of the same name that are injected for the entry itself. Secondly, a search for modifiersName or modifyTimestamp will also return these schema related attributes, since they extend modifiersName and modifyTimestamp respectively.

Any change to the schema entity entries under the ou=schema namingContext will update these schema specific operational attributes as well. When the schema subentry is read these values will be read and populated into the virtual schema subentry dynamically by the schema service. This will lead to the desired effect of correctly informing clients of changes to the global schema.
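
The bookkeeping described in the previous paragraph might be sketched as follows. The class is hypothetical and keeps the two values in memory only, whereas the proposal above is to persist them in the special entry under ou=schema; the timestamp is formatted as a GeneralizedTime string in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the schemaModificationAttributes entry.
final class SchemaModificationTrackerSketch {
    private static final DateTimeFormatter GENERALIZED_TIME =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss'Z'").withZone(ZoneOffset.UTC);

    private String schemaModifiersName;
    private String schemaModifyTimestamp;

    // Called after any change under ou=schema (the server would persist these values).
    void recordChange(String modifierDn, Instant when) {
        schemaModifiersName = modifierDn;
        schemaModifyTimestamp = GENERALIZED_TIME.format(when);
    }

    // Called when the virtual schema subentry is read: copies the tracked values
    // into the operational attributes that clients expect to see.
    Map<String, String> injectInto(Map<String, String> virtualSubentry) {
        Map<String, String> result = new LinkedHashMap<>(virtualSubentry);
        if (schemaModifiersName != null) {
            result.put("modifiersName", schemaModifiersName);
            result.put("modifyTimestamp", schemaModifyTimestamp);
        }
        return result;
    }
}
```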

Note that the subschemaSubentryName corresponds to the DN of the subentry that these modification attributes correspond to. For our present purposes this will be cn=schema until we introduce multiple SAAs. More is discussed about this attribute in the drawback section below. Basically this attribute is here for extension purposes when more than one SAA exists.
