Introduction

We will discuss various ways to store lists of simple Java types in LDAP.  LDAP is particularly poor at managing lists because a list must preserve the order of its elements.  Simply using a multivalued attributeType does not solve this problem, since the values of an attribute form an unordered set and there is no way to determine their order.  Workarounds exist, each with its own trade-offs, and one may be better than another in certain circumstances.  In any case, an understanding of how the directory will be used (searched) will factor into the selection of the best workaround.  Any persistence engine mapping Java bean list properties to LDAP should offer several mechanisms, chosen at the user's discretion.

Workaround (1): Model List Elements as Entries with Bean Persistence Schema

A bean persistence schema will contain among others the following objectClasses:

  • bean
    • beanClass
    • beanId
  • beanListProperty
    • beanListPropertyName
  • simpleBeanListElement
    • beanListElementIndex
  • simpleBeanList<Type>Element (*)
    • simpleBeanList<Type>Value

(*) This is actually a family of objectClasses, one per simple Java type, each carrying the name of that type in place of <Type>.  They all extend the simpleBeanListElement objectClass.

A bean foo with bar list property will contain an immediately subordinate beanListProperty entry.  This entry will have the beanListPropertyName attribute set to 'bar'.  It will contain child entries of the simpleBeanListElement type subclasses. The list can be homogeneous where it contains only one kind of element type or it can contain a mixture of types.
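
To make the layout concrete, here is a minimal sketch that renders the container entry and its element entries as LDIF text for a list of Strings.  The objectClass and attribute names follow the outline above with <Type> set to String; the choice of RDN attributes, the zero based index and the sample DN are assumptions made for illustration only, and values are not escaped.

```java
import java.util.List;

/** Renders the entry layout of workaround (1) as LDIF text for a String list. */
final class BeanListLdif {

    // Assumptions: the RDN attributes used here and the zero based index are
    // illustrative choices. Values are written verbatim, so DN escaping and
    // base64 encoding of unsafe LDIF values are deliberately left out.
    static String render(String beanDn, String property, List<String> values) {
        String containerDn = "beanListPropertyName=" + property + "," + beanDn;
        StringBuilder ldif = new StringBuilder();

        // The container entry sits immediately below the bean entry.
        ldif.append("dn: ").append(containerDn).append('\n')
            .append("objectClass: beanListProperty\n")
            .append("beanListPropertyName: ").append(property).append("\n\n");

        // One child entry per list element, carrying its index and its value.
        for (int i = 0; i < values.size(); i++) {
            ldif.append("dn: beanListElementIndex=").append(i)
                .append(',').append(containerDn).append('\n')
                .append("objectClass: simpleBeanListElement\n")
                .append("objectClass: simpleBeanListStringElement\n")
                .append("beanListElementIndex: ").append(i).append('\n')
                .append("simpleBeanListStringValue: ").append(values.get(i)).append("\n\n");
        }
        return ldif.toString();
    }
}
```

Calling BeanListLdif.render with the DN of foo, the property name "bar" and the list [cat, dog] produces three entries: the bar container and two element entries below it.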

Disadvantages

  • It's not a simple matter to search for beans with a specific value in their bar list.
  • Lots of network traffic is needed to assemble the entire list and all its elements.
  • Client must be aware of the schema and hierarchy used.

Advantages

  • List can be very large in size.
  • List elements are clearly marked based on type.

Search Impact

As mentioned, the biggest disadvantage lies in finding foo objects with particular element values in their bar list.  Given a particular foo bean it's easy to find out if it contains a value like 'dog' in the bar list.  Finding foo objects with 'dog' in their bar list is not so simple.  One would first have to search for all the list element entries with a value of 'dog'.  For each one a check must be performed to see if its parent node is a beanListProperty with a name of 'bar'.  Then the parent of the beanListProperty must be checked to see if it is a foo entry.  If so then the foo entry must be looked up and returned.
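
The lookup steps described above can be traced with a small in-memory stand-in for the directory tree.  This is only a sketch: the Entry class, its kind and value fields, and the findBeans method are hypothetical, and a real client would issue one search plus several lookups over the wire instead of following parent references.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for the directory tree, used to trace the lookup steps. */
final class ListValueSearch {

    /** A minimal entry: a kind (objectClass), one value, and a parent link. */
    static final class Entry {
        final String kind;    // e.g. "foo", "beanListProperty", "simpleBeanListElement"
        final String value;   // bean id, property name, or element value
        final Entry parent;   // null for top level entries

        Entry(String kind, String value, Entry parent) {
            this.kind = kind;
            this.value = value;
            this.parent = parent;
        }
    }

    /** Returns the beans of the given class whose list property holds the wanted value. */
    static List<Entry> findBeans(List<Entry> all, String beanClass, String property, String wanted) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : all) {
            // Step 1: candidate list elements carrying the wanted value.
            if (!"simpleBeanListElement".equals(e.kind) || !wanted.equals(e.value)) {
                continue;
            }
            // Step 2: the parent must be the beanListProperty with the right name.
            Entry prop = e.parent;
            if (prop == null || !"beanListProperty".equals(prop.kind) || !property.equals(prop.value)) {
                continue;
            }
            // Step 3: the grandparent must be a bean of the requested class.
            Entry bean = prop.parent;
            if (bean == null || !beanClass.equals(bean.kind)) {
                continue;
            }
            // A list may hold the value more than once, so report each bean once.
            if (!hits.contains(bean)) {
                hits.add(bean);
            }
        }
        return hits;
    }
}
```

Every candidate element costs two extra parent checks, which is where the additional round trips come from when the same walk is done against a server.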

A few tactics can be used to remedy this.  First the beanId and property name can be included in the list elements.  This way a search can list all the beanIds which contain a bar property value of dog.  This still however requires a lookup to get the bean entry.  Furthermore duplicates must be ignored.
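
A rough sketch of this first tactic follows.  It assumes each list element entry also carries the beanId and property name of its owner (the Element class and its fields are hypothetical) and shows why duplicates have to be dropped on the client.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** List elements that redundantly carry the id and property name of their bean. */
final class DenormalizedElements {

    /** Hypothetical view of one search result for a list element entry. */
    static final class Element {
        final String beanId;
        final String property;
        final String value;

        Element(String beanId, String property, String value) {
            this.beanId = beanId;
            this.property = property;
            this.value = value;
        }
    }

    /**
     * Collects the distinct bean ids owning the value. A set is used because a
     * list holding the value twice yields two elements with the same bean id.
     * Each id still needs a separate lookup to fetch the bean entry itself.
     */
    static Set<String> beanIdsWith(List<Element> elements, String property, String value) {
        Set<String> ids = new LinkedHashSet<>();
        for (Element e : elements) {
            if (e.property.equals(property) && e.value.equals(value)) {
                ids.add(e.beanId);
            }
        }
        return ids;
    }
}
```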

A stored procedure can be used to do all this work by returning an enumeration to replace specific search requests.  Nothing like this exists within the server; however, I think this is a good use case for a stored procedure backed view.  Interesting ideas come about when considering this.

Another tactic is to use a Trigger and SP pair to assemble a virtual multi-valued list attribute.  This is discussed in the section below. 

Workaround (2): Use Trigger and SP Pair to Assemble Virtual Multi Valued List Attribute

A prescriptive trigger and stored procedure pair are added to the server.  The trigger fires on search operations that return beanClass objects.  The trigger invokes a stored procedure which creates the multi-valued bar property of <Type>Value within the bean entry, filling it with the values of the <Type>Value attributes of the simple<Type>Element entries.  The AttributesImpl class can be made to preserve the order of values in multi-valued attributeTypes.  When returned to the client, the order of the bar attribute's values is preserved since the server populates the response PDU while preserving multi-valued attribute value order.
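
The core of what such a stored procedure would do can be sketched without any server API: take the element entries below the list property, order them by index, and emit their values in that order.  The Element class and the assemble method are hypothetical names; how the procedure would read the child entries and inject the attribute into the returned entry is left out because that part of the server does not exist yet.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Assembles the ordered values of the virtual multi-valued attribute. */
final class VirtualListAssembler {

    /** Hypothetical view of one list element entry: its index and its value. */
    static final class Element {
        final int index;
        final String value;

        Element(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Returns the element values sorted by index, ready to be added to the entry. */
    static List<String> assemble(List<Element> children) {
        // Child entries come back from a search in no particular order.
        List<Element> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt((Element e) -> e.index));

        List<String> values = new ArrayList<>(sorted.size());
        for (Element e : sorted) {
            values.add(e.value);
        }
        return values;
    }
}
```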

This has one catch: we need to mark the bar attribute with the X-VIRTUAL schema extension within the schema.  Otherwise, if the bar attributeType is mandatory (due to a cardinality of 1 or more), schema checks would incorrectly reject adds of foo entries that do not carry a stored bar value.

This mechanism is ideal because it does not cost additional network traffic or client side sorting of values to achieve ordering of list elements.  The stored procedure handles the order of the values in the multivalued attribute.  No extra protocol elements such as controls or extended responses are needed, which makes the mechanism portable across clients, provided the client respects the order in which multi-valued attribute values are returned by the server in the SearchResultEntry PDU.  If not, this mechanism will present some client specific problems.  For embedded use it will not be an issue.

If this workaround is utilized then we must make sure most existing Java LDAP client libraries respect the order of values returned by a server within a multi-valued attribute.  We can conduct a series of tests easily to determine this and maintain a compatibility matrix for tracking this.  There are not many Java LDAP libraries out there.  If some violate this then we can write our own (which we have most of) to be used with the runtime of our mapping library.
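
One row of such a compatibility matrix could be filled in with a check like the following, written against JNDI's javax.naming.directory.Attribute interface.  It is only a sketch: ValueOrderCheck and its methods are hypothetical, and in a real test the attribute would come from a search result returned by the library under test instead of being built by hand.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;

/** Checks whether a client library exposes attribute values in the expected order. */
final class ValueOrderCheck {

    /** Reads the values of a JNDI attribute in the order the library exposes them. */
    static List<String> valuesInOrder(Attribute attr) {
        List<String> values = new ArrayList<>();
        try {
            for (int i = 0; i < attr.size(); i++) {
                values.add(String.valueOf(attr.get(i)));
            }
        } catch (NamingException e) {
            throw new IllegalStateException("could not read attribute values", e);
        }
        return values;
    }

    /** True if the values appear exactly in the order the server is known to have sent. */
    static boolean sameOrder(Attribute attr, List<String> expected) {
        return valuesInOrder(attr).equals(expected);
    }
}
```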

Disadvantages

The biggest impediment to this approach lies in functionality that has yet to be implemented.  Namely, there is no way to evaluate search filters containing virtual or computed attributes.  Since this capability does not exist *yet* we would have to implement it.  Secondly, we would need to add the X-VIRTUAL schema extension as a cue to the schema subsystem to relax its schema checking constraints on this attribute.

Accounting for virtual attributes in filter expressions is difficult to do efficiently.  If we wanted to, we could have partitions ask the virtualization subsystem for the virtual attributes to inject into an entry before applying assertions on candidates.  However, if left unconstrained on a large result set, the amount of computation would consume the server.  The only way to prevent this is with search limits and by making sure virtual attribute based assertions are evaluated as a last resort, when other assertions cannot determine whether the candidate is selected or not.  Secondly, how do we ask a virtual subsystem to tell us the values to inject into an entry when this is a trigger based service that executes after a return?  Hard stuff but we can find a way.  We have to anyway.
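
The "last resort" evaluation order can be sketched for a conjunction as follows.  All names are hypothetical, and the virtualizer function simply stands in for whatever subsystem would compute the virtual attributes; the point is only that it is never called for candidates that a cheaper stored-attribute assertion already rejects.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Evaluates an AND filter, deferring assertions on virtual attributes. */
final class DeferredVirtualFilter {

    /**
     * @param stored            the candidate entry as stored in the partition
     * @param storedAssertions  assertions that only need stored attributes
     * @param virtualizer       computes a copy of the entry with virtual attributes injected
     * @param virtualAssertions assertions that need the virtual attributes
     */
    static <E> boolean matchesAnd(E stored,
                                  List<Predicate<E>> storedAssertions,
                                  UnaryOperator<E> virtualizer,
                                  List<Predicate<E>> virtualAssertions) {
        // Cheap assertions first: one failure rejects the candidate outright.
        for (Predicate<E> assertion : storedAssertions) {
            if (!assertion.test(stored)) {
                return false;
            }
        }
        if (virtualAssertions.isEmpty()) {
            return true;
        }
        // Only now pay for computing the virtual attributes.
        E full = virtualizer.apply(stored);
        for (Predicate<E> assertion : virtualAssertions) {
            if (!assertion.test(full)) {
                return false;
            }
        }
        return true;
    }
}
```

This ordering only helps when the filter contains at least one selective assertion on stored attributes; a filter made up of virtual assertions alone still forces the computation for every candidate, which is why search limits remain necessary.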

Another way to fix this issue is to execute another search for satisfying searches on foo entries with a bar value.  For example (& (objectClass=foo) (bar=dog)) can run a search looking for all listProperties named bar in foo entries with the following filter: (& (objectClass=listProperty)(  Once those are found the

Workaround (3): List Elements in Multi-Valued Attributes With Value Prefixing

A prefix can be used within the values of a multivalued attribute to encode the index of the value within the list.  The partial LDIF of the foo entry below shows how such a multivalued bar attribute can be used:

An LDIF of an entry with prefix ordered multivalued attribute bar
dn: ...
bar: 1-cat
bar: 2-dog
bar: 3-rat
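
A minimal sketch of the client side encoding and decoding, assuming the one based "index-value" format shown in the LDIF above.  The class and method names are hypothetical.  Decoding splits on the first hyphen only, so element values may themselves contain hyphens, and it sorts numerically so that index 10 comes after index 2.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

/** Encodes list positions as numeric prefixes on multi-valued attribute values. */
final class PrefixedList {

    /** Turns [cat, dog] into [1-cat, 2-dog]. */
    static List<String> encode(List<String> elements) {
        List<String> out = new ArrayList<>(elements.size());
        for (int i = 0; i < elements.size(); i++) {
            out.add((i + 1) + "-" + elements.get(i));
        }
        return out;
    }

    /** Restores list order from attribute values returned in any order. */
    static List<String> decode(Collection<String> attributeValues) {
        TreeMap<Integer, String> byIndex = new TreeMap<>();
        for (String v : attributeValues) {
            int dash = v.indexOf('-');
            if (dash <= 0) {
                throw new IllegalArgumentException("missing index prefix: " + v);
            }
            // Parse the prefix as a number so that 10 sorts after 2.
            byIndex.put(Integer.parseInt(v.substring(0, dash)), v.substring(dash + 1));
        }
        return new ArrayList<>(byIndex.values());
    }
}
```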

Disadvantages

  1. Search and compare operations must be modified by client applications to account for the index prefix.
  2. Cannot efficiently search for foo objects containing a specific bar value (explained under search impact)
  3. Client must also make sure the prefix is maintained
  4. Syntax of actual elements will not correspond to the syntax of the attributeType

Advantages

  1. ApacheDS' JDBM partition can maintain large numbers of attribute values due to B+Tree indirection for values
  2. Searching for foo entries with bar list sizes greater or less than some amount is efficient (explained under search impact)
  3. List is contained within the entry itself
  4. Efficient transfers are possible since only one search result returns the whole list

Search Impact

This has advantages and disadvantages for searching.  You cannot use a plain equality assertion to search for foo objects containing a bar value of 'dog', because every stored value carries an index prefix.  Instead of this filter,

(& (objectClass=foo) (bar=dog))

one would have to use this filter instead,

(& (objectClass=foo) (bar=*dog))

which is inefficient since a bar attribute index cannot be leveraged properly. The entire index would be scanned instead of a subset since no prefix exists for the substring assertion.  Each index entry's value must be checked to see if it ends in dog.
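
A client helper for this rewrite might look like the sketch below.  Two things in it go beyond the text and are assumptions: the generated filter includes the hyphen separator ((bar=*-dog) instead of (bar=*dog)) so that a value such as '2-hotdog' does not match at the server, and the results are re-checked on the client because a value such as '2-hot-dog' would still match.  The escaping follows the reserved characters of RFC 4515; the class and method names are hypothetical.

```java
import java.util.Collection;

/** Builds and post-checks the suffix filter needed with index-prefixed values. */
final class SuffixFilter {

    /** Escapes the characters RFC 4515 reserves inside assertion values. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Filter for entries of objectClass whose attribute holds the element at any index. */
    static String elementFilter(String objectClass, String attribute, String element) {
        return "(&(objectClass=" + objectClass + ")(" + attribute + "=*-" + escape(element) + "))";
    }

    /** Client side re-check: strips the index prefix and compares the rest exactly. */
    static boolean containsElement(Collection<String> prefixedValues, String element) {
        for (String v : prefixedValues) {
            int dash = v.indexOf('-');
            if (dash > 0 && v.substring(dash + 1).equals(element)) {
                return true;
            }
        }
        return false;
    }
}
```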

We can efficiently ask the directory for all foo entries containing a list of bar values with some length or greater.  If we want all foo entries with bar lists with size greater than five we can apply the following search filter:

(& (objectClass=foo) (bar=6-*))

This filter's substring assertion is more efficient because it leverages an index on bar better due to a '6-' prefix.  This way there is a partial scan on the bar index: a cursor can directly advance to the index entry with key starting with 6-.  After all the index keys starting with '6-' have been evaluated then the scan completes to reduce the search set.
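
The partial scan can be illustrated with a sorted set standing in for the bar index.  This is a simplification, not the JDBM partition's cursor code: a java.util.TreeSet plays the role of the B+Tree, and the class and method names are hypothetical.  The filter helper assumes the one based indices used in the LDIF example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Prefix range scan over a sorted index, plus the matching filter string. */
final class PrefixScan {

    /** Filter for entries whose list has more than n elements (one based indices). */
    static String sizeGreaterThan(String objectClass, String attribute, int n) {
        return "(&(objectClass=" + objectClass + ")(" + attribute + "=" + (n + 1) + "-*))";
    }

    /** Visits only the keys that start with the prefix instead of the whole index. */
    static List<String> scan(NavigableSet<String> indexKeys, String prefix) {
        List<String> hits = new ArrayList<>();
        // Position the "cursor" at the first key that is >= the prefix.
        for (String key : indexKeys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so the prefix range has ended
            }
            hits.add(key);
        }
        return hits;
    }
}
```

Because keys sharing a prefix are contiguous in sort order, the loop stops at the first key outside the '6-' range; a key such as '60-x' sorts after that range and is never mistaken for index 6.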

Syntax Issues

Another problem with this workaround is that syntaxes cannot be properly used for the attributeTypes defined due to the prefix.  Instead of having the bar syntax account for the value of the list element, the bar syntax must include a component for the index of the element value.

Workaround (4): Delimiter Separated List of Elements in Single Valued Attribute

A single valued attributeType can be used to encode all the elements in the list with delimiters. 

An LDIF of an entry with delimiter separated single valued attribute bar
dn: ...
bar: cat,dog,rat

There are serious issues with this approach.  First the delimiter must be carefully chosen and often may need to be escaped based on the syntax of the list elements. 

Again this has the same issue regarding proper syntax usage and the need to use poor substring assertions to locate foo entries with a particular bar value.  Here the position of the list element in the value of the attribute determines its index.  However, performance degrades rapidly as the list grows: the cost of parsing, validating and inserting into the value increases, as do the memory use and transfer times.
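
The escaping concern can be made concrete with a short sketch.  It assumes a comma as the delimiter and a backslash as the escape character; both choices, as well as the class and method names, are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Packs a list into one attribute value, escaping the delimiter inside elements. */
final class DelimitedList {

    /** Joins elements with ',' and prefixes any ',' or '\' inside an element with '\'. */
    static String join(List<String> elements) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            for (char c : elements.get(i).toCharArray()) {
                if (c == ',' || c == '\\') {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Splits on unescaped commas. Note that an empty string decodes to a list
     * with one empty element, so an empty list cannot be told apart from it.
     */
    static List<String> split(String value) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : value.toCharArray()) {
            if (escaped) {
                current.append(c);      // take the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;         // next character is literal
            } else if (c == ',') {
                out.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        out.add(current.toString());
        return out;
    }
}
```

Every read or update of a single element has to decode and re-encode the whole value, which is the source of the cost growth described above.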

Lists of primitives can also be modeled as subordinate entries containing an index attribute and a value attribute for the list element value.  If these entries are immediately subordinate to foo they can use an RDN that is based fully or in part on the index attribute.  The RDN can be based on a single attribute or on several attributes as a composite RDN.  Using the index attribute alone as the single RDN attribute is a bit dangerous since it may collide with any other List property in foo.  Using the index attribute value with a bar [post|pre]fix for a single RDN attribute prevents collisions with other list properties.  The best solution with subordinate entries for list properties is to create a container for the list property, where the RDN value of the container entry is the name of the property, and to have the container house entries with index and value attributes for the elements.  The problem with any of these subordinate entry approaches for List based properties is the amount of traffic: each list element is an entry requiring its own SearchResultEntry LDAP response to the client.
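
The container layout can be sketched with the JDK's javax.naming.ldap.LdapName and Rdn classes, which take care of escaping RDN values.  The attribute types used for the RDNs ("cn" for the container, "beanListElementIndex" for the element) and the helper names are assumptions for illustration only.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Builds DNs for list elements housed in a per-property container entry. */
final class ListElementDn {

    /** DN of the element at the given index, below a container named after the property. */
    static String elementDn(String beanDn, String property, int index) {
        try {
            LdapName dn = new LdapName(beanDn);
            // add() appends the new RDN as the left most (most specific) component.
            dn.add(new Rdn("cn", property));
            dn.add(new Rdn("beanListElementIndex", String.valueOf(index)));
            return dn.toString();
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("invalid DN or RDN", e);
        }
    }

    /** Compares two DNs using LdapName's own equality rules. */
    static boolean sameDn(String a, String b) {
        try {
            return new LdapName(a).equals(new LdapName(b));
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("invalid DN", e);
        }
    }
}
```

Because the container RDN carries the property name, an element of bar can never collide with an element of another list property of the same bean, even when both use the same index.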

With subordinate entries used for lists, search operations still have issues; however, the issues are different.  Asking for a foo object with a dog value in the bar List property is still a hassle.  We would have to find all bar entries with value 'dog', then the client would have to sort through them to make sure they correspond to foo objects.  Since other objects besides foo can also have bar properties, the client cannot presume the returned bar objects correspond to foo based bar values.  Furthermore it's a pain to figure out which foo object the bar value corresponds to without utilizing the DN, and that may not help unless the DN contains a directory unique identifier for the foo object.