
...

Code Block
DN => n * RDN
RDN => m * AVA
AVA => AttributeType + AttributeValue
AttributeType => case insensitive ASCII String or an OID
AttributeValue => a String
Note

Check if attributeType can have options.

Each RDN is separated from the other ones by a ',' or a ';' character.
Each AVA is separated from the other ones by a '+' character.
Attribute types and values are separated by the '=' character.

...

Of course, some other characters must be escaped : '\', ' ', '#', '=', '<', ',', '"', '>', ';', and we will have to deal with those cases too, but only if the value is not surrounded by '"'...
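The unescaping of an unquoted value could be sketched as follows. This is only an illustration, assuming the two usual escape forms (a backslash followed by a special character, or a backslash followed by two hexadecimal digits giving one UTF-8 byte). The class DnEscapes and its method are hypothetical names, not part of the existing code, and a backslash followed by anything other than two hex digits is simply taken literally.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class DnEscapes {
    private DnEscapes() {
    }

    // Unescapes an unquoted value: "\," gives ",", "\2B" gives "+".
    static String unescape(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < value.length()) {
            if (value.charAt(i) != '\\') {
                // Plain character: copy its UTF-8 bytes unchanged.
                i += writeCodePoint(out, value, i);
                continue;
            }
            if (i + 1 >= value.length()) {
                throw new IllegalArgumentException("Dangling '\\' at end of value");
            }
            int hi = Character.digit(value.charAt(i + 1), 16);
            int lo = i + 2 < value.length() ? Character.digit(value.charAt(i + 2), 16) : -1;
            if (hi >= 0 && lo >= 0) {
                // Hex pair: one raw byte of the UTF-8 encoded value.
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                // Single escaped character: keep it, drop the backslash.
                i += 1 + writeCodePoint(out, value, i + 1);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Writes the code point found at index and returns how many chars it used.
    private static int writeCodePoint(ByteArrayOutputStream out, String s, int index) {
        int cp = s.codePointAt(index);
        byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
        return Character.charCount(cp);
    }
}
```

Accumulating bytes rather than chars lets a sequence of hex pairs such as \C3\A9 be decoded as a single UTF-8 character.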

Last but not least, as DNs contain values which are potentially stored into the backend as attribute values, we must go through the PrepString process for each of these strings, if they are H/R of course (a DN can contain binary values, even if it seems strange, as long as their values are correctly escaped, using either the hexadecimal notation or the leading '#').

Internal structure

The internal DN structure we have chosen depends on different considerations :

  • speed : we must be able to avoid costly operations, by keeping normalized and parsed forms of the DN
  • memory consumption : as we said, garbage collecting is a costly operation. We should keep the number of objects necessary to hold a DN as small as possible. We must also have small objects because the smaller they are, the more of them we can store in cache.
  • serialization : DNs are stored in the backend, which means we must serialize and deserialize those DNs. The operation must be fast, as disk accesses are slower than memory accesses by 2 orders of magnitude.
  • CPU consumption : we should avoid as much as possible doing the same operation twice (like parsing or normalization), because the more operations we do, the more latency we generate due to synchronized portions of code. We also have to release the IOExecutor threads which are used to process the incoming requests.

As everything is a balance between those elements, we should favor the most frequent case, which is a pure ASCII DN where each RDN contains only one AVA, and where values are not escaped. If we don't fall into this simple case, then we fall back to the complex parsing.
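A minimal sketch of such a fast path, under assumptions : the class SimpleDnParser and its methods are hypothetical names, each RDN is returned as a {type, value} pair of Strings, and null is returned to tell the caller to fall back to the complex parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fast path, for illustration only.
final class SimpleDnParser {
    private SimpleDnParser() {
    }

    // True when the DN is pure ASCII and contains no escape, quote,
    // multi-AVA separator or hexadecimal value marker.
    static boolean isSimple(String dn) {
        for (int i = 0; i < dn.length(); i++) {
            char c = dn.charAt(i);
            if (c > 127 || c == '\\' || c == '"' || c == '+' || c == '#') {
                return false;
            }
        }
        return true;
    }

    // Returns one {type, value} pair per RDN, or null when the DN
    // must be handed over to the complex parser.
    static List<String[]> parseSimple(String dn) {
        if (!isSimple(dn)) {
            return null;
        }
        List<String[]> rdns = new ArrayList<>();
        if (dn.isEmpty()) {
            return rdns; // the empty DN has no RDN
        }
        // Limit -1 keeps trailing empty parts, so "a=b," is rejected below.
        for (String rdn : dn.split("[,;]", -1)) {
            int eq = rdn.indexOf('=');
            String type = eq < 0 ? "" : rdn.substring(0, eq).trim();
            if (type.isEmpty()) {
                return null;
            }
            rdns.add(new String[] { type, rdn.substring(eq + 1).trim() });
        }
        return rdns;
    }
}
```

For instance, parseSimple applied to "ou=People,dc=example,dc=com" yields three pairs, while "cn=a+sn=b,dc=com" yields null because of the '+' character.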

DN cycle of life

DNs are sent by users to identify entries. When an entry is first stored, its DN is created in the DN table. When a user searches for a specific entry using its DN, that DN is compared with the stored DNs. When a user does a ModifyDN operation, the DN might be modified for one or more entries (either replaced completely, or renamed). This last case has an important impact on the internal data structure we choose.

What is important is that we have two forms for a DN : UP (User Provided) and Norm (normalized DN). UP DNs are used when we return this info to the user. Norm form is used internally to uniquely identify entries. It is very important to understand that DNs are manipulated in their Norm form inside the server.

We have one special case : when moving data from one place to another with the ModifyDN request, which can modify the DN (both its UP and Norm forms), we must be able to construct new UP and Norm forms of the modified DN.

Note

So we will keep a normalized form (Norm) and a user provided form (UP). What about storing the bytes ?

As every DN is received by the server as an array of bytes, and will be returned to the client as an array of bytes, we might want to store this byte array in the internal DN structure.
The main benefit is that we will be able to encode the DN very quickly, avoiding a costly call to the method dn.getBytes( "UTF-8" ). The main drawback is that we will have to store more data in memory, and serialization will cost more.
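The trade-off can be sketched as follows. CachedDn is a hypothetical name, not an existing class : it only shows a DN keeping its UTF-8 bytes next to its String form, so that the encoding cost is paid at most once, at the price of a second copy of the data in memory.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder keeping the UP form of a DN next to its UTF-8 encoding.
final class CachedDn {
    private final String upName;
    private final byte[] bytes;

    // Built from the bytes received from the client: they are kept,
    // so no re-encoding is needed when the DN is sent back.
    CachedDn(byte[] received) {
        this.bytes = received.clone();
        this.upName = new String(received, StandardCharsets.UTF_8);
    }

    // Built from a String: the encoding cost is paid once, here.
    CachedDn(String upName) {
        this.upName = upName;
        this.bytes = upName.getBytes(StandardCharsets.UTF_8);
    }

    String getUpName() {
        return upName;
    }

    // Returns the cached array itself; callers must treat it as read-only.
    byte[] getBytes() {
        return bytes;
    }
}
```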

We have to check if this is a valid optimization or not, by running some benchmarks.

Internally, we will use the normalized form, which is a form where the values have been processed by the PrepString algorithm, special chars have been unescaped, surrounding double quotes have been removed and the value has been normalized according to its associated attributeType normalizer (assuming that we have such an attributeType normalizer available).

Normalization

We have five kinds of normalization to apply :

  • Attribute types are lower cased and spaces are removed around ',', ';', '=', and '+'. Options are kept.
  • Then attribute types are transformed to their OID counterpart to avoid having to deal with multiple forms of the same attributeType (an AT can have aliases, like CN, CommonName, and 2.5.4.3 which are all the same attributeType)
  • Attribute values are unescaped and transformed to Strings, if the attributeType is H/R
  • Attribute values are transformed applying the PrepString algorithm if their AttributeType is H/R
  • Attribute values are normalized according to their attributeType syntax.
  • At the end, within an RDN, AVAs are ordered in increasing alphabetic order.
Note

The PrepString process should be executed at the right moment. As we may have escaped chars, it cannot occur before the unescaping process, but as soon as we have a String, we can do it (so it's before the normalization)
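The ordering of these steps can be sketched for a single RDN as follows. Everything here is an assumption made for illustration : RdnNormalizer is a hypothetical name, the small alias table stands in for a real schema lookup, and lower casing plus space collapsing stands in for PrepString and the attributeType normalizer. Values are expected to be already unescaped, and the result is not escaped again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the normalization order for one RDN.
final class RdnNormalizer {
    // Stand-in for a schema lookup: lower cased name or alias -> OID.
    private static final Map<String, String> OIDS = new HashMap<>();

    static {
        OIDS.put("cn", "2.5.4.3");
        OIDS.put("commonname", "2.5.4.3");
        OIDS.put("ou", "2.5.4.11");
        OIDS.put("organizationalunitname", "2.5.4.11");
        OIDS.put("dc", "0.9.2342.19200300.100.1.25");
        OIDS.put("domaincomponent", "0.9.2342.19200300.100.1.25");
    }

    private RdnNormalizer() {
    }

    // Steps 1 and 2: trim, lower case, then replace the name by its OID.
    // Unknown types are kept in their lower cased form.
    static String normalizeType(String upType) {
        String key = upType.trim().toLowerCase(Locale.ROOT);
        return OIDS.getOrDefault(key, key);
    }

    // Stand-in for PrepString and a case-ignore normalizer:
    // lower case, and runs of spaces reduced to a single space.
    static String normalizeValue(String unescapedValue) {
        return unescapedValue.toLowerCase(Locale.ROOT).replaceAll(" {2,}", " ");
    }

    // Each AVA is a {type, unescaped value} pair. The last step sorts the AVAs.
    static String normalizeRdn(List<String[]> avas) {
        List<String> parts = new ArrayList<>();
        for (String[] ava : avas) {
            parts.add(normalizeType(ava[0]) + "=" + normalizeValue(ava[1]));
        }
        Collections.sort(parts);
        return String.join("+", parts);
    }
}
```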

For instance, the following DN :
"ou=" Some   People  " + dc =  \+   Some anImAls,domainComponent = eXample,dc= cOm"
will be transformed into the Norm string :
"0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ ,0.9.2342.19200300.100.1.25=example,0.9.2342.19200300.100.1.25=com"
and the internal storage of UP and normalized values will be :

Code Block

(here, quotes are just used to expose the leading and trailing spaces)

DN
  UP : 'ou=" Some   People  " + dc =  \+   Some anImAls,domainComponent = eXample,dc= cOm'
  Norm : '0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ ,0.9.2342.19200300.100.1.25=example,0.9.2342.19200300.100.1.25=com'
  RDN 1
    UP : 'ou=" Some   People  " + dc =  \+   Some anImAls'
    Norm : '0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ '
    AVA 1
      UP AT : dc
      UP val : '\+   Some anImAls'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : '+ some animals'
    AVA 2
      UP AT : ou
      UP val : '" Some   People  "'
      Norm AT : 2.5.4.11
      Norm val : ' some people  '
  RDN 2
    UP : 'domainComponent = eXample'
    Norm : '0.9.2342.19200300.100.1.25=example'
    AVA 1
      UP AT : domainComponent
      UP val : 'eXample'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : 'example'
  RDN 3
    UP : 'dc= cOm'
    Norm : '0.9.2342.19200300.100.1.25=com'
    AVA 1
      UP AT : dc
      UP val : 'cOm'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : 'com'

As we can see, types are replaced by their OID, useless spaces are removed, AVAs are reordered within their RDN and values are normalized (lowercased, and multiple inner spaces are replaced by a single space, according to the OU and DC normalizers).

Another important thing is that we store different kinds of UP values depending on the level of storage (DN, RDN or AVA). This is necessary as we manipulate this information at different levels too, depending on which operation we are dealing with : value comparisons, DN modification, DN comparisons...

A special case is when we can't find the AttributeType in the schema. There are three cases where it can happen :

  • the entry is a referral (i.e. the referral OC is present in this entry)
  • the entry has the extensible ObjectClass
  • we are facing an error

In the first two cases, we won't be able to do a PrepString nor a normalization, as we don't know if the AttributeType is H/R or not, and we don't know anything about the normalizer either. We will simply track the fact that the AttributeType is unknown by not attaching a reference to the associated AttributeType object (for ServerDNs) and by not filling the normalized form for the AVA (keeping it null : an empty String won't be enough, as we may have empty values after normalization, a very weird and twisted possibility, but ...).

Internal structure

We have five objects to describe :

...

An RDN may contain more than one AVA, but usually contains only one. An optimization could be not to create an array to store the AVAs, but instead to keep the single AVA in a dedicated member. If we have more than one AVA, then we will create an array instead. The extra cost of creating this array is totally acceptable given the frequency of such multiple AVAs in an RDN (which is very low). On the other hand, it forces the access method to deal with the number of AVAs, but this cost is obviously less than accessing an AVA through an ArrayList (to be double checked)

Note

What about a byte[] to store this RDN? It does not seem we need it, even if we need the byte arrays to create a new DN when dealing with a ModifyDN operation. As it is not a frequent operation, we can accept the extra cost of a conversion from String to a byte array.

Code Block
	class RDN
		// The user provided form
		String upRdn;

		// The normalized form
		String normRdn;

		// The AVA if we have only one
		AVA ava;

		// An array of AVAs if we have more than one
		List<AVA> avas;

		// The RDN position in the UP DN
		int start;

		// The RDN length in the UP DN
		int length;

The extra two fields (start and length) are useful when dealing with a ModifyDN operation, in order to keep the UP RDN ordering. If the user wants to replace an RDN by another, we have to substitute the RDN in the original position, creating a new UP DN.
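The substitution itself then reduces to a String splice. A minimal sketch, where UpDnEditor and replaceRdn are hypothetical names and start and length are the two RDN fields described above :

```java
// Hypothetical helper, for illustration only.
final class UpDnEditor {
    private UpDnEditor() {
    }

    // Builds a new UP DN in which the RDN occupying [start, start + length)
    // is replaced by newUpRdn. Everything around it, including the spaces
    // the user typed, is kept as-is.
    static String replaceRdn(String upDn, int start, int length, String newUpRdn) {
        if (start < 0 || length < 0 || start + length > upDn.length()) {
            throw new IllegalArgumentException("RDN position is outside the UP DN");
        }
        return upDn.substring(0, start) + newUpRdn + upDn.substring(start + length);
    }
}
```

The start and length of the following RDNs would have to be recomputed afterwards, as the new RDN may not have the same length as the old one.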

Note

AVAs are stored in alphabetic order ?

AVA internal structure

An AVA contains an attributeType and an attribute value. We could keep the String representation of an AVA internally, but as those members already have a String representation, and as we rarely manipulate an AVA (except when creating it while decoding a DN), there is no need to store this duplicated information.

Code Block
	class AVA
		Attribute type;
		Value<?> value;

...

Attribute internal structure

We will store only two pieces of information :

  • the User provided form
  • the attributeType OID
Note

We can keep the OID in two forms : as a String, or as an OID object. The OID object is smaller, and allows faster comparisons, as an OID String will be at least 2 times longer than an OID object :
"1.2.840.48018.1.2.2" is a 19 chars long string, while its equivalent OID :
0x2A, 0x86, 0x48, 0x82, 0xF7, 0x12, 0x01, 0x02, 0x02, which is a 9 bytes long array
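The byte form above is the usual BER encoding of an OID : the first two arcs are combined into 40 * arc1 + arc2, and every following arc is written in base 128, most significant group first, with the high bit set on every byte but the last. A minimal sketch of this encoding, where OidEncoder is a hypothetical name and arcs are assumed to be non negative and to fit in a long :

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoder, for illustration only.
final class OidEncoder {
    private OidEncoder() {
    }

    // Encodes a dotted OID String into its compact byte form.
    static byte[] encode(String oid) {
        String[] arcs = oid.split("\\.");
        if (arcs.length < 2) {
            throw new IllegalArgumentException("An OID needs at least two arcs");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The first two arcs share a single value: 40 * arc1 + arc2.
        writeBase128(out, 40L * Long.parseLong(arcs[0]) + Long.parseLong(arcs[1]));
        for (int i = 2; i < arcs.length; i++) {
            writeBase128(out, Long.parseLong(arcs[i]));
        }
        return out.toByteArray();
    }

    // Writes 7 bits per byte, most significant group first.
    // Every byte but the last one has its high bit set.
    private static void writeBase128(ByteArrayOutputStream out, long value) {
        int shift = 0;
        while ((value >>> (shift + 7)) != 0) {
            shift += 7;
        }
        for (; shift >= 0; shift -= 7) {
            int group = (int) ((value >>> shift) & 0x7F);
            out.write(shift == 0 ? group : group | 0x80);
        }
    }
}
```

Encoding "1.2.840.48018.1.2.2" this way gives the 9 bytes listed above.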

Code Block

	class AttributeType
		String 	upType;
		OID	normType;

AttributeValue internal structure

This element should be kept as a String (UP form) and as a byte array if the corresponding AT is not H/R, or as a String if the AT is H/R (H/R : Human Readable)

It could be good to keep it as a byte array to avoid a costly conversion when sending back the data to the user (to be checked)

The structure will be :

Code Block

	class AttributeValue
		// A flag which tells if this value is H/R or not
		boolean isHR;

		// The user provided value
		String upValue;

		// The structure which holds the value, either a String or a byte[]
		NormValue normValue;

with the interface NormValue and the implementing classes :

Code Block
	
	interface NormValue

	class NormStringValue implements NormValue
		String value; // The value if it's a String

	class NormBytesValue implements NormValue
		byte[] value; // The value if it's a byte array

...

We might also create an interface to handle big values (above 1024 bytes or char, for instance), called StreamedValue :
interface StreamedValue

  • reference to the associated AttributeType object if we have access to the schema

As the second form already contains the first one, normalized, we don't need to keep both of them. It's better to define an interface and two implementing classes : UnknownAttribute and SchemaAttribute. The interface will define common accessors for those two classes. We will also define an intermediate abstract class, carrying the common methods.

Code Block

  interface Attribute
    // Tells if the attributeType is known
    boolean isSchemaAvailable();

    // Tells if the attributeType is binary
    boolean isBinary();


  abstract class AbstractAttribute
    // Implement the common methods


  class UnknownAttribute
    String        upType;


  class SchemaAttribute
    AttributeType attributeType;

AttributeValue internal structure

We will use the Value<?> class we have already defined.

...