...

Code Block
DN => n * RDN
RDN => m * AVA
AVA => AttributeType + AttributeValue
AttributeType => a case-insensitive ASCII String or an OID
AttributeValue => a String

Note
Check if attributeType can have options.

Each RDN is separated from the other ones by a ',' or a ';' character.
Each AVA is separated from the other ones by a '+' character.
Attribute and values are separated by the '=' character.

...

The last point is that spaces are not meaningful around the ',', ';', '+' and '=' characters, nor at the beginning and end of each RDN, AVA, type and value. To be able to store a leading or trailing space in a value, a user must enclose this value between two '"' characters.

Of course, some other characters must be escaped : '\', ' ', '#', '=', '<', ',', '"', '>', ';', and we will have to deal with those cases too, but only if the value is not surrounded by '"'...
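The unescaping of an unquoted value can be sketched as follows. This is a minimal illustration, not the server's parser: it assumes RFC 4514 style escapes (a backslash followed either by the escaped character itself or by two hexadecimal digits, the hex pairs forming UTF-8 bytes). The names DnValueUnescaper and unescape are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: unescapes one unquoted attribute value.
final class DnValueUnescaper {

    static String unescape(String value) {
        StringBuilder result = new StringBuilder();
        // Bytes collected from consecutive \XX escapes, decoded together as UTF-8
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;

        while (i < value.length()) {
            char c = value.charAt(i);

            if (c != '\\') {
                flush(pending, result);
                result.append(c);
                i++;
                continue;
            }

            if (i + 1 >= value.length()) {
                throw new IllegalArgumentException("Dangling backslash in: " + value);
            }

            int hi = Character.digit(value.charAt(i + 1), 16);
            int lo = (i + 2 < value.length()) ? Character.digit(value.charAt(i + 2), 16) : -1;

            if (hi >= 0 && lo >= 0) {
                // Hexadecimal pair, like \2C for ','
                pending.write((hi << 4) | lo);
                i += 3;
            } else {
                // Escaped special character, like \+ or '\ '
                flush(pending, result);
                result.append(value.charAt(i + 1));
                i += 2;
            }
        }

        flush(pending, result);
        return result.toString();
    }

    // Decodes the pending hex-escaped bytes as UTF-8 and appends them
    private static void flush(ByteArrayOutputStream pending, StringBuilder result) {
        if (pending.size() > 0) {
            result.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }
}
```

A value surrounded by '"' would skip this step entirely, which is why the quoted and unquoted cases have to be told apart before unescaping.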

Last but not least, as DNs contain values which are potentially stored in the backend as attribute values, we must run the PrepString process on each of these strings, if they are H/R of course (a DN can contain binary values, even if it seems strange, as long as their values are correctly escaped, using either the hexadecimal notation or the leading '#').

Internal structure

The internal DN structure we have chosen depends on different considerations :

speed : we must be able to avoid costly operations, by keeping the normalized and parsed forms of the DN
memory consumption : as we said, garbage collecting is a costly operation. We should keep the number of objects necessary to hold a DN as small as possible. We must also have small objects because the smaller they are, the more of them we can store in cache.
serialization : DNs are stored in the backend, which means we must serialize and deserialize those DNs. The operation must be fast, as disk accesses are slower than memory accesses by 2 orders of magnitude.
CPU consumption : we should avoid as much as possible doing the same operation twice (like parsing or normalization), because the more operations we do, the more latency we generate due to synchronized portions of code. We also have to release the IOExecutor threads which are used to process the incoming requests.

As everything is a balance between those elements, we should favor the most frequent case, which is a pure ASCII DN where each RDN contains only one AVA, and where values are not escaped. If we don't fall into this simple case, then we fall back to the complex parsing.
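A minimal sketch of how the simple case could be detected before choosing a parser. The class and method names (DnFastPath, isSimple) are hypothetical, and the check is deliberately conservative: any '#' sends the DN to the complex parser, even though it only matters at the start of a value.

```java
// Hypothetical pre-check deciding between the fast path and the complex parsing.
final class DnFastPath {

    // Returns true if the DN is pure ASCII, has single-AVA RDNs
    // and contains no escaped, quoted or hex-encoded value.
    static boolean isSimple(String dn) {
        for (int i = 0; i < dn.length(); i++) {
            char c = dn.charAt(i);

            if (c > 0x7F) {
                return false; // not pure ASCII
            }

            switch (c) {
                case '+':  // more than one AVA in a RDN
                case '\\': // escaped character
                case '"':  // quoted value
                case '#':  // possibly a hex-encoded value
                    return false;
                default:
                    break;
            }
        }

        return true;
    }
}
```

The scan is a single pass over the String without any allocation, so it adds little to the cost of the frequent case.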

DN cycle of life

DNs are sent by users to identify entries. When an entry is first stored, its DN is created in the DN table. When a user searches for a specific entry using its DN, that DN is compared with the stored DNs. When a user does a ModifyDN operation, the DN might be modified for one or more entries (either replaced completely, or renamed). This last case has an important impact on the internal data structure we choose.

What is important is that we have two forms for a DN : UP (User Provided) and Norm (normalized DN). The UP form is used when we return the DN to the user. The Norm form is used internally to uniquely identify entries. It is very important to understand that DNs are manipulated in their Norm form inside the server.

We have one special case : moving data from one place to another with the ModifyDN request modifies the DN (both its UP and Norm forms), so we must be able to construct a new UP and Norm form of the modified DN.

Note

So we will keep a normalized form (Norm) and a user provided form (UP). What about storing the bytes ?

As every DN is received by the server as an array of bytes, and will be returned to the client as an array of bytes, we might want to store this byte array in the internal DN structure.
The main benefit is that we will be able to encode the DN very quickly, avoiding a costly call to the method dn.getBytes( "UTF-8" ). The main drawback is that we will have to store more data in memory, and serialization will cost more.

We have to check if this is a valid optimization or not, by running some benchmarks.

Internally, we will use the normalized form, which is a form where the values have been processed by the PrepString algorithm, special chars have been unescaped, surrounding double quotes have been removed and each value has been normalized according to its associated attributeType normalizer (assuming that we have such an attributeType normalizer available).

Normalization

We have six kinds of normalization to apply :

Attribute types are lowercased and spaces are removed around ',', ';', '=', and '+'. Options are kept.
Then attribute types are transformed to their OID counterpart to avoid having to deal with multiple forms of the same attributeType (an AT can have aliases, like CN, commonName and 2.5.4.3, which are all the same attributeType)
Attribute values are unescaped and transformed to Strings, if the attributeType is H/R
Attribute values are transformed by applying the PrepString algorithm if their AttributeType is H/R
Attribute values are normalized according to their attributeType syntax.
At the end, within a RDN, the AVAs will be sorted in increasing alphabetic order.

Note
The PrepString process should be executed at the right moment. As we may have escaped chars, it cannot occur before the unescaping process, but as soon as we have a String, we can do it (so it comes before the normalization)
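The ordering of these steps can be sketched on a single RDN. This is only an illustration under strong assumptions: the alias table is a tiny hard-coded stand-in for the schema, the value normalizer (trim, collapse spaces, lowercase) is a stand-in for the real attributeType normalizers, PrepString is omitted, values are assumed to be already unescaped and free of special characters, and an unknown type is simply kept lowercased. All names (RdnNormalizerSketch, normalizeRdn, ...) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the normalization order for one RDN.
final class RdnNormalizerSketch {

    // Stand-in for the schema: lowercased alias -> OID
    private static final Map<String, String> OIDS = new HashMap<>();

    static {
        OIDS.put("cn", "2.5.4.3");
        OIDS.put("commonname", "2.5.4.3");
        OIDS.put("ou", "2.5.4.11");
        OIDS.put("organizationalunitname", "2.5.4.11");
        OIDS.put("dc", "0.9.2342.19200300.100.1.25");
        OIDS.put("domaincomponent", "0.9.2342.19200300.100.1.25");
    }

    // Steps 1 and 2: trim, lowercase, then replace the alias by its OID
    static String normalizeType(String upType) {
        String key = upType.trim().toLowerCase(Locale.ROOT);
        String oid = OIDS.get(key);
        return oid != null ? oid : key;
    }

    // Stand-in for steps 4 and 5 (PrepString and the syntax normalizer)
    static String normalizeValue(String value) {
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Normalizes each AVA, then sorts the AVAs (step 6)
    static String normalizeRdn(String upRdn) {
        List<String> avas = new ArrayList<>();

        for (String ava : upRdn.split("\\+")) {
            int eq = ava.indexOf('=');

            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in AVA: " + ava);
            }

            avas.add(normalizeType(ava.substring(0, eq)) + "=" + normalizeValue(ava.substring(eq + 1)));
        }

        Collections.sort(avas);
        return String.join("+", avas);
    }
}
```

Sorting the normalized AVAs is what makes two RDNs that differ only by the order of their AVAs compare equal in their Norm form.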

For instance, the following DN :
'ou=" Some   People  " + dc =  \+   Some anImAls,domainComponent = eXample,dc= cOm'
will be transformed into the Norm string :
'0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ ,0.9.2342.19200300.100.1.25=example,0.9.2342.19200300.100.1.25=com'
and the internal storage of UP and normalized values will be :

Code Block


(here, quotes are just used to expose the leading and trailing spaces)

DN
  UP : 'ou=" Some   People  " + dc =  \+   Some anImAls,domainComponent = eXample,dc= cOm'
  Norm : '0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ ,0.9.2342.19200300.100.1.25=example,0.9.2342.19200300.100.1.25=com'
  RDN 1
    UP : 'ou=" Some   People  " + dc =  \+   Some anImAls'
    Norm : '0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ '
    AVA 1
      UP AT : dc
      UP val : '\+   Some anImAls'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : '+ some animals'
    AVA 2
      UP AT : ou
      UP val : '" Some   People  "'
      Norm AT : 2.5.4.11
      Norm val : ' some people  '
  RDN 2
    UP : 'domainComponent = eXample'
    Norm : '0.9.2342.19200300.100.1.25=example'
    AVA 1
      UP AT : domainComponent
      UP val : 'eXample'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : 'example'
  RDN 3
    UP : 'dc= cOm'
    Norm : '0.9.2342.19200300.100.1.25=com'
    AVA 1
      UP AT : dc
      UP val : 'cOm'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : 'com'

As we can see, types are replaced by their OID, useless spaces are removed, AVAs are reordered within their RDN and values are normalized (lowercased, and multiple inner spaces are replaced by a single space, according to the OU and DC normalizers).

Another important thing is that we store different kinds of UP values depending on the level of storage (DN, RDN or AVA). This is necessary as we manipulate this information at different levels too, depending on which operation we are dealing with : value comparisons, DN modification, DN comparisons...

A special case is when we can't find the AttributeType in the schema. There are three cases where it can happen :

the entry is a referral (ie the referral OC is present into this entry)
the entry has the extensibleObject ObjectClass
we are facing an error

In the first two cases, we won't be able to apply PrepString or any normalization, as we don't know whether the AttributeType is H/R, and we know nothing about its normalizer. We will simply track the fact that the AttributeType is unknown by not attaching a reference to the associated AttributeType object (for ServerDNs) and by not filling the normalized form for the AVA (keeping it null: an empty String won't be enough, as we may have empty values after normalization, a very weird and twisted possibility, but ...).

Internal structure

We have five objects to describe :

DN
RDN
AVA
Type
Value

The following paragraphs describe the internal structure of those objects.

...

A RDN may contain more than one AVA, but usually contains only one. An optimization is to avoid creating an array when there is a single AVA, keeping it in a single member instead. If we have more than one AVA, then we create an array. The extra cost of creating this array is totally acceptable given how rarely a RDN holds multiple AVAs. On the other hand, it forces the access method to deal with the number of AVAs, but this cost is presumably lower than accessing an AVA through an ArrayList (to be double checked).
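A minimal sketch of this optimization, under the assumption that an AVA can be represented by a plain String for the sake of a self-contained example. The class RdnStorageSketch and its methods are hypothetical; the list is only allocated when a second AVA is added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage: a single member in the common case, a list otherwise.
final class RdnStorageSketch {

    // The only AVA, when the RDN holds exactly one (AVAs are Strings here)
    private String single;

    // Allocated lazily, when a second AVA is added
    private List<String> many;

    void add(String ava) {
        if (many != null) {
            many.add(ava);
        } else if (single == null) {
            single = ava;
        } else {
            // Second AVA : switch from the single member to a list
            many = new ArrayList<>(2);
            many.add(single);
            many.add(ava);
            single = null;
        }
    }

    int size() {
        if (many != null) {
            return many.size();
        }

        return single != null ? 1 : 0;
    }

    String get(int index) {
        if (many != null) {
            return many.get(index);
        }

        if (single != null && index == 0) {
            return single;
        }

        throw new IndexOutOfBoundsException("No AVA at index " + index);
    }
}
```

Every accessor pays for one extra null check, which is the cost the paragraph above proposes to benchmark against a plain ArrayList.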

Note
What about a byte[] to store this RDN? It does not seem we need it, even if we need the byte arrays to create a new DN when dealing with a ModifyDN operation. As it is not a frequent operation, we can accept the extra cost of a conversion from String to a byte array.

Code Block

class RDN {
    // The user provided form
    String upRdn;

    // The normalized form
    String normRdn;

    // The AVA, if we have only one
    AVA ava;

    // A list of AVAs, if we have more than one
    List<AVA> avas;

    // The RDN position in the UP DN
    int start;

    // The RDN length in the UP DN
    int length;
}

The two extra fields (start and length) are useful when dealing with a ModifyDN operation, in order to keep the UP RDN ordering. If the user wants to replace a RDN by another, we have to substitute the RDN at its original position, creating a new UP DN.
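The substitution itself can be sketched as a simple splice of the UP DN. The helper below is hypothetical (UpDnEditor and replaceRdn are not existing names) and assumes that start and length describe the RDN exactly as it appears in the UP DN.

```java
// Hypothetical helper building a new UP DN from the RDN position and length.
final class UpDnEditor {

    // Replaces the RDN found at [start, start + length) in the UP DN
    static String replaceRdn(String upDn, int start, int length, String newUpRdn) {
        if (start < 0 || length < 0 || start + length > upDn.length()) {
            throw new IllegalArgumentException("RDN position out of the UP DN bounds");
        }

        return upDn.substring(0, start) + newUpRdn + upDn.substring(start + length);
    }
}
```

After such a substitution, the start of every following RDN shifts by the difference between the new and the old RDN lengths, so those fields have to be recomputed for the new DN.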

Note
AVAs are stored in alphabetic order ?

...

AVA internal structure

An AVA contains an attributeType and an attribute value. We could keep the String representation of an AVA internally, but as those members already have a String representation, and as we rarely manipulate an AVA (except when creating it while decoding a DN), there is no need to store this duplicated information.

Code Block
class AVA {
    Attribute type;
    Value<?> value;
}

...

Attribute internal structure

We will store only two pieces of information :

the User provided form
the attributeType OID

Note
We can keep the OID in two forms : as a String, or as an OID object. The OID object is smaller and allows faster comparisons, as an OID String will be at least 2 times longer than an OID object : "1.2.840.48018.1.2.2" is a 19 chars long String, while its equivalent encoded OID (0x2A, 0x86, 0x48, 0x82, 0xF7, 0x12, 0x01, 0x02, 0x02) is a 9 bytes long array.
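The 9 bytes quoted in this note follow from the standard BER encoding of an OID: the first two arcs are packed into one value (first * 40 + second) and every value is then written in base 128, most significant group first, with the high bit set on all bytes but the last. The sketch below only produces these content bytes (no tag, no length); the class name OidEncoderSketch is hypothetical and is not the OID object mentioned above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoder producing the BER content bytes of a dotted OID.
final class OidEncoderSketch {

    static byte[] encode(String oid) {
        String[] parts = oid.split("\\.");

        if (parts.length < 2) {
            throw new IllegalArgumentException("An OID needs at least two arcs: " + oid);
        }

        long first = Long.parseLong(parts[0]);
        long second = Long.parseLong(parts[1]);

        if (first < 0 || first > 2 || second < 0 || (first < 2 && second > 39)) {
            throw new IllegalArgumentException("Invalid leading arcs: " + oid);
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // The first two arcs share a single base 128 value
        writeBase128(out, first * 40 + second);

        for (int i = 2; i < parts.length; i++) {
            long arc = Long.parseLong(parts[i]);

            if (arc < 0) {
                throw new IllegalArgumentException("Negative arc in: " + oid);
            }

            writeBase128(out, arc);
        }

        return out.toByteArray();
    }

    // Writes a non-negative value as 7-bit groups, high bit set on all but the last
    private static void writeBase128(ByteArrayOutputStream out, long value) {
        int shift = 0;

        while ((value >>> (shift + 7)) != 0) {
            shift += 7;
        }

        for (; shift > 0; shift -= 7) {
            out.write((int) (((value >>> shift) & 0x7F) | 0x80));
        }

        out.write((int) (value & 0x7F));
    }
}
```

For "1.2.840.48018.1.2.2", 1 * 40 + 2 gives 0x2A, 840 gives 0x86 0x48 and 48018 gives 0x82 0xF7 0x12, which matches the array in the note.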

Code Block
class AttributeType {
    String upType;
    OID normType;
}

AttributeValue internal structure

This element should be kept as a String (the UP form) together with a normalized form : a byte array if the corresponding AT is not H/R (Human Readable), or a String if the AT is H/R.

It could be good to keep it as a byte array to avoid a costly conversion when sending back the data to the user (to be checked)

The structure will be :

Code Block


class AttributeValue {
    // A flag which tells if this value is H/R or not
    boolean isHR;

    // The user provided value
    String upValue;

    // The structure which holds the value, either a String or a byte[]
    NormValue normValue;
}

with the interface NormValue and the implementing classes :

Code Block
interface NormValue {
}

class NormStringValue implements NormValue {
    // The value if it's a String
    String value;
}

class NormBytesValue implements NormValue {
    // The value if it's a byte array
    byte[] value;
}

...

We might also create an interface to handle big values (above 1024 bytes or chars, for instance), called StreamedValue :
interface StreamedValue

the reference to the associated AttributeType object if we have access to the schema

As the second form already contains the first one, normalized, we don't need to keep both of them. It's better to define an interface and two implementing classes : UnknownAttribute and SchemaAttribute. The interface will define common accessors for those two classes. We will also define an intermediate abstract class, carrying the common methods.

Code Block


  interface Attribute {
    // Tells if the attributeType is known
    boolean isSchemaAvailable();

    // Tells if the attributeType is binary
    boolean isBinary();
  }


  abstract class AbstractAttribute {
    // Implements the common methods
  }


  class UnknownAttribute {
    String        upType;
  }


  class SchemaAttribute {
    AttributeType attributeType;
  }

AttributeValue internal structure

We will use the Value<?> class we have already defined.

...
