...
Code Block |
---|
DN => n * RDN
RDN => m * AVA
AVA => AttributeType + AttributeValue
AttributeType => case-insensitive ASCII String or an OID
AttributeValue => a String
|
Note |
---|
Check if attributeType can have options. |
Each RDN is separated from the other ones by a ',' or a ';' character.
Each AVA is separated from the other ones by a '+' character.
Attribute types and values are separated by the '=' character.
...
The last point is that spaces are not meaningful around the ',', ';', '+' and '=' characters, nor at the beginning and end of each RDN, AVA, type and value. To be able to store a leading or trailing space in a value, a user must embed this value between two '"' characters.
Of course, some other characters must be escaped : '\', ' ', '#', '=', '<', ',', '"', '>', ';', and we will have to deal with those cases too, but only if the value is not surrounded by '"'...
Last but not least, as DNs contain values which are potentially stored in the backend as attribute values, we must go through the PrepString process for each of these strings, if they are H/R of course (a DN can contain binary values, even if it seems strange, as long as their values are correctly escaped, using either the hexadecimal notation or the starting #).
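The separator rules above can be illustrated with a minimal sketch. It assumes a hypothetical helper named DnSplitter (not part of the server code) that cuts a DN into its user provided RDN strings on any ',' or ';' that is neither escaped with '\' nor enclosed in '"'. Surrounding spaces are left untouched, since trimming has to take escaped trailing spaces into account and belongs to a later step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: splits a DN into RDN strings
// on unescaped ',' or ';' characters found outside double quotes.
final class DnSplitter {
    static List<String> splitRdns(String dn) {
        List<String> rdns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < dn.length(); i++) {
            char c = dn.charAt(i);

            if (c == '\\' && i + 1 < dn.length()) {
                // Copy the escape pair as is; unescaping is a later step
                current.append(c).append(dn.charAt(++i));
            } else if (c == '"') {
                // Toggle the quoted state and keep the quote in the UP form
                inQuotes = !inQuotes;
                current.append(c);
            } else if ((c == ',' || c == ';') && !inQuotes) {
                // An RDN separator: close the current RDN
                rdns.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }

        // The last RDN has no trailing separator
        rdns.add(current.toString());
        return rdns;
    }
}
```

The same loop could be reused to split a RDN into AVAs by testing for '+' instead of ',' and ';'.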
Internal structure
The internal DN structure we have chosen depends on different considerations :
- speed : we must be able to avoid costly operations, by keeping normalized and parsed forms of the DN
- memory consumption : as we said, garbage collecting is a costly operation. We should keep the number of objects necessary to hold a DN as small as possible. We must also have small objects because the smaller they are, the more of them we can store in cache.
- serialization : DNs are stored in the backend, which means we must serialize and deserialize those DNs. The operation must be fast, as disk accesses are slower than memory accesses by 2 orders of magnitude.
- CPU consumption : we should avoid as much as possible doing the same operation twice (like parsing or normalization), because the more operations we do, the more latency we generate due to synchronized portions of code. We also have to release the IOExecutor threads which are used to process the incoming requests.
As everything is a balance between those elements, we should favor the most frequent cases, which are pure ASCII DNs where each RDN contains only one AVA, and where values are not escaped. If we don't fall into this simple case, then we fall back to the complex parsing.
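As a rough illustration of this fast path, the check below scans a DN once and reports whether it is in the simple case. The class name DnFastPath and the exact set of characters that disqualify a DN are assumptions made for this sketch; the real parser may use different criteria.

```java
// Hypothetical fast-path test, for illustration only.
final class DnFastPath {
    // Returns true when the DN is pure ASCII, holds no escape or quote,
    // no '+' (so every RDN has a single AVA) and no '#' (no hex value).
    static boolean isSimple(String dn) {
        for (int i = 0; i < dn.length(); i++) {
            char c = dn.charAt(i);

            if (c > 0x7F || c == '\\' || c == '"' || c == '+' || c == '#') {
                // Anything unusual: let the complex parser handle it
                return false;
            }
        }

        return true;
    }
}
```

A caller would try isSimple first and only run the complex parsing when it returns false.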
DN cycle of life
DNs are sent by users to identify entries. When an entry is first stored, its DN is created in the DN table. When a user is searching for a specific entry using its DN, it is compared with the stored DNs. When a user does a ModifyDN operation, the DN might be modified for one or more entries (either replaced completely, or renamed). This last case has an important impact on the internal data structure we chose.
What is important is that we have two forms for a DN : UP (User Provided) and Norm (normalized DN). UP DNs are used when we return this info to the user. The Norm form is used internally to uniquely identify entries. It is very important to understand that DNs are manipulated in their Norm form inside the server.
We have one special case : moving data from one place to another with the ModifyRDN request can modify the DN (its UP and Norm forms), so we must be able to construct a new UP and Norm form of the modified DN.
Note |
---|
So we will keep a normalized form (Norm) and a user provided form (UP). What about storing the bytes ? As every DN is received by the server as an array of bytes, and will be returned to the client as an array of bytes, we might want to store this byte array in the internal DN structure. We have to check if this is a valid optimization or not, by running some benchmarks. |
Internally, we will use the normalized form, which is a form where the values have been processed by the PrepString algorithm, special chars have been unescaped, surrounding double quotes have been removed and the value has been normalized according to its associated attributeType normalizer (assuming that we have such an attributeType normalizer available).
Normalization
We have six kinds of normalization to apply :
- Attribute types are lowercased and spaces are removed around ',', ';', '=', and '+'. Options are kept.
- Then attribute types are transformed to their OID counterpart to avoid having to deal with multiple forms of the same attributeType (an AT can have aliases, like CN, commonName, and 2.5.4.3 which are all the same attributeType)
- Attribute values are unescaped and transformed to Strings, if the attributeType is H/R
- Attribute values are transformed applying the PrepString algorithm if their AttributeType is H/R
- Attribute values are normalized according to their attributeType syntax.
- At the end, within a RDN, AVAs will be ordered in increasing alphabetic order.
Note |
---|
The PrepString process should be executed at the right moment. As we may have escaped chars, this cannot occur before the unescaping process, but as soon as we have a String, we can do it (so it's before the normalization) |
For instance, the following DN :
"ou=" Some People " + dc = \+ Some anImAls,dommainComponent = eXample,dc= cOm"
will be transformed into the Norm string :
"0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ ,0.9.2342.19200300.100.1.25=example,0.9.2342.19200300.100.1.25=com"
and the internal storage of UP and normalized values will be :
Code Block |
---|
(here, quotes are just used to expose the leading and trailing spaces)
DN
  UP : 'ou=" Some People " + dc = \+ Some anImAls,dommainComponent = eXample,dc= cOm'
  Norm : '0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ ,0.9.2342.19200300.100.1.25=example,0.9.2342.19200300.100.1.25=com'
  RDN 1
    UP : 'ou=" Some People " + dc = \+ Some anImAls'
    Norm : '0.9.2342.19200300.100.1.25=\+ some animals+2.5.4.11=\ some people\ \ '
    AVA 1
      UP AT : dc
      UP val : '\+ Some anImAls'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : '+ some animals'
    AVA 2
      UP AT : ou
      UP val : '" Some People "'
      Norm AT : 2.5.4.11
      Norm val : ' some people '
  RDN 2
    UP : 'dommainComponent = eXample'
    Norm : '0.9.2342.19200300.100.1.25=example'
    AVA 1
      UP AT : dommainComponent
      UP val : 'eXample'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : 'example'
  RDN 3
    UP : 'dc= cOm'
    Norm : '0.9.2342.19200300.100.1.25=com'
    AVA 1
      UP AT : dc
      UP val : 'cOm'
      Norm AT : 0.9.2342.19200300.100.1.25
      Norm val : 'com'
|
As we can see, types are replaced by their OID, useless spaces are removed, AVAs are reordered within their RDN and values are normalized (lowercased, and multiple inner spaces are replaced by a single space, according to the OU and DC normalizers).
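The type mapping, value normalization and AVA ordering steps can be sketched as follows. Everything here is an assumption made for illustration : the class RdnNormalizer, its nested Ava holder, the small alias table standing in for the schema, and the value normalizer, which only lowercases and collapses runs of spaces. Values are expected to be already unescaped and unquoted, PrepString is not applied, and unknown attribute types are rejected instead of being tracked.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical normalizer for the AVAs of a single RDN, for illustration only.
final class RdnNormalizer {
    // Stand-in for the schema: maps lowercased aliases to their OID
    static final Map<String, String> OIDS = Map.of(
        "ou", "2.5.4.11",
        "organizationalunitname", "2.5.4.11",
        "dc", "0.9.2342.19200300.100.1.25",
        "domaincomponent", "0.9.2342.19200300.100.1.25");

    // A type/value pair; the value is already unescaped and unquoted
    static final class Ava {
        final String type;
        final String value;

        Ava(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    static List<Ava> normalize(List<Ava> upAvas) {
        List<Ava> result = new ArrayList<>();

        for (Ava ava : upAvas) {
            // Lowercase the type, drop surrounding spaces, then map to its OID
            String alias = ava.type.trim().toLowerCase(Locale.ROOT);
            String oid = OIDS.get(alias);

            if (oid == null) {
                throw new IllegalArgumentException("Unknown attribute type: " + ava.type);
            }

            // Assumed value normalizer: lowercase and collapse inner spaces
            String value = ava.value.toLowerCase(Locale.ROOT).replaceAll(" +", " ");
            result.add(new Ava(oid, value));
        }

        // Order the AVAs by increasing OID string
        result.sort(Comparator.comparing((Ava a) -> a.type));
        return result;
    }
}
```

Fed with the two AVAs of the first RDN of the example, this sketch returns the dc AVA first and the ou AVA second, matching the ordering shown in the Norm form.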
Another important thing is that we store different kinds of UP values depending on the level of storage (DN, RDN or AVA). This is necessary as we manipulate this information at different levels too, depending on which operation we are dealing with : value comparisons, DN modification, DN comparisons...
A special case is when we can't find the AttributeType in the schema. There are three cases where it can happen :
- the entry is a referral (i.e. the referral OC is present in this entry)
- the entry has the extensible ObjectClass
- we are facing an error
In the first two cases, we won't be able to do a PrepString nor a normalization as we don't know if the AttributeType is H/R or not, and we don't know anything about the normalizer either. We will simply track the fact that the AttributeType is unknown by not attaching a reference to the associated AttributeType object (for ServerDNs) and by not filling the normalized form for the AVA (keeping it null : an empty string won't be enough, as we may have empty values after normalization, a very weird and twisted possibility, but ...).
Internal structure
We have five objects to describe :
- DN
- RDN
- AVA
- Type
- Value
The following paragraphs describe the internal structure of those objects.
...
A RDN may contain more than one AVA, but usually contains only one. An optimization could be not to create an array to store the AVAs, but instead to keep the single AVA in a dedicated member. If we have more than one AVA, then we will create an array instead. The extra cost of creating this array is totally acceptable given how rarely a RDN holds multiple AVAs. On the other hand, it forces the access method to deal with the number of AVAs, but this cost is presumably lower than accessing an AVA through an ArrayList (to be double checked).
Note |
---|
What about a byte[] to store this RDN ? It does not seem we need it, even if we need the byte arrays to create a new DN when dealing with a ModifyDN operation. As it is not a frequent operation, we can accept the extra cost of a conversion from String to a byte array. |
Code Block |
---|
class RDN
// The user provided form
String upRdn;
// The normalized form
String normRdn;
// The AVA if we have only one
AVA ava;
// An array of AVAs if we have more than one
List<AVA> avas;
// The RDN position in the UP DN
int start;
// The RDN length in the UP DN
int length;
|
The two extra fields (start and length) are useful when dealing with a ModifyDN operation, in order to keep the UP RDN ordering. If the user wants to replace a RDN by another, we have to substitute the RDN in its original position, creating a new UP DN.
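A minimal sketch of that substitution, assuming a hypothetical helper named UpDnEditor and that start and length are character offsets into the UP DN string :

```java
// Hypothetical helper, for illustration only.
final class UpDnEditor {
    // Builds a new UP DN where the RDN located at [start, start + length)
    // is replaced by newUpRdn; everything around it is kept unchanged.
    static String replaceRdn(String upDn, int start, int length, String newUpRdn) {
        if (start < 0 || length < 0 || start + length > upDn.length()) {
            throw new IndexOutOfBoundsException("Bad RDN position: " + start + "/" + length);
        }

        return upDn.substring(0, start) + newUpRdn + upDn.substring(start + length);
    }
}
```

Because only the RDN itself is replaced, the separators and spaces typed by the user around it survive in the new UP DN.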
Note |
---|
AVAs are stored in alphabetic order ? |
...
AVA internal structure
An AVA contains an attributeType and an attribute value. We could keep the String representation of an AVA internally, but as those members already have a String representation, and as we rarely manipulate an AVA (except when creating it while decoding a DN), there is no need to store this duplicated information.
Code Block |
---|
class AVA
Attribute type;
Value<?> value;
|
...
Attribute internal structure
We will store only two pieces of information :
- the User provided form
- the attributeType OID
Note |
---|
we can keep the OID in two forms : as a String, or as an OID object. The OID object is smaller, and allows faster comparisons, as an OID String will be at least 2 times longer than an OID object: |
Code Block |
---|
class AttributeType
String upType;
OID normType;
|
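One possible shape for such an OID object is sketched below. The class name Oid and the choice of an int array to hold the arcs are assumptions made for this illustration, not a description of an existing class. Comparing two instances then boils down to comparing arrays of integers.

```java
import java.util.Arrays;

// Hypothetical OID holder, for illustration only: stores the numeric arcs
// of a dotted OID such as 2.5.4.11 in an int array.
final class Oid {
    private final int[] arcs;

    private Oid(int[] arcs) {
        this.arcs = arcs;
    }

    // Parses a dotted OID; throws NumberFormatException on a malformed one
    static Oid parse(String dotted) {
        // The -1 limit keeps empty trailing parts so "2.5." is rejected
        String[] parts = dotted.split("\\.", -1);
        int[] arcs = new int[parts.length];

        for (int i = 0; i < parts.length; i++) {
            arcs[i] = Integer.parseInt(parts[i]);

            if (arcs[i] < 0 || !Character.isDigit(parts[i].charAt(0))) {
                throw new NumberFormatException("Bad OID arc: " + parts[i]);
            }
        }

        return new Oid(arcs);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Oid && Arrays.equals(arcs, ((Oid) other).arcs);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(arcs);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();

        for (int i = 0; i < arcs.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(arcs[i]);
        }

        return sb.toString();
    }
}
```

Whether this is really smaller and faster than the String form would have to be measured, like the other optimizations discussed on this page.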
AttributeValue internal structure
This element should be kept as a String (UP form) and as a byte array if the corresponding AT is not H/R, or as a String if the AT is H/R (H/R : Human Readable)
It could be good to keep it as a byte array to avoid a costly conversion when sending back the data to the user (to be checked)
The structure will be :
Code Block |
---|
class AttributeValue
// a flag which tells if this value is H/R or not
boolean isHR;
// The user provided value
String upValue;
// The structure which holds the value, either a String or a byte[]
NormValue normValue;
|
with the interface NormValue and the implementing classes :
Code Block |
---|
interface NormValue
class NormStringValue implements NormValue
String value; // The value if it's a String
class NormBytesValue implements NormValue
byte[] value; // The value if it's a byte array
|
...
We might also create an interface to handle big values (above 1024 bytes or char, for instance), called StreamedValue :
interface StreamedValue
- the reference to the associated AttributeType object if we have access to the schema
As the second form already contains the first one, normalized, we don't need to keep both of them. It's better to define an interface and two implementing classes : UnknownAttribute and SchemaAttribute. The interface will define common accessors for those two classes. We will also define an intermediate abstract class, carrying the common methods.
Code Block |
---|
interface Attribute
// Tells if the attributeType is known
boolean isSchemaAvailable();
// Tells if the attributeType is binary
boolean isBinary();
abstract class AbstractAttribute
// Implement the common methods
class UnknownAttribute
String upType;
class SchemaAttribute
AttributeType attributeType;
|
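A compilable version of this hierarchy could look like the sketch below. Several parts are assumptions added for illustration : the getUpType accessor, the SchemaTypeStub class standing in for the schema AttributeType object, and the choice to report an unknown attribute as non binary.

```java
// Common accessors for known and unknown attribute types
interface Attribute {
    // Tells if the attributeType is known
    boolean isSchemaAvailable();

    // Tells if the attributeType is binary
    boolean isBinary();

    // Hypothetical accessor: the type as the user typed it
    String getUpType();
}

// Carries what both implementations share
abstract class AbstractAttribute implements Attribute {
    private final String upType;

    protected AbstractAttribute(String upType) {
        this.upType = upType;
    }

    @Override
    public String getUpType() {
        return upType;
    }
}

// Used when the attributeType cannot be found in the schema
final class UnknownAttribute extends AbstractAttribute {
    UnknownAttribute(String upType) {
        super(upType);
    }

    @Override
    public boolean isSchemaAvailable() {
        return false;
    }

    @Override
    public boolean isBinary() {
        // Assumption: without a schema we treat the value as not binary
        return false;
    }
}

// Hypothetical stand-in for the schema AttributeType object
final class SchemaTypeStub {
    final String oid;
    final boolean humanReadable;

    SchemaTypeStub(String oid, boolean humanReadable) {
        this.oid = oid;
        this.humanReadable = humanReadable;
    }
}

// Used when the schema knows the attributeType
final class SchemaAttribute extends AbstractAttribute {
    private final SchemaTypeStub attributeType;

    SchemaAttribute(String upType, SchemaTypeStub attributeType) {
        super(upType);
        this.attributeType = attributeType;
    }

    @Override
    public boolean isSchemaAvailable() {
        return true;
    }

    @Override
    public boolean isBinary() {
        return !attributeType.humanReadable;
    }
}
```

Code that handles an AVA can then call isSchemaAvailable() before attempting PrepString or normalization, and skip both for an UnknownAttribute.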
AttributeValue internal structure
We will use the Value<?> class we have already defined.
...