Distinguished Name

What is a DN

A DN (Distinguished Name) is used as a primary key in a Ldap server. It is a list of RDN (Relative Distinguished Name).

A RDN is also called a Name-component, and is compound of one or many attributeTypeAndValue

An attributeTypeAndValue is a couple with a name (the attribute) and a value separated by the character '='

A DN is supposed to be human readable, so it should contains only UTF-8 characters. Binary values are encoded using their string representation prefixed with a #. There are some special characters which need to be encoded if used, like ',', '=', ...

Life of a DN in a Ldap Server

Client side

When a new entry is added in a ldap server, a DN is used as its primary key. It is first translated from the client representation to a valid string representation, in respect with RFC 2253/RFC 1779.

Sample

givenName = Jérôme #1,    ou = apache,     ou = org

is translated to

givenName = J\C3\A9r\C3\B4me #1, ou = apache, ou = org

Transfer from client to server

The DN is then tranlated to ASN.1 and transfered to the server.

Sample

givenName = J\C3\A9r\C3\B4me #1, ou = apache, ou = org

is translated to

0x04 0x38 0x67 0x69 0x76 0x65 0x6E 0x4E
0x61 0x6D 0x65 0x20 0x3D 0x20 0x4A 0x5C
0x43 0x33 0x5C 0x4A 0x39 0x72 0x5C 0x42
0x33 0x5C 0x42 0x34 0x6D 0x65 0x20 0x20
0x5C 0x23 0x31 0x2C 0x20 0x6F 0x75 0x20
0x3D 0x20 0x61 0x70 0x61 0x63 0x68 0x65
0x2C 0x20 0x6F 0x75 0x20 0x3D 0x20 0x6F
0x72 0x67

Decoding on the server

The server will decode the received ASN.1 and build back the DN, but keeping the second form :

Sample

0x04 0x38 0x67 0x69 0x76 0x65 0x6E 0x4E
0x61 0x6D 0x65 0x20 0x3D 0x20 0x4A 0x5C
0x43 0x33 0x5C 0x4A 0x39 0x72 0x5C 0x42
0x33 0x5C 0x42 0x34 0x6D 0x65 0x20 0x20
0x5C 0x23 0x31 0x2C 0x20 0x6F 0x75 0x20
0x3D 0x20 0x61 0x70 0x61 0x63 0x68 0x65
0x2C 0x20 0x6F 0x75 0x20 0x3D 0x20 0x6F
0x72 0x67

is translated to

givenName = J\C3\A9r\C3\B4me #1, ou = apache, ou = org

Parsing the DN

We have to check that this DN is valid. The DNParser will do it. This is a syntaxic check, we don't check the values at this point.

The internal representation is then created, and it's a class. This class stores the string representation of the DN, after having applied some formating rules :

  • attributes are lower-cased
  • space before and after attributes and values are deleted
  • component names are changed to their OID counterpart
Sample

givenName = J\C3\A9r\C3\B4me #1, ou = apache, ou = org

is translated to an object which will store a normalized form of this DN :

givenname=J\C3\A9r\C3\B4me #1,ou=apache,ou=org

It is important to note that the user provided DN is not destroyed. We keep then two representation of the DN :

  • A user presentation DN : givenName = J\C3\A9r\C3\B4me #1, ou = apache, ou = org
  • A normalized presentation DN : givenname=J\C3\A9r\C3\B4me #1,ou=apache,ou=org

Normalization

Now, we need to normalize the DN to be able to do comparisons. The normalization will take every single RDN and check the schema to apply the rule associated with the name.

Sample

givenname=J\C3\A9r\C3\B4me #1,ou=apache,ou=org

is translated to

2.5.4.42=jérôme #1,2.5.4.11=apache,2.5.4.11=org

In this sample, all the inner spaces are ripped off according to the syntax of givenname, which is a subclass of name :

'givenname' (OID 2.5.4.42) inherits from attribute 'name' (OID 2.5.4.41) which syntax is :

name ATTRIBUTE ::= {
       WITH SYNTAX DirectoryString {MAX}
       EQUALITY MATCH RULE caseIgnoreMatch
       SUBSTRINGS MATCHING RULE caseignoreSubstringsMatch
       ID {id-at-name}

Its EQUALITY MATCH RULE is caseIgnore Match, and RFC 2252 says :

8.1 :
...
When performing the caseIgnoreMatch, caseIgnoreListMatch,
telephoneNumberMatch, caseExactIA5Match and caseIgnoreIA5Match,
multiple adjoining whitespace characters are treated the same as an
individual space, and leading and trailing whitespace is ignored.

We now have two representation of the DN :

  • A user presentation DN : givenName = J\C3\A9r\C3\B4me #1, ou = apache, ou = org
  • A normalized presentation DN : 2.5.4.42=jérôme #1,2.5.4.11=apache,2.5.4.11=org

We will have two String representation of this DN.

Storing the DN

The DN is now stored in the database as a string, in two different tables :

  • the nDN table, which stores the normalized DNs;
  • the upDN table which stores the user presentation DNs

Searching for an entry

When a client search for an entry, it gives a DN. We have to find it in the database. This is not so easy, because we have a special cases : RDN can be multi-valued.

Warning

Multi-valued RDN are not ordered, so a=b+c=d and c=d+a=b are equals, but their strings are different.

we can decide that multi-valued RDN are stored in OID order, while parsing the DN :

c=d+a=b -> a=b+c=d

Otherwise, we just have to compare two strings to find an entry.

DN Parsing

Introduction

This page has been written to summarize the way DNs are parsed and stored in Apache Directory Server (ApacheDS).

We will have two versions of a DN: a server version and a client one.

DN Structure

Internal Representation

A DN is a composition of 0 to N RDN.
An RDN is a composition of 1 to N AttributeTypeAndValue (atav).
An atav is a type and a single value, which can be either a String or a byte[], depending on the attributeType.

The following picture represent this structure :


 

It is of first importance to keep a relation between internal structure and the String representation given by the user. Each sub-element will keep a position and a length relative to the parent UP String form:

uid=admin, ou= system + DC = Org
00000000001111111111222222222233
01234567890123456789012345678901

RDN1 : "uid=admin", pos = 0, length = 9
RDN2 : " ou=system + DC = Org", pos = 10, length = 22
atav1 : "uid=admin", pos = 0, length = 9
atav2 : " ou= system ", pos = 10, length = 12
atav3 : " DC = Org", pos = 23, length = 9

Note that spaces are not skipped in user provided (UP) forms.

Internally, we will keep both representation - normalized and UP - for each element - DN, RDN and ATAV -. Let's see what it means for the sample :

DN.upName = "uid=admin, ou= system + DC = Org"
DN.normName = "uid=admin,ou=system+dc=org"
DN.bytes = 'u', 'i', 'd', ...'o', 'r', 'g'
DN.normalized = true
DN.rdns = List ( Rdn1, Rdn2)

RDN1 :  
Rdn1.upname = "uid=admin"
Rdn1.normName = "uid=admin"
Rdn1.start = 0
Rdn1.length = 9
Rdn1.atavs = null (because we have only one atav)
Rdn1.atavsType = null (because we have only one atav)  
Rdn1.atavType = uid

Rdn1.atav.upName = "uid=admin"
Rdn1.atav.normType = uid
Rdn1.atav.upType = uid
Rdn1.atav.value = "admin"
Rdn1.atav.start = 4
Rdn1.atav.length = 5

RDN2 :
Rdn2.upname = " ou= system + DC = Org"
Rdn2.normName = " ou=system+dc=org"
Rdn2.start = 10
Rdn2.length = 22
Rdn2.atavs = Set<atav1, atav2>
Rdn2.atavTypes = Map

Unknown macro: { < "dc", atav2>, <"ou", atav1> }

Rdn2.atavType = null (because we have more than one atav)
Rdn1.atav = null (because we have more than one atav)

Rdn2.atav1 : 
Rdn2.atav1.normType = "ou"
Rdn2.atav1.upType = " ou"
Rdn2.atav1.value = "system"
Rdn2.atav1.upName = " ou= system "
Rdn2.atav1.start = 10
Rdn2.atav1.length = 12

Rdn2.atav2 : 
Rdn2.atav2.normType = "dc"
Rdn2.atav2.upType = " DC "
Rdn2.atav2.value = "org"
Rdn2.atav2.upName = " DC = Org"
Rdn2.atav2.start = 13
Rdn2.atav2.length = 9

DN String Representation

A DN String is an ASCII String, with a restricted set of characters. Each RDN is separated either by a ',' or a ';'.
A RDN String is a ASCII String, with a restricted set of characters. Each atav is separated by a '+'.
An atav is a composition of an attribute and a value, separated by an '='.
A character which is not in the allowed set is escaped.

Java Structure

DN

The DN will have:

  • an upName:String, which is the user provided form
  • a normalized:String, which is the DN where each attribute type is swapped for its OID representation, and each value is correctly escaped. This form is used for comparizons. If a type is binary, then an escaped form will be used for its value : #xyxyxy
  • a rdns:List, the list of RDNs that form the DN, ordered as in the DN. If an RDN has more than one atav then the lower one in alphabetic order will be put on the first position.

note The List can be a double linked list, instead of an ArrayList. We have to check which is the best.

RDN

The RDN will have :

  • an upName:String used to store the UP representation of the RDN not trimmed
  • a normalized:String (the very same way than for DNs). If a type is binary, then an escaped form will be used for its value: #xyxyxy
  • a start:int position in the DN String
  • a length:int length for its representation in the DN string
  • an atav:Atav, which is either a SingleAtav->Atav or a MultipleAtav->Atav (Atav will be the interface, and both SingleAtav and MultipleAtav are classes)

Atav

An Atav has a hierarchy of classes: An interface (Atav) and two classes: SingleAtav holding a single atav, and a MultipleAtav which holds more than one atav for a RDN
It has:
SingleAtav:

  • an upName:String
  • an AttributeType:String.
  • an AttributeValue:Value which can be either a String or a byte array depending on the attribute type (binary or not)
  • a start:int position in the DN String
  • a length:int length for its representation in the DN string

MultipleAtav :

  • an upName:String
  • a atavs:List, which is a list of SingleAtav
  • a start:int position in the DN String
  • a length:int length for its representation in the DN string

Value Parsing

A value can be complex to parse. We have different cases. The following algorithm describes the way a value should be parsed.

As we are parsing the DN forward, we can't go backward. We should know when we have reached the end of a value, which is when we met one of those conditions:

  • we have no more chars
  • we have met a '"' and this '"' is not escaped and we don't have an escaped " on the beginning of the value which has not been closed
  • we have met a ',' or a ';' and these chars have not been escaped and we don't have an escaped " on the beginning of the value which has not been closed

To deal with those conditions, we should store a state for those special cases:

  • dquoteStart will be set if we have met a '"' at the beginning of a value
  • dquoteEnd will be set if we have met a '"' at the end of a value (and this '"' must not be escaped)
  • escaped is a flag used if a '\' have been seen just before the current char.
  • oidValue is a flag set if the attribute type is an OID
  • binary is a flag set when the attributeType is binary

The first step is to trim spaces on the left.

Then we can parse the remaining string, after having read the first char, which will select special cases:

if the first char is a #
  then
    if the attributeType is OID
      then set the oidValue flag to true
      else
        if the attributeType is not binary
          then throw an error
  else
    if the first char is a "
      then set the dquoteStart flag to true

We will read character by charecter until the end is reached. To ease the parsing, we will distinguish the four previous cases:

  • oidValues
  • binary values
  • quoted values
  • and normal values

OidValues

We will parse in two passes:

  • the first one to count the number of hex values
  • the second one to generate the byte[] with the evaluated hex pairs.
    The value must contain only hexPairs, and nothing else but some spaces at the end.
    length = 0;
    start = current position
    while end not reached do
      if current char is in [0-9a-fA-F]
        then increment length
        else break
    done

end = current position

if last char is a " "
  then
    while last char is a space do
      go to next char
    done

if we have a next char and char is not a "" +and char is not a "," and char is not a ";" and length is not even
  then we have an error: throw it

// second pass: evaluate the hexPairs
byte[] value = new byte[length / 2]
pos = 0

for i = start to end do
  byte[pos]= hex value of char[i] * 4 + hex value of char[i + 1]
  i = i + 2
  pos pos + 1
done

Store the byte[] in the current atav value

Binary values

This is exactly the same process than for OID values, so we will reuse it

Quoted value

A quoted value is surrounded by '"', as stated by rfc2253  :Implementations MUST allow a value to be surrounded by quote ('"' ASCII 34) characters, which are not part of the value. Inside the quoted value, the following characters can occur without any escaping:
",", "=", "+", "<", ">", "#" and ";"
 Only two chars need to be escaped : '\' and '"'.

here is the pseudo-code which shows how quoted string are parsed :

escaped = false
result = new buffer
doubleQuoteClosed = false

while  we have a char do
  if current char is '"'
    then
      if escaped is true
        then
          add current char to result
          continue to next char
        else
          doubleQuoteClosed = true
          break
    else
      if current char is '\'
        then
          if escaped is true
            then
              escaped = false
              add current char to result
              continue to next char
            else
              escaped = true
              continue to next char
      else
        add current char to result
        continue to next char
done

if doubleQuoteClosed is false
  then throw an error

store the result into atav value.

Other values

All values which does not start with a '#' or a '"' are considered as 'normal'. They can have escaped chars which need to be evaluated. We have five kind of escaped chars :

- single chars like ",", "+", "<", ">", ";", "\" and '"' in the middle of the value
- space which are at the beginning or at the end of the value (because values are trimmed before being stored)
- '#' at the beginning of a value
- '"' at the beginning of a value 
- hex pairs like \0D

Here, we will try to parse the string in a single pass. Each decoded char will be put in a buffer if we have at least an escaped char. Otherwise, we will just get the substring of the initial value. As we will trim the value from the right side at the end, we will stroe the last escaped space if needed. The escaped flag will be used to remember that we have met an escaped char. The hexPairHighis used to remember that we have met the higher value of an hexPair, and hexpairLow to remember the lower value

lastEscapedSpace = -1
escaped = false
hexPairHigh = -1 
hexPairLow = -1
result = null

while we have char do
  if char is '\'
    then
      if escaped is true
        then
          escaped = false
          add current char to result
          continue to next char
        else
          if result = null
            then
              result = create a new buffer
              escaped = true
              continue to next char
            else
              escaped = true
              continue to next char
    else
 
done

  • No labels