...

--------------------------------------------------------------------------------------------------
| master | treeInfo                              | upAttributes                                  |
--------------------------------------------------------------------------------------------------
|   0    | sequence -> 8216                      |                                               |
--------------------------------------------------------------------------------------------------
|   1    | parentId -> 0                         | objectClass0 -> top                           |
|        | upRdn -> o=sevenSeas                  | objectClass1 -> organization                  |
|        | normRdn -> 2.5.4.10=sevenseas         | o0 -> sevenSeas                               |
--------------------------------------------------------------------------------------------------
|   2    | parentId -> 1                         | objectClass0 -> top                           |
|        | upRdn -> ou=people                    | objectClass1 -> organizationalUnit            |
|        | normRdn -> 2.5.4.11=people            | ou0 -> people                                 |
--------------------------------------------------------------------------------------------------
|   3    | parentId -> 1                         | objectClass0 -> top                           |
|        | upRdn -> ou=groups                    | objectClass1 -> organizationalUnit            |
|        | normRdn -> 2.5.4.11=groups            | ou0 -> groups                                 |
--------------------------------------------------------------------------------------------------
|   6    | parentId -> 2                         | objectClass0 -> top                           |
|        | upRdn -> cn=Horatio Hornblower        | objectClass1 -> person                        |
|        | normRdn -> 2.5.4.3=horatio hornblower | objectClass2 -> organizationalPerson          |
|        |                                       | objectClass3 -> inetOrgPerson                 |
|        |                                       | cn0 -> Horatio Hornblower                     |
|        |                                       | description0 -> Capt. Horatio Hornblower, R.N |
|        |                                       | givenName0 -> Horatio                         |
|        |                                       | sn0 -> Hornblower                             |
|        |                                       | uid0 -> hhornblo                              |
|        |                                       | mail0 -> hhornblo@royalnavy.mod.uk            |
|        |                                       | userPassword0 -> <bytes>                      |
--------------------------------------------------------------------------------------------------
| ...    |                                       |                                               |
--------------------------------------------------------------------------------------------------

The row key is a sequentially generated 8-byte long. An advantage of using longs is compatibility with XDBM. A disadvantage is that sequential keys are not optimal for distributing data and load balancing across data nodes.
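As a minimal sketch of the row key encoding, assuming the long is stored big-endian: the byte-wise order of the keys then matches the numeric order of non-negative IDs, which is why sequentially generated keys end up next to each other. The helper names `encode_id` and `decode_id` are hypothetical.

```python
import struct

def encode_id(entry_id: int) -> bytes:
    """Encode an entry ID as an 8-byte big-endian signed long (assumed layout)."""
    return struct.pack(">q", entry_id)

def decode_id(key: bytes) -> int:
    """Decode an 8-byte row key back into the entry ID."""
    return struct.unpack(">q", key)[0]

# Sorting the raw bytes yields the same order as sorting the numbers,
# so new entries always land at the end of the key space.
keys = [encode_id(i) for i in (6, 1, 3, 2)]
sorted_ids = [decode_id(k) for k in sorted(keys)]
```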

The 'treeInfo' column family stores hierarchical information:

  • Column 'parentId' contains the row key of the parent entry.
  • Column 'upRdn' contains the user-provided local name, relative to the parent (normally the RDN; the suffix DN for the context entry). It is used to reconstruct the entry's DN.
  • Column 'normRdn' contains the normalized local name, relative to the parent (normally the RDN; the suffix DN for the context entry). This is just an optimization: it avoids RDN normalization when constructing the key of the tree table (see below).
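The DN reconstruction described above can be sketched as a walk up the 'parentId' chain. This is an illustration only: the master table is modelled as an in-memory dict, `build_dn` is a hypothetical helper, and parent ID 0 is assumed to mean "no parent".

```python
# In-memory stand-in for the treeInfo column family of the master table.
MASTER = {
    1: {"parentId": 0, "upRdn": "o=sevenSeas"},
    2: {"parentId": 1, "upRdn": "ou=people"},
    3: {"parentId": 1, "upRdn": "ou=groups"},
    6: {"parentId": 2, "upRdn": "cn=Horatio Hornblower"},
}
ROOT_ID = 0  # assumption: parentId 0 marks the context entry's parent

def build_dn(entry_id: int) -> str:
    """Collect upRdn values from the entry up to the context entry."""
    rdns = []
    while entry_id != ROOT_ID:
        row = MASTER[entry_id]
        rdns.append(row["upRdn"])
        entry_id = row["parentId"]
    return ",".join(rdns)

dn = build_dn(6)
```

Each step costs one lookup by row key, so the number of reads equals the depth of the entry below the suffix plus one.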

The 'upAttributes' column family contains a map with all the attributes.

...

  • It would be possible to store the serialized form of the server entry instead of the attributes.
  • It would be possible to store the serialized form of the RDN to avoid parsing.
  • Different kinds of attributes could be stored in separate column families (e.g. binary attributes).
  • Usage of the entry UUID as row key. This helps to distribute the entries across the data nodes of the cluster and may avoid hot spots.
  • Add oneLevelCount and subLevelCount (currently stored in tree table) for faster lookup of counts.
  • Add normAttributes (currently stored in tree table) for faster lookup of reverse index and evaluator.

Tree Table

The tree table stores parent to child relationships.

--------------------------------------------------------------------------------------------------------------------
| tree                         | treeInfo              | normAttributes                                            |
--------------------------------------------------------------------------------------------------------------------
| 0,2.5.4.10=sevenseas         | id -> 1               | 2.5.4.0=organization -> 1                                 |
|                              | oneLevelCount -> 4    | 2.5.4.0=top -> 0                                          |
|                              | subLevelCount -> 1583 | 2.5.4.10=sevenseas -> 0                                   |
--------------------------------------------------------------------------------------------------------------------
| 1,2.5.4.11=people            | id -> 2               | 2.5.4.0=organizationalunit -> 1                           |
|                              | oneLevelCount -> 1254 | 2.5.4.0=top -> 0                                          |
|                              | subLevelCount -> 1254 | 2.5.4.11=people -> 0                                      |
--------------------------------------------------------------------------------------------------------------------
| 1,2.5.4.11=groups            | id -> 3               | 2.5.4.0=organizationalunit -> 1                           |
|                              | oneLevelCount -> 2    | 2.5.4.0=top -> 0                                          |
|                              | subLevelCount -> 56   | 2.5.4.11=groups -> 0                                      |
--------------------------------------------------------------------------------------------------------------------
| 2,2.5.4.3=horatio hornblower | id -> 6               | 0.9.2342.19200300.100.1.1=hhornblo -> 0                   |
|                              | oneLevelCount -> 0    | 0.9.2342.19200300.100.1.3=hhornblo@royalnavy.mod.uk -> 0  |
|                              | subLevelCount -> 0    | 2.5.4.0=inetorgperson -> 3                                |
|                              |                       | 2.5.4.0=organizationalperson -> 2                         |
|                              |                       | 2.5.4.0=person -> 1                                       |
|                              |                       | 2.5.4.0=top -> 0                                          |
|                              |                       | 2.5.4.13=capt. horatio hornblower, r.n -> 0               |
|                              |                       | 2.5.4.3=horatio hornblower -> 0                           |
|                              |                       | 2.5.4.35=<bytes> -> 0                                     |
|                              |                       | 2.5.4.4=hornblower -> 0                                   |
|                              |                       | 2.5.4.42=horatio -> 0                                     |
--------------------------------------------------------------------------------------------------------------------
| ...                          |                       |                                                           |
--------------------------------------------------------------------------------------------------------------------

The row key is composed of the parent entry ID (8-byte long), a comma, and the normalized RDN of an entry.

The treeInfo column family stores hierarchical information:

  • Column id contains the row key of the entry in the master table.
  • Column oneLevelCount tracks the number of immediate children. It is used by the one-level index. When adding or deleting an entry, the oneLevelCount of the parent entry is incremented or decremented.
  • Column subLevelCount tracks the number of all descendants. It is used by the sub-level index. When adding or deleting an entry, the subLevelCount of every ancestor entry is incremented or decremented.
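The counter maintenance described above can be sketched as follows. For brevity the rows are keyed by entry ID in a plain dict rather than by the composite tree-table key, and all function names are hypothetical; parent ID 0 is assumed to mean "no parent".

```python
ROOT_ID = 0  # assumed marker for "no parent", as in parentId -> 0

def adjust_counts(rows, parent_id, delta):
    """Apply delta to the parent's oneLevelCount and every ancestor's subLevelCount."""
    if parent_id == ROOT_ID:
        return
    rows[parent_id]["oneLevelCount"] += delta
    ancestor = parent_id
    while ancestor != ROOT_ID:
        rows[ancestor]["subLevelCount"] += delta
        ancestor = rows[ancestor]["parentId"]

def add_entry(rows, entry_id, parent_id):
    """Insert a new entry and increment the counters above it."""
    rows[entry_id] = {"parentId": parent_id, "oneLevelCount": 0, "subLevelCount": 0}
    adjust_counts(rows, parent_id, +1)

def delete_leaf(rows, entry_id):
    """Remove a leaf entry and decrement the counters above it."""
    parent_id = rows.pop(entry_id)["parentId"]
    adjust_counts(rows, parent_id, -1)

rows = {}
add_entry(rows, 1, ROOT_ID)  # o=sevenSeas
add_entry(rows, 2, 1)        # ou=people
add_entry(rows, 3, 1)        # ou=groups
add_entry(rows, 6, 2)        # cn=Horatio Hornblower
```

Note that an add or delete touches one row per ancestor, so the write cost grows with the depth of the entry.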

The normAttributes column family stores a map with all attributes (indexed as well as unindexed) in normalized form. It is used for server-side filtering while scanning. The qualifier is composed of the attribute OID, an equals character, and the normalized attribute value. The numeric value is the 4-byte index of that attribute value in the master table (e.g. 1 for the value stored under objectClass1).

Using this table, it is easy to calculate the ID for a DN. The start row key can always be composed using the partition's suffix: 0,<suffix>. From that row key, the suffix entry ID can be found in the treeInfo:id column. This ID and the next name component from the DN are used to compose the next row key. This is repeated until all name components of the DN are processed.
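A minimal sketch of this lookup, assuming the parent ID is encoded as an 8-byte big-endian long and the tree table is modelled as a dict from row key to the treeInfo:id value. The helpers `tree_key` and `resolve_id` are hypothetical.

```python
import struct

def tree_key(parent_id: int, norm_rdn: str) -> bytes:
    """Compose a tree-table row key: parent ID, a comma, the normalized RDN."""
    return struct.pack(">q", parent_id) + b"," + norm_rdn.encode("utf-8")

# In-memory stand-in: row key -> treeInfo:id
TREE = {
    tree_key(0, "2.5.4.10=sevenseas"): 1,
    tree_key(1, "2.5.4.11=people"): 2,
    tree_key(1, "2.5.4.11=groups"): 3,
    tree_key(2, "2.5.4.3=horatio hornblower"): 6,
}

def resolve_id(norm_rdns, norm_suffix):
    """Resolve a DN to an entry ID.

    norm_rdns holds the normalized RDNs below the suffix in DN order
    (leaf first); norm_suffix is the normalized suffix of the partition.
    Returns None if any component does not exist.
    """
    entry_id = TREE.get(tree_key(0, norm_suffix))
    for rdn in reversed(norm_rdns):
        if entry_id is None:
            return None
        entry_id = TREE.get(tree_key(entry_id, rdn))
    return entry_id

hornblower_id = resolve_id(
    ["2.5.4.3=horatio hornblower", "2.5.4.11=people"], "2.5.4.10=sevenseas"
)
```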

The table is also used for one-level and sub-level index cursors. To iterate over all children of an entry with ID 'X', an HBase scanner with start key 'X' and stop key 'X+1' is used. For sub-level scans, the column treeInfo:id of each child can be used to set up the next scanner's start and stop keys. While walking the sub-level index, the column treeInfo:oneLevelCount can be used to determine whether it is necessary to scan the next level.
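The range scans can be sketched with a sorted key list standing in for the HBase table. This is a simulation under stated assumptions, not HBase client code: parent IDs are encoded as 8-byte big-endian longs, the stop key is exclusive, the counts are reduced to match the four sample rows, and `scan`, `one_level` and `sub_level` are hypothetical helpers.

```python
import bisect
import struct

def tree_key(parent_id: int, norm_rdn: str) -> bytes:
    """Compose a tree-table row key: parent ID, a comma, the normalized RDN."""
    return struct.pack(">q", parent_id) + b"," + norm_rdn.encode("utf-8")

TABLE = {
    tree_key(0, "2.5.4.10=sevenseas"): {"id": 1, "oneLevelCount": 2},
    tree_key(1, "2.5.4.11=people"): {"id": 2, "oneLevelCount": 1},
    tree_key(1, "2.5.4.11=groups"): {"id": 3, "oneLevelCount": 0},
    tree_key(2, "2.5.4.3=horatio hornblower"): {"id": 6, "oneLevelCount": 0},
}
SORTED_KEYS = sorted(TABLE)  # rows are kept in byte order, as in HBase

def scan(start: bytes, stop: bytes):
    """Return all rows with start <= key < stop."""
    lo = bisect.bisect_left(SORTED_KEYS, start)
    hi = bisect.bisect_left(SORTED_KEYS, stop)
    return [TABLE[k] for k in SORTED_KEYS[lo:hi]]

def one_level(parent_id: int):
    """Children of parent_id: scan from key 'X' to key 'X+1'."""
    return scan(struct.pack(">q", parent_id), struct.pack(">q", parent_id + 1))

def sub_level(parent_id: int):
    """All descendant IDs, skipping scans below entries without children."""
    found = []
    for child in one_level(parent_id):
        found.append(child["id"])
        if child["oneLevelCount"] > 0:
            found.extend(sub_level(child["id"]))
    return found

children_of_suffix = [row["id"] for row in one_level(1)]
descendants_of_suffix = sub_level(1)
```

The children come back in byte order of the normalized RDN, so 'groups' is returned before 'people'.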

While scanning it is also possible to use column family normAttributes for server-side filtering. This is essential for unindexed searches as it is very expensive to load all entries from the HBase cluster into ApacheDS and evaluate the filter there. Instead the LDAP filter can be translated to an HBase filter and evaluated in the HBase cluster.
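To illustrate the idea, the following sketch evaluates a simplified LDAP filter against the set of normAttributes qualifiers of a row. It is a plain Python stand-in for an HBase filter and does not use any HBase API; the nested-tuple filter representation and the function `matches` are assumptions made for the example, and only equality, and, or and not are covered.

```python
def matches(filter_node, qualifiers):
    """Evaluate a filter tree against a row's normAttributes qualifiers."""
    op = filter_node[0]
    if op == "eq":
        _, oid, value = filter_node
        # A qualifier is '<attribute OID>=<normalized value>'.
        return f"{oid}={value}" in qualifiers
    if op == "and":
        return all(matches(child, qualifiers) for child in filter_node[1:])
    if op == "or":
        return any(matches(child, qualifiers) for child in filter_node[1:])
    if op == "not":
        return not matches(filter_node[1], qualifiers)
    raise ValueError(f"unsupported filter operator: {op}")

hornblower = {
    "2.5.4.0=top", "2.5.4.0=person", "2.5.4.0=organizationalperson",
    "2.5.4.0=inetorgperson", "2.5.4.3=horatio hornblower",
    "2.5.4.4=hornblower", "2.5.4.42=horatio",
}
people = {"2.5.4.0=top", "2.5.4.0=organizationalunit", "2.5.4.11=people"}

# (&(objectClass=person)(sn=hornblower)) with normalized OIDs and values
ldap_filter = ("and", ("eq", "2.5.4.0", "person"), ("eq", "2.5.4.4", "hornblower"))
hits = [matches(ldap_filter, row) for row in (hornblower, people)]
```

Because the qualifier already contains the normalized value, an equality match reduces to checking whether a column exists, without reading any cell values.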

The table is also used as a reverse index, e.g. by evaluators.

Alternatives and Improvements:

  • Row keys may become long if custom attribute types (AT) or long values are used. Even a simple RDN like '2.5.4.11=users' takes 14 bytes. As the key is always calculated and never parsed back, it would be possible to shorten it. Possible strategies would be to use a hash (MD5, fixed 16 bytes) or to substitute the OID with the short name.
  • It isn't necessary to store objectClass:top.
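The hashing strategy mentioned above could look like the following sketch. It assumes the 8-byte parent ID and the comma are kept as a prefix, so that the range scan over a parent's children still works, and replaces only the normalized RDN with its MD5 digest. The helper `hashed_tree_key` is hypothetical.

```python
import hashlib
import struct

def hashed_tree_key(parent_id: int, norm_rdn: str) -> bytes:
    """Row key with a fixed length: parent ID, a comma, MD5 of the normalized RDN."""
    digest = hashlib.md5(norm_rdn.encode("utf-8")).digest()  # always 16 bytes
    return struct.pack(">q", parent_id) + b"," + digest

short_key = hashed_tree_key(1, "2.5.4.11=users")
long_key = hashed_tree_key(2, "2.5.4.13=" + "a very long description " * 20)
```

Every key is 25 bytes long regardless of the RDN. The hash only saves space for RDNs longer than 16 bytes, so the 14-byte example '2.5.4.11=users' would actually grow by two bytes, and the children of an entry are no longer returned in RDN order.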