...

Code Block
------------------------------------------------------------------------------------------------------------------------------------------------
| master                               | treeInfo                                         | upAttributes                                       |
------------------------------------------------------------------------------------------------------------------------------------------------
| 00000000-0000-0000-0000-000000000000 |                                                  |                                                    |
------------------------------------------------------------------------------------------------------------------------------------------------
| 11111111-1111-1111-1111-111111111111 | parentId -> 00000000-0000-0000-0000-000000000000 | objectClass0 -> top                                |
|                                      | upRdn -> o=sevenSeas                             | objectClass1 -> organization                       |
|                                      | normRdn -> 2.5.4.10=sevenseas                    | o0 -> sevenSeas                                    |
|                                      |                                                  | entryUUID0 -> 11111111-1111-1111-1111-111111111111 |
------------------------------------------------------------------------------------------------------------------------------------------------
| 22222222-2222-2222-2222-222222222222 | parentId -> 11111111-1111-1111-1111-111111111111 | objectClass0 -> top                                |
|                                      | upRdn -> ou=people                               | objectClass1 -> organizationalUnit                 |
|                                      | normRdn -> 2.5.4.11=people                       | ou0 -> people                                      |
|                                      |                                                  | entryUUID0 -> 22222222-2222-2222-2222-222222222222 |
------------------------------------------------------------------------------------------------------------------------------------------------
| 33333333-3333-3333-3333-333333333333 | parentId -> 11111111-1111-1111-1111-111111111111 | objectClass0 -> top                                |
|                                      | upRdn -> ou=groups                               | objectClass1 -> organizationalUnit                 |
|                                      | normRdn -> 2.5.4.11=groups                       | ou0 -> groups                                      |
|                                      |                                                  | entryUUID0 -> 33333333-3333-3333-3333-333333333333 |
------------------------------------------------------------------------------------------------------------------------------------------------
| 66666666-6666-6666-6666-666666666666 | parentId -> 22222222-2222-2222-2222-222222222222 | objectClass0 -> top                                |
|                                      | upRdn -> cn=Horatio Hornblower                   | objectClass1 -> person                             |
|                                      | normRdn -> 2.5.4.3=horatio hornblower            | objectClass2 -> organizationalPerson               |
|                                      |                                                  | objectClass3 -> inetOrgPerson                      |
|                                      |                                                  | cn0 -> Horatio Hornblower                          |
|                                      |                                                  | description0 -> Capt. Horatio Hornblower, R.N      |
|                                      |                                                  | givenName0 -> Horatio                              |
|                                      |                                                  | sn0 -> Hornblower                                  |
|                                      |                                                  | uid0 -> hhornblo                                   |
|                                      |                                                  | mail0 -> hhornblo@royalnavy.mod.uk                 |
|                                      |                                                  | userPassword0 -> <bytes>                           |
|                                      |                                                  | entryUUID0 -> 66666666-6666-6666-6666-666666666666 |
------------------------------------------------------------------------------------------------------------------------------------------------
| ...                                  |                                                  |                                                    |
------------------------------------------------------------------------------------------------------------------------------------------------

The row key is a UUID: the value of the entry's entryUUID attribute. The advantage of UUIDs is that they are random, and random keys are preferred in HBase because they allow an even distribution across data nodes. (Note: XDBM was adjusted to use a generic type parameter for the entry ID.)

The treeInfo column family stores hierarchical information:

  • Column parentId contains the row key of the parent entry.
  • Column upRdn contains the user provided local name, relative to the parent (normally the RDN; the suffix DN for the context entry). It is used to reconstruct the entry's DN.
  • Column normRdn contains the normalized local name, relative to the parent (normally the RDN; the suffix DN for the context entry). It is only an optimization: it avoids RDN normalization when constructing the row key of the tree table (see below).

The upAttributes column family contains a map with all the attributes.

  • The attribute description (type+options) is used as column qualifier.
  • HBase stores one value per column, so multi-valued attributes need a workaround. If all values were serialized into a single cell (for example as JSON), it would not be possible to access one value at a time. Hence each value is stored in its own column, and an additional index is appended to the column qualifier. This way each value can be read and written separately.
  • The additional index is a zero-based 4-byte signed integer. To reconstruct the user-provided attribute description, the last 4 bytes need to be removed from the column qualifier.
  • The values are stored as byte[].
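
The qualifier layout described in the list above can be sketched as follows. This is a minimal illustration, not the ApacheDS implementation: the function names are hypothetical, and the sketch assumes a UTF-8 encoded attribute description and a big-endian index, neither of which is specified in the text.

```python
import struct
from typing import Tuple

INDEX_SIZE = 4  # the value index is a 4-byte signed integer


def encode_qualifier(attribute_description: str, index: int) -> bytes:
    """Append the zero-based value index to the attribute description.

    Assumption: UTF-8 for the description and big-endian byte order for
    the index; the text only fixes the 4-byte signed width.
    """
    return attribute_description.encode("utf-8") + struct.pack(">i", index)


def decode_qualifier(qualifier: bytes) -> Tuple[str, int]:
    """Split a column qualifier into (attribute description, value index)."""
    description = qualifier[:-INDEX_SIZE].decode("utf-8")
    (index,) = struct.unpack(">i", qualifier[-INDEX_SIZE:])
    return description, index


# Each value of a multi-valued attribute ends up in its own column.
object_classes = ["top", "person", "organizationalPerson", "inetOrgPerson"]
columns = {
    encode_qualifier("objectClass", i): value.encode("utf-8")
    for i, value in enumerate(object_classes)
}
```

Because the index has a fixed width, stripping the last four bytes always recovers the attribute description, even when the description contains options or digits.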

The row with key '00000000-0000-0000-0000-000000000000' is the virtual root.

To retrieve the DN of an entry by its ID, the entry's RDN and parent ID must be fetched. As long as the parent ID differs from '00000000-0000-0000-0000-000000000000', this step must be repeated for each parent in turn. The DN is the concatenation of all RDNs collected on the way, from the entry up to the suffix.
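
The parent walk can be sketched as below. A plain dictionary stands in for the master table, so that the example is runnable without an HBase cluster; `MASTER`, `fetch_tree_info` and `dn_for_id` are hypothetical names used only for illustration.

```python
ROOT_ID = "00000000-0000-0000-0000-000000000000"

# In-memory stand-in for the master table: row key -> treeInfo columns.
MASTER = {
    "11111111-1111-1111-1111-111111111111": {
        "parentId": ROOT_ID,
        "upRdn": "o=sevenSeas",
    },
    "22222222-2222-2222-2222-222222222222": {
        "parentId": "11111111-1111-1111-1111-111111111111",
        "upRdn": "ou=people",
    },
    "66666666-6666-6666-6666-666666666666": {
        "parentId": "22222222-2222-2222-2222-222222222222",
        "upRdn": "cn=Horatio Hornblower",
    },
}


def fetch_tree_info(entry_id: str) -> dict:
    """Hypothetical lookup; against HBase this would be one Get per row."""
    return MASTER[entry_id]


def dn_for_id(entry_id: str) -> str:
    """Collect upRdn values while walking up to the virtual root."""
    rdns = []
    current = entry_id
    while current != ROOT_ID:
        info = fetch_tree_info(current)
        rdns.append(info["upRdn"])
        current = info["parentId"]
    # The leaf RDN was collected first, which matches LDAP DN ordering.
    return ",".join(rdns)
```

Note that the number of lookups grows with the depth of the entry in the tree, since each step needs the parent ID returned by the previous one.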

It would also be possible to determine the ID for a DN from this table, but that would require a full table scan. For this reason a second table is available.

The master table contains all information needed to restore the data: reference to the parent, user provided RDN, and user provided attributes.

Alternatives and Improvements:

  • It would be possible to store the serialized form of the server entry instead of the attributes.
  • It would be possible to store the serialized form of the RDN to avoid parsing.
  • Different kinds of attributes could be stored in separate column families (e.g. binary attributes).
  • Usage of entry UUID as row key. This helps to distribute the entries over all clusters and may avoid hot spots.
  • Add oneLevelCount and subLevelCount (currently stored in tree table) for faster lookup of counts.
  • Add normAttributes (currently stored in tree table) for faster lookup of reverse index and evaluator.

Tree Table

The tree table stores parent to child relationships.

Code Block

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| tree                                                            | treeInfo                                   | normAttributes                                           |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 00000000-0000-0000-0000-000000000000,2.5.4.10=sevenseas         | id -> 11111111-1111-1111-1111-111111111111 | 2.5.4.0=organization -> 1                                |
|                                                                 | oneLevelCount -> 4                         | 2.5.4.0=top -> 0                                         |
|                                                                 | subLevelCount -> 1583                      | 2.5.4.10=sevenseas -> 0                                  |
|                                                                 |                                            | 1.3.6.1.1.16.4=11111111-1111-1111-1111-111111111111 -> 0 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 11111111-1111-1111-1111-111111111111,2.5.4.11=people            | id -> 22222222-2222-2222-2222-222222222222 | 2.5.4.0=organizationalunit -> 1                          |
|                                                                 | oneLevelCount -> 1254                      | 2.5.4.0=top -> 0                                         |
|                                                                 | subLevelCount -> 1254                      | 2.5.4.11=people -> 0                                     |
|                                                                 |                                            | 1.3.6.1.1.16.4=22222222-2222-2222-2222-222222222222 -> 0 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 11111111-1111-1111-1111-111111111111,2.5.4.11=groups            | id -> 33333333-3333-3333-3333-333333333333 | 2.5.4.0=organizationalunit -> 1                          |
|                                                                 | oneLevelCount -> 2                         | 2.5.4.0=top -> 0                                         |
|                                                                 | subLevelCount -> 56                        | 2.5.4.11=groups -> 0                                     |
|                                                                 |                                            | 1.3.6.1.1.16.4=33333333-3333-3333-3333-333333333333 -> 0 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 22222222-2222-2222-2222-222222222222,2.5.4.3=horatio hornblower | id -> 66666666-6666-6666-6666-666666666666 | 0.9.2342.19200300.100.1.1=hhornblo -> 0                  |
|                                                                 | oneLevelCount -> 0                         | 0.9.2342.19200300.100.1.3=hhornblo@royalnavy.mod.uk -> 0 |
|                                                                 | subLevelCount -> 0                         | 2.5.4.0=inetorgperson -> 3                               |
|                                                                 |                                            | 2.5.4.0=organizationalperson -> 2                        |
|                                                                 |                                            | 2.5.4.0=person -> 1                                      |
|                                                                 |                                            | 2.5.4.0=top -> 0                                         |
|                                                                 |                                            | 2.5.4.13=capt. horatio hornblower, r.n -> 0              |
|                                                                 |                                            | 2.5.4.3=horatio hornblower -> 0                          |
|                                                                 |                                            | 2.5.4.35=<bytes> -> 0                                    |
|                                                                 |                                            | 2.5.4.42=horatio -> 0                                    |
|                                                                 |                                            | 2.5.4.4=hornblower -> 0                                  |
|                                                                 |                                            | 1.3.6.1.1.16.4=66666666-6666-6666-6666-666666666666 -> 0 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| ...                                                             |                                            |                                                          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The row key is composed of the parent entry ID (UUID), a comma, and the normalized RDN of an entry.

...

Using this table, it is easy to calculate the ID for a DN. The start row key can always be composed using the partition's suffix: 00000000-0000-0000-0000-000000000000,<suffix>. From that row, the suffix entry ID can be found in the treeInfo:id column. This ID and the next name component of the DN are used to compose the next row key. This is repeated until all name components of the DN are processed.
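
A minimal sketch of this lookup, with a dictionary standing in for the tree table (row key -> value of treeInfo:id). The names `TREE`, `row_key` and `id_for_dn` are hypothetical, and the name components are assumed to be normalized already.

```python
from typing import List, Optional

ROOT_ID = "00000000-0000-0000-0000-000000000000"

# In-memory stand-in for the tree table: row key -> treeInfo:id.
TREE = {
    ROOT_ID + ",2.5.4.10=sevenseas": "11111111-1111-1111-1111-111111111111",
    "11111111-1111-1111-1111-111111111111,2.5.4.11=people":
        "22222222-2222-2222-2222-222222222222",
    "22222222-2222-2222-2222-222222222222,2.5.4.3=horatio hornblower":
        "66666666-6666-6666-6666-666666666666",
}


def row_key(parent_id: str, normalized_rdn: str) -> str:
    """Row key: parent entry ID, a comma, and the normalized RDN."""
    return parent_id + "," + normalized_rdn


def id_for_dn(suffix: str, rdns_below_suffix: List[str]) -> Optional[str]:
    """Resolve a DN to an entry ID with one point lookup per component.

    `suffix` is the normalized partition suffix; `rdns_below_suffix`
    lists the remaining normalized RDNs from the suffix down to the
    entry. Returns None if any component does not exist.
    """
    current = TREE.get(row_key(ROOT_ID, suffix))
    for rdn in rdns_below_suffix:
        if current is None:
            return None
        current = TREE.get(row_key(current, rdn))
    return current
```

For example, `id_for_dn("2.5.4.10=sevenseas", ["2.5.4.11=people", "2.5.4.3=horatio hornblower"])` walks three rows and yields the ID of the Hornblower entry. Each step is a lookup by exact row key, so no table scan is needed.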

...

While scanning it is also possible to use column family normAttributes for server-side filtering. This is essential for unindexed searches as it is very expensive to load all entries from the HBase cluster into ApacheDS and evaluate the filter there. Instead the LDAP filter can be translated to an HBase filter and evaluated in the HBase cluster.
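
The idea can be illustrated with a small sketch that evaluates a filter against the normAttributes qualifiers of one row. It is only a model of the logic: the tuple-based filter representation and the function names are hypothetical, and in a real deployment the equivalent predicate would have to run inside the HBase cluster as an HBase filter rather than in client-side Python.

```python
from typing import Set, Tuple


def equality_qualifier(attribute_oid: str, normalized_value: str) -> str:
    """normAttributes qualifier as shown in the table: '<oid>=<value>'."""
    return attribute_oid + "=" + normalized_value


def matches(norm_attributes: Set[str], ldap_filter: Tuple) -> bool:
    """Evaluate a filter tree against the qualifiers of a single row.

    Hypothetical filter encoding: ("eq", oid, value), ("and", f1, f2, ...),
    ("or", f1, f2, ...), ("not", f).
    """
    op = ldap_filter[0]
    if op == "eq":
        return equality_qualifier(ldap_filter[1], ldap_filter[2]) in norm_attributes
    if op == "and":
        return all(matches(norm_attributes, f) for f in ldap_filter[1:])
    if op == "or":
        return any(matches(norm_attributes, f) for f in ldap_filter[1:])
    if op == "not":
        return not matches(norm_attributes, ldap_filter[1])
    raise ValueError("unsupported filter operator: %r" % (op,))


# Qualifiers of the Hornblower row, taken from the table above.
hornblower = {
    "0.9.2342.19200300.100.1.1=hhornblo",
    "2.5.4.0=inetorgperson",
    "2.5.4.0=organizationalperson",
    "2.5.4.0=person",
    "2.5.4.0=top",
    "2.5.4.3=horatio hornblower",
    "2.5.4.4=hornblower",
    "2.5.4.42=horatio",
}
```

An equality assertion reduces to a check for the presence of one column, which is why the normalized value is part of the qualifier rather than the cell value.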

The table is also used by reverse indices, e.g. by evaluators.

...