Introduction

The Master table is the storage for LDAP entries: this is where we store a serialized version of every entry added by LDAP users. All the other tables are indexes, used to easily retrieve entries from this master table.

This table is absolutely critical if we want to rebuild the full database after losing some index. Doing so requires a dedicated tool, though, as we don't store each entry's DN in the serialized entry: we only store the entry's RDN and its parent's ID.

But basically, with only the MasterTable plus a list of indexes to build, we can rebuild the full database.

Let's now see what the MasterTable is made of.

MasterTable Interface

Somewhere in a partition based on a hierarchical data structure, we need to store entry objects. This table contains the entry objects with all the attributes and values as the user provided them during add and modify operations. The server may insert additional attributes, such as operational attributes, into these entries; however, all the original attributes provided by the user remain as they were, including the case of their attribute identifiers.

Such a table stores entries with a key set to some unique identifier and the value set to a serialized representation of the entry. The identifier must be unique; right now it's a Long and is sequential (which means we are somehow 'limited' to 2^64 - 1 entries).

With the new Entry API, entries serialized into such a table don't contain their DN. This means we have to rebuild the DN, using the RDN and the parentId stored in the entry, before returning the full Entry. The rest of the information in the partition can be derived from this master table.
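
To make this more concrete, here is a minimal, hypothetical sketch of rebuilding a DN by walking up the parent chain. The StoredEntry shape and the in-memory map are illustrative stand-ins for real master table lookups, not the actual ApacheDS API:

    import java.util.HashMap;
    import java.util.Map;

    public class DnRebuilder
    {
        // Illustrative stand-in for a deserialized master table record.
        static class StoredEntry
        {
            final String rdn;    // e.g. "ou=people"
            final long parentId; // ID of the parent entry; 0 for the topmost entry

            StoredEntry( String rdn, long parentId )
            {
                this.rdn = rdn;
                this.parentId = parentId;
            }
        }

        // Walk up the parent chain, appending each RDN. No entry is stored
        // under ID 0, so the lookup returns null and the loop stops at the top.
        static String buildDn( long id, Map<Long, StoredEntry> master )
        {
            StringBuilder dn = new StringBuilder();

            for ( StoredEntry e = master.get( id ); e != null; e = master.get( e.parentId ) )
            {
                if ( dn.length() > 0 )
                {
                    dn.append( ',' );
                }

                dn.append( e.rdn );
            }

            return dn.toString();
        }

        public static void main( String[] args )
        {
            Map<Long, StoredEntry> master = new HashMap<Long, StoredEntry>();
            master.put( 1L, new StoredEntry( "dc=example,dc=com", 0L ) );
            master.put( 2L, new StoredEntry( "ou=people", 1L ) );
            master.put( 3L, new StoredEntry( "cn=John Doe", 2L ) );

            // Prints: cn=John Doe,ou=people,dc=example,dc=com
            System.out.println( buildDn( 3L, master ) );
        }
    }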

Hence the MasterTable interface extends the Table interface to look up entries by numeric identifiers. Each entry is stored and manipulated as a Tuple, a pair made of a key and a value, where the keys are Long identifiers and the values are serialized Entry objects.

Using a Long as the identifier has the advantage of being simple, but we have to deal with its sequential nature: each newly created number must be stored on disk, which means an extra disk access when creating an entry. This unique number is also purely local to the partition. We think that using the entry's UUID instead would be a better idea.

It also implies holding a lock on the incremental counter, which makes adding or deleting entries costly operations.

The MasterTable also contains a means to store simple key/value pair properties associated with the partition. These may be used to store configuration settings or other bits of information.

The MasterTable also exposes access to a persistent sequence counter used to generate new identifiers when adding new entries to the table (the Entry identifier, a Long).
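
Putting these pieces together, here is a simplified sketch of the contract described above. The method names are illustrative and may not match the exact ApacheDS interfaces:

    // Minimal stand-in for the generic Table contract; the real
    // ApacheDS Table interface is richer (cursors, counts, and so on).
    interface Table<K, V>
    {
        V get( K key ) throws Exception;

        void put( K key, V value ) throws Exception;

        void remove( K key ) throws Exception;
    }

    // The master table narrows the keys to Long and adds the sequence
    // counter and the partition properties described above.
    interface MasterTable<E> extends Table<Long, E>
    {
        // Next value of the persistent sequence counter, used as the
        // identifier when a new entry is added.
        Long getNextId() throws Exception;

        // Simple key/value properties associated with the partition.
        String getProperty( String key ) throws Exception;

        void setProperty( String key, String value ) throws Exception;
    }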

JDBM MasterTable Implementation

The JDBM based MasterTable implementation, JdbmMasterTable, simply extends JdbmTable and locks the Tuple keys to Long objects. A custom Serializer is used to marshal Entry objects to and from this Table. The implementation uses a property to persist the current value of its sequential counter, which starts at 1; there is no entry with the value 0.
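
As a rough illustration of the Serializer contract JDBM expects, the sketch below falls back on plain Java serialization; the real Entry serializer is hand written and more compact, writing the RDN, the parent ID and the attributes itself:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;

    import jdbm.helper.Serializer;

    // Naive sketch of a JDBM Serializer based on Java serialization;
    // purely illustrative, not the ApacheDS implementation.
    public class NaiveEntrySerializer implements Serializer
    {
        public byte[] serialize( Object entry ) throws IOException
        {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream( out );
            oos.writeObject( entry ); // a real impl writes RDN, parentId and attributes
            oos.flush();

            return out.toByteArray();
        }

        public Object deserialize( byte[] bytes ) throws IOException
        {
            try
            {
                ObjectInputStream ois = new ObjectInputStream( new ByteArrayInputStream( bytes ) );

                return ois.readObject();
            }
            catch ( ClassNotFoundException cnfe )
            {
                throw new IOException( cnfe.getMessage() );
            }
        }
    }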

The properties object is another BTree contained in the master.db file. The other BTree is the primary one used to store Long identifiers mapped to serialized Entry bytes.
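
For the curious, here is roughly how the two BTrees inside master.db could be created with the raw JDBM API; JdbmTable normally hides this plumbing, and the comparators are inlined here because JDBM requires them to be Serializable:

    import java.io.Serializable;
    import java.util.Comparator;

    import jdbm.RecordManager;
    import jdbm.RecordManagerFactory;
    import jdbm.btree.BTree;

    public class MasterDbSketch
    {
        static class LongCmp implements Comparator<Long>, Serializable
        {
            public int compare( Long a, Long b ) { return a.compareTo( b ); }
        }

        static class StringCmp implements Comparator<String>, Serializable
        {
            public int compare( String a, String b ) { return a.compareTo( b ); }
        }

        public static void main( String[] args ) throws Exception
        {
            // Creates (or reopens) master.db and its transaction log.
            RecordManager recMan = RecordManagerFactory.createRecordManager( "master" );

            // Primary BTree: Long identifiers mapped to serialized Entry bytes.
            BTree master = BTree.createInstance( recMan, new LongCmp() );
            recMan.setNamedObject( "master", master.getRecid() );

            // Second BTree: the partition's key/value properties, among
            // them the current value of the sequence counter.
            BTree props = BTree.createInstance( recMan, new StringCmp() );
            recMan.setNamedObject( "properties", props.getRecid() );

            props.insert( "sequence", "1", true );
            recMan.commit();
            recMan.close();
        }
    }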

There's not much more to the JdbmMasterTable implementation: it's pretty straightforward. However, future changes may alter that; see below if you're curious.

Potential Changes in the Future: Nested Partition Identifier

In the future, the MasterTable may be altered to contain an n-bit partition identifier. This identifier would be combined with the 64-bit Long generated from the sequential counter. Instead of using the full 64 bits of the Long as the identifier, perhaps only 48 bits will be used for the maximum capacity of the MasterTable. The other 16 bits can then be used for the partition identifier. Perhaps the Long can be left shifted by 16 bits, then binary OR'd with this 16-bit identifier (a Java Short). The result would be the identifier for the entry.
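
A small sketch of the proposed bit manipulation; the class and method names are made up for illustration:

    public final class EntryIdCodec
    {
        // Combine a 48-bit sequence number with a 16-bit partition id:
        // shift the sequence into the upper 48 bits, then OR in the
        // partition id, masked to avoid sign extension of the short.
        public static long toEntryId( long sequence, short partitionId )
        {
            return ( sequence << 16 ) | ( partitionId & 0xFFFFL );
        }

        // The partition id occupies the last 16 bits of the identifier.
        public static short getPartitionId( long entryId )
        {
            return ( short ) ( entryId & 0xFFFFL );
        }

        // Recover the 48-bit sequence number with an unsigned right shift.
        public static long getSequence( long entryId )
        {
            return entryId >>> 16;
        }
    }

With such a scheme, any component of the server can tell which partition an identifier belongs to simply by looking at its last 16 bits.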

A 16-bit partition identifier (a Java Short) allows for a total of 65,536 different partitions to be mounted in a server. A 48-bit number allows for 281,474,976,710,656 (281 trillion) entries per partition. That's a total limit of 18,446,744,073,709,551,616 entries per server, which is 18.45 x 10^18, essentially the same as the limit of a 64-bit Java Long (2^64).

Ramifications

If the server uniquely assigns this 16-bit identifier to partitions, and hence to their MasterTables, then the entry identifier can be exposed by partitions so that every entry in the server can be uniquely identified, regardless of the partition used to store it. Partitions are then still free to use their own scheme to generate 64-bit identifiers, so long as the last 16 bits are equal to this partition id.

One problem with the present configuration is the lack of shared indices, or the ability to present several indices on the same attribute across partitions as one (effectively, index partitioning). We cannot do this today because identifiers can potentially overlap across partitions, and indices use this potentially overlapping primary identifier to reference entries in the master table. With server-unique identifiers across partitions, this problem goes away.

So what's the big deal? What does this capability give us? It will allow for various enhanced features never possible before.

Aliases Will Work Across Partitions

Partitions can now expose access to a set of indices. The indices used for alias tracking may be accessed so that search operations can be conducted across partitions while still dereferencing aliases.

Search Can Be Extracted And/Or Delegated

The search algorithm can be conducted outside of partitions, delegated to partitions, or a combination of both. Effectively, the search algorithm is externalized. Even if some partitions attached to the server are required to conduct searches themselves, the overall search operation across partitions can be centralized. Some partitions may never need to provide search capabilities at all, letting the server handle searching via their indices.

Shared Entry Cache

When search is conducted outside of a partition, a shared entry cache can be used effectively. Modification timestamp indices exposed by partitions can be used to determine cache evictions. Furthermore, a shared entry cache allows the server to better manage resources. Also, because identifiers can easily be associated with specific partitions, a per-partition quota can be enforced in the shared cache, giving the same granularity of control over caching.

Virtualization

Indices are the key to conducting joins. A virtual index could be exposed on a virtual attribute by a virtualization subsystem coupled with the search engine. These virtual attributes may be computed or extracted from external systems. This way search filters can contain virtual attributes which are properly evaluated.

The virtualization subsystem may also be involved, on entry return, to modify entries after retrieval from a partition. This way, the attributes of several entries, perhaps in the same partition or across partitions, may be joined to present a unified view.

The possibilities here are less explored; however, once search is decomposed and centralized, and indices can be accessed across partitions, most limitations are in the imagination.


1 Comment

  1. At some point in the future, we may want to test a MasterTable based on an HdbmTable instead of a JdbmTable. It might speed up additions and searches.