We already covered some of the basic structural aspects in a 2 column Table based partition design in the other sections. It should be clear how indices store Long pointers into the MasterTable to access entries with specific attribute values. Indices prevent the need for full MasterTable scans which would entail a massive volume of expensive lookup, deserialization, normalization and comparison operations.
This document will cover higher level details such as the default system indices used for common operations and how these indices along with optional user indices work together to efficiently search Table based partitions using LDAP filters. Namespace and hierarchy maintenance aspects are also covered.
|
Digram needs to be updated to reflect the changes done in StoreUtils.java |
The existence index maps the OID of only indexed attributeTypes, to the identifiers in the master for entries containing one or more values for the attribute. All attributeTypes are not tracked. When an index is created on ou it will contain IndexEntry objects for 'ou' in this existence index. Note the OID for attributeTypes is the normalized form for attribute type identifiers.
Remember for our example DIT's partition we indexed objectClass, ou, and cn. |
The existence index is used to conduct searches using the existence operator =*. For example a search with the following filter, (cn=*), returns entries 5, 6, and 7. The search engine acquires a Cursor and positions it just before the key '2.5.4.11' which is the unambiguous representation of 'cn' or 'commonName'. It then walks (advances one step at a time) the IndexEntry objects for this key, and stops the walk as soon as this key's values are exhausted. At each advance, the entry identifier is used to lookup the entry in the master table and return it.
Notice the existence index does not add the OID for the objectClass attribute. This is specifically avoided since all entries contain the objectClass attribute. There is no reason for this index to bloat by including it.
I 'think' the objectClass attribute is not considered for existence indices however this may not be the case. We need to determine this and make sure we do NOT add objectClass IndexEntrys to the existence index. |
|
Alias entries cause cycles in a DIT partition which would otherwise be a perfect tree with the context entry at the root. LDAP search operations are conducted using one of four alias dereferencing modes:
Aliases complicate the mechanics of conducting search operations. Searches, without alias dereferencing enabled, contain a clear candidate range for which filter evaluation is applied. Without dereferencing the candidate range is based on the search scope parameter and the search base. Alias dereferencing expands the candidate range into areas of the DIT that would not have been normally eligible for filter evaluation. Candidate determination with dereferencing must factor in the presence of aliases.
Three specialized system indices are used to manage the 3 modes above which allow for some kind of alias dereferencing. These system indices are called the alias, oneAlias, and subAlias indices. We describe what each index contains and how it is used in the subsections below.
|
Digram needs to be updated to reflect the changes done in StoreUtils.java |
|
|
Digram needs to be updated with the correct key-value ids |
|
Digram needs to be updated with the correct key-value ids |
Object level search requests are trivially handled. Only two of the dereferencing modes matter to us in this case.
Mode |
Considered |
Reason |
---|---|---|
neverDerefAliases |
no |
this is a normal search and dereferencing does not matter |
derefFindingBase |
yes |
if the base is an alias we must dereference it before filter evaluation |
derefSearching |
no |
with object level search we're not searching |
derefAlways |
yes |
same as derefFindingBase |
So if alias dereferencing is in effect always or when finding the base a check must be performed to see if the search base entry is an alias. If it is an alias, the referenced entry is resolved, and the search base is set to the referenced entry. In this simple case with object level scope, alias dereferencing does not expand the single candidate. Instead alias dereferencing switches the base to the referenced entry.
One level search requests apply the same technique of switching the base when base dereferencing is enabled. Actually all search scopes switch the search base when base dereferencing is enabled, and the original search base is in fact an alias.
The search engine uses this index to find aliases which expand the range of candidates to entries outside the scope. After the base entry is resolved, the search engine, uses the identifier of the base to lookup all alias children. The referred to entries are then included in search filter evaluation and returned if accepted by the filter. This way the candidate range expands to include referenced entries in the aliasedObjectName attribute.
When alias dereferencing while searching is enabled, ScopeNodes, in the filter AST, should not be used as candidate enumerators. In fact the search engine's optimizer should annotate the scope node with a maximum count value when dereferencing while searching is enabled. I do not know if this is the case today. |
For our example, let's do a search where the search base is entry with id 4. The scope is one level and derefSearching is enabled. The search engine positions a Cursor over the children of entry 4 using the onelevel hierarchy index. It joins the results of this Cursor with the results from another Cursor on the oneAlias index over forward LUT key 4. This includes entries 6, 7 and 9 as candidates. Entry 9 however is filtered out of the result set when derefSearching is enabled. It is not returned since it is an alias.
Dereferencing while searching with subtree scope is rather painful. After dereferencing the base, if needed and derefFindingBase is enabled, the search engine looks up all alias descendants in the subAlias index. Alias descendants are then collected, and put into a special ScopeAssertion which uses this list and search scope parameters to determine if an entry is an eligible candidate for filter evaluation.
The semantics of search with derefSearching enabled presumes search propagation into dereferenced subtrees. This means a propagated search may encounter new alias entries which it must dereference as well.
You'll see in more advanced sections of this guide that assertion and candidate enumerating Cursors are used by the search engine. One assertion is the scope assertion. For search without alias dereferencing while searching, this scope assertion just makes sure the candidate entries are in scope by checking if they are subordinate to (with on level search) or descendants of the search base (with subtree level search).
When derefSearching is enabled, the search engine modifies how this scope assertion determines if a candidate is within scope. The algorithm is simple and it uses the subAlias index with subtree scope search:
By modifying the behavior of the scope assertion to factor in scope expanding aliases in the scope constraint check we effectively appear to propagate search through aliases.
Sometimes aliases will be return in this set of aliases to consider in scope assertions, which are descendants of other aliases already in this set. With subtree searching the search will automatically hit these descendant aliases so there's no need to have them in the set. Hence we can prune this set, or perform checks before insertion to prevent including redundant aliases in this set. |
We will cover more of this while looking closer at the search engine in different sections of this documentation.
ApacheDS places some limitations on Aliases. The limitations below are imposed either for the sake of simplicity in conducting search or due to structural limitations.
The partition can be asked to maintain an index on any attribute defined in the servers schema. These indices contain Tuples in their forward LUT, mapping a value of the attribute, to the id of the entry with that attribute value. This way after consulting the index, entries with specific values for this attribute can be quickly looked up by identifier in the master table.
For the time being, the objectClass index is considered a user index, however the JdbmPartition still mandates it's creation whether or not it is configured for the partition. This is purposefully done since objectClass values are critical for server operation and used heavily. Without setting this index performance would be terrible.
In the future this index may be converted to a system index. I think it should.
The function of several optional ApacheDS features may be greatly enhanced by the presence of indices on certain attributes. One example is the revision attribute which has not yet been defined, but it will. This attribute will be used by the change log facility to track the last altering revision on an entry. When this system and it's related snapshoting capabilities are enabled, having an index on this attribute will be very valuable.
Other similar examples exist. You'll find for different usecases, and for different kinds of canned search distributes, indices on some attributes will greatly improve search performance. More about this is covered on the search engine section of this guide.