Introduction

The SearchEngine conducts search operations on a Partition that uses this indexed master table scheme. It requires the presence of the system indices mentioned earlier to evaluate LDAP filters against the part of the DIT contained in such a Partition. SearchEngine is an interface, and its default implementation would work even if such a partition were built on some BTree implementation other than JDBM. We could use a custom in-memory BTree implementation such as an AvlTree as the basis for the Tables of a new kind of Partition, and the DefaultSearchEngine would still be able to conduct search operations on it. We could even use SleepyCat's BDB JE as another disk-based BTree implementation for Tables, building a new kind of Partition based on this scheme while using the DefaultSearchEngine to search it.

The Optimizer and its default implementation, the DefaultOptimizer, annotate search filter ASTs to provide optimization hints to the SearchEngine. Like the SearchEngine, the optimizer can operate on any Partition implementation that uses this indexed master table scheme and contains the same required system indices.

In this document we define the search algorithm and optimization techniques applied by the DefaultSearchEngine and the DefaultOptimizer.

Search Parameters

ApacheDS provides all partitions with the parameters to a search operation using a SearchOperationContext object. This object contains the following (a client-side view of these same parameters is sketched after the list):

  • the search scope
  • the search base dn
  • the alias dereferencing mode
  • the search filter as an abstract syntax tree
  • the search time and size limits
  • the attributes to return for entries
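For illustration, here is how these same parameters look from a client's point of view when issuing a search over JNDI. This is a hedged sketch: the host, port, base DN, and filter below are placeholders, not values taken from this page; a request like this is what the server-side SearchOperationContext is populated from.

    import java.util.Hashtable;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class SearchParametersExample
    {
        public static void main( String[] args ) throws Exception
        {
            Hashtable<String, String> env = new Hashtable<String, String>();
            env.put( "java.naming.factory.initial", "com.sun.jndi.ldap.LdapCtxFactory" );
            env.put( "java.naming.provider.url", "ldap://localhost:10389" );  // placeholder host and port
            env.put( "java.naming.ldap.derefAliases", "always" );             // the alias dereferencing mode

            InitialDirContext ctx = new InitialDirContext( env );

            SearchControls controls = new SearchControls();
            controls.setSearchScope( SearchControls.SUBTREE_SCOPE );          // the search scope
            controls.setTimeLimit( 5000 );                                    // time limit (ms)
            controls.setCountLimit( 100 );                                    // size limit
            controls.setReturningAttributes( new String[] { "cn", "ou" } );   // attributes to return

            // search base dn and search filter
            NamingEnumeration<SearchResult> results =
                ctx.search( "ou=system", "(ou=engineering)", controls );

            while ( results.hasMore() )
            {
                System.out.println( results.next().getNameInNamespace() );
            }

            ctx.close();
        }
    }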

Search Filter AST

Most of the search parameters are self-explanatory, with the exception of the search filter abstract syntax tree. For each search request, ApacheDS produces an AST representation of the search filter. The branch nodes of this filter tree represent the boolean filter operators AND (&), OR (|), and NOT (!). The leaf nodes represent the atomic assertions in the filter. Consider the following example filter:

(& (| (organizationalUnitName=Sales) (OU=BOARD    of directors) )
    (! (localityName=sUnnYVale) )
    (2.5.4.0=peRSOn) )

After generating the AST for this filter, the server normalizes the assertion nodes (the leaf nodes) and passes the resulting AST to the PartitionNexus. The PartitionNexus forwards it to the partition containing the search base. Here's the normalized version of the filter:

(& (| (2.5.4.11=sales) (2.5.4.11=board of directors) )
    (! (2.5.4.7=sunnyvale) )
    (2.5.4.0=person) )
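If you want to see such an AST for yourself, the filter parser in the shared LDAP codec can produce one from a filter string. The sketch below is illustrative only: the package name and the exact FilterParser signature vary between ApacheDS and shared-ldap releases.

    // Illustrative only: package names and the parse() signature vary by release.
    import org.apache.directory.shared.ldap.filter.ExprNode;
    import org.apache.directory.shared.ldap.filter.FilterParser;

    public class FilterAstExample
    {
        public static void main( String[] args ) throws Exception
        {
            String filter =
                "(&(|(ou=Sales)(ou=Board of Directors))(!(l=sunnyvale))(objectClass=person))";

            // The parser produces a tree of branch nodes (AND, OR, NOT) and leaf
            // nodes (equality, presence, substring, ... assertions).
            ExprNode root = FilterParser.parse( filter );

            System.out.println( root );
        }
    }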

Scope Preparation

Upon receiving the normalized search filter AST, the default search engine performs some basic scope analysis. The result of this analysis is a special tree node called a ScopeNode. The scope node is injected into the filter AST, rearranging it: a new AND node becomes the root, the old AST is inserted into the new root as one child, and the scope node is inserted as another child.

The ScopeNode embeds an additional scope constraint assertion into the filter, which is enforced while conducting the search operation. The partition adds this node before invoking the optimizer so that scope constraints can be considered during optimization. The optimizer is aware of this special ScopeNode and knows how to handle it: the node contains all the scope information needed for the search operation.
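Conceptually the rearrangement amounts to the following sketch. The node classes mirror those in the server, but the constructors and method names used here are simplified and hypothetical, not the real API.

    // Simplified, hypothetical sketch of scope node injection: the real
    // ScopeNode also carries the alias dereferencing mode, and method names
    // differ between releases.
    ExprNode injectScope( ExprNode normalizedFilterAst, Dn searchBaseDn )
    {
        ScopeNode scope = new ScopeNode( searchBaseDn, SearchScope.SUBTREE );  // simplified signature

        AndNode newRoot = new AndNode();         // a new AND node becomes the root
        newRoot.addNode( scope );                // the scope assertion is one child
        newRoot.addNode( normalizedFilterAst );  // the original filter AST is the other child

        return newRoot;
    }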

Alias Dereferencing

The ScopeNode does not merely represent the simple scope parameter (base, one-level, or subtree). It represents the overall scope of filter applicability: the scope-constrained candidate set the filter should be applied to. This factors in scope expansion due to alias dereferencing. Each alias encountered while dereferencing, in any of the dereferencing modes, is added as another base to search. Together these bases represent the complete scope.

As mentioned before in the Structure and Organization page, a set of aliases is found which refer to entries that expand the scope. These aliases are dereferenced to list all the unique roots that must now be searched to correctly evaluate the filter across this expanded scope.

Optimization with Annotations

The next step in preparing a filter AST is to annotate it with cost annotations. This is the task of the optimizer. The default optimizer uses scan counts on indices to determine costs. Each node in the AST, including branch nodes and the scope node, has a count attribute associated with it. The scan count approach used by the default optimizer approximates a best guess at the worst-case scenario: the count is an estimate of how big the result set could be for entries satisfying the constraints in that filter subtree.

The count attribute should be renamed to cost since different optimizer implementations may use different tactics besides simple scan counts to optimize search. The default optimizer just uses index scan counts. This may change or be enhanced in the future.

The optimizer performs a depth-first traversal of the AST, annotating the children of branch nodes before annotating the branch node itself. When a simple leaf node is entered, the optimizer checks whether an index exists for the assertion attribute of the leaf node. If an index is available, it is used to get a quick count of the number of entries in the partition that would satisfy the asserted condition in the leaf node. This figure is set as the count annotation on the leaf node. A technique exists for quickly looking up scan counts for each assertion operator (>, <, =, ~= (approx), =* (presence), substring, scope).
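The sketch below illustrates the idea for an equality leaf node. It is a hypothetical rendering: the types stand in for the server's xdbm abstractions, and the method names only approximate the real Store and Index APIs, which vary by release.

    // Hypothetical sketch of annotating an equality leaf with a scan count.
    long annotateLeaf( EqualityNode<String> leaf, Store store ) throws Exception
    {
        long count;

        if ( store.hasUserIndexOn( leaf.getAttribute() ) )
        {
            // Ask the index how many entries hold the asserted value: a cheap
            // count lookup rather than a scan of the candidates themselves.
            Index<Object> index = store.getUserIndex( leaf.getAttribute() );
            count = index.count( leaf.getValue() );
        }
        else
        {
            // No index on this attribute: assume the worst case, which is
            // every entry in the partition.
            count = store.count();
        }

        leaf.set( "count", count );   // the annotation is stored on the AST node
        return count;
    }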

Substring Assertions Are Not Cheap

ApacheDS does not use a specialized index for substring operator evaluation. Instead it walks an equality/ordering index with regular expression evaluators. This is costly since it requires pattern matching in all but the most simplistic cases. Anything that can limit the region of the index scanned is worth using. For example, if ou is indexed and the filter (ou=Eng*) is used, then the index can natively limit the scan to only those entries starting with 'Eng' without any regular expression evaluations. If instead the filter (ou=Eng*Managers) is used, the range is still limited natively to those index entries where ou starts with 'Eng', but a regular expression must be used to make sure each value ends with 'Managers'.

One would think the worst case is (ou=*), but this is not the substring operator; it is the existence (presence) operator, whose scan count can be found quickly since it is just the size of the index: no regular expression is needed. The worst of the worst cases is a substring expression without an initial anchor (no initial component), such as (ou=*Managers). Such a substring assertion results in a full index scan while a regular expression is applied to each index entry to check whether or not the ou value ends in 'Managers'. The optimizer avoids performing such scans and instead takes an educated guess; otherwise it would spend so much time trying to optimize that its function would be worthless.
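To make the cost concrete, a substring assertion is effectively compiled into a regular expression that must be applied to the values scanned. The snippet below only illustrates the flavor of the patterns involved, using plain java.util.regex rather than the server's own evaluators.

    import java.util.regex.Pattern;

    public class SubstringCostExample
    {
        public static void main( String[] args )
        {
            // (ou=Eng*Managers): the initial component 'Eng' lets an ordering index
            // limit the scan range, but the final component still needs a regex test.
            Pattern anchored = Pattern.compile( "^Eng.*Managers$", Pattern.CASE_INSENSITIVE );

            // (ou=*Managers): no initial component, so every index entry must be
            // tested with the regex, which amounts to a full index scan.
            Pattern unanchored = Pattern.compile( ".*Managers$", Pattern.CASE_INSENSITIVE );

            System.out.println( anchored.matcher( "Engineering Managers" ).matches() );  // true
            System.out.println( unanchored.matcher( "Sales Managers" ).matches() );      // true
            System.out.println( unanchored.matcher( "Board of Directors" ).matches() );  // false
        }
    }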

Once all the children of a branch node have been annotated, the branch node itself is annotated using the results from its children.

AND nodes are annotated with the lowest scan count among their children. This is the absolute worst case, or maximum result set size, that could be accepted by the logic in that AND branch of the filter. This makes sense since a logical AND operation computes an intersection, and the most that could intersect is the smallest set.

OR nodes are annotated with the sum of their children's scan counts. This is the absolute worst case, or maximum result set size. Again this makes sense since the logical OR operation computes the union of the sets: the most that could be returned is the sum of all possible entries returned by the OR node's children.

NOT nodes contain a single child. With the present indexing scheme, the best we can do is subtract the scan count of the child from the size of the complete set of entries. This serves as the absolute worst case and makes sense since the logical NOT operation inverts the selection.
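Taken together, the branch node annotation reduces to simple arithmetic over the children's counts, as in this sketch (the real DefaultOptimizer stores the result as an annotation on the node rather than returning it):

    import java.util.List;

    public final class BranchCounts
    {
        // AND: the intersection can be no larger than the smallest child set.
        static long andCount( List<Long> childCounts )
        {
            long min = Long.MAX_VALUE;
            for ( long count : childCounts )
            {
                min = Math.min( min, count );
            }
            return min;
        }

        // OR: the union can be no larger than the sum of the child sets.
        static long orCount( List<Long> childCounts )
        {
            long sum = 0;
            for ( long count : childCounts )
            {
                sum += count;
            }
            return sum;
        }

        // NOT: at worst everything except the entries matched by the single child.
        static long notCount( long totalEntries, long childCount )
        {
            return totalEntries - childCount;
        }
    }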

A note is in order here: filters with NOT operations can potentially result in full index scans, which are extremely costly in a very large directory (VLD). Some tactics might remove this problem at a cost. For example, we considered adding an inverse LUT to indices just for this purpose, but that was not feasible: how do you generate all the combinations of values an entry attribute does not have?

This process is continued until the root of the AST is reached. At this point we have a rough estimate of the worst case for the entire filter.

Search Algorithm and Cursor Construction

The call to search() on the search engine does not synchronously search the partition to construct a result set to return. Doing so would require potentially massive buffers to store the results and increase response latency. An efficient and fast server must minimize the memory footprint of each active search request so it can process as many concurrent requests as possible.

Instead of computing the results, the search engine constructs a Cursor that can produce the resultant entries matching the filter and scope constraints. This Cursor may be a composite cursor potentially wrapping other Cursors depending on the filter. Once the Cursor is constructed and returned to be used by higher levels, it can traverse the results one by one without having to buffer the entire result set matching the filter. ApacheDS' LDAP front end can then advance forward or backwards, or even to the first or last result using this Cursor. Usually the front end will pull a candidate entry and send it out the door before getting the next entry to prevent having to buffer more than one entry at a time. Calls on the Cursor interface to position it leverage partition indices and filter assertion logic to compute the next or previous candidate. The Cursor is a critical pattern in the efficient operation of the search algorithm.
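The consumption pattern then looks roughly like the sketch below. The Cursor and IndexEntry types are the server's cursor framework abstractions; imports are omitted and the method names shown are approximate, since they vary between releases.

    // Rough sketch of draining a search Cursor one candidate at a time: only the
    // current candidate needs to be held in memory.
    void drain( Cursor<IndexEntry<?, Long>> cursor ) throws Exception
    {
        try
        {
            cursor.beforeFirst();               // position before the first candidate

            while ( cursor.next() )             // advance one candidate at a time
            {
                IndexEntry<?, Long> candidate = cursor.get();
                // look up the full entry for this candidate's id and send it out
                // the door before advancing to the next one
            }
        }
        finally
        {
            cursor.close();                     // release the underlying index cursors
        }
    }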

Assertion Evaluators and Candidate Cursors

There are many ways a Cursor can be built to traverse the candidates satisfying a filter. Some ways are less efficient than others, in that they result in more computation and require more IO to evaluate the same filter on each positioning and advance. The search engine uses the scan count annotations on nodes to govern how it builds an optimal Cursor. It uses a recursive Cursor-building visitor on the nodes of the AST. The visitor takes actions to create different Evaluators and Cursors based on the node type. Cursors generate candidates, as IndexEntry objects, that satisfy filter expressions. Evaluators evaluate expressions against candidate IndexEntry objects generated by Cursors.

At branch nodes, the visitor decides how to build the candidate-generating Cursors and Evaluators.

At NOT branch nodes, the visitor creates a Cursor on the NDN index. Since NOT branch node types only possess a single child, that child is used to create a single Evaluator. The Evaluator and Cursor pair are used to build a NotCursor. On advances and positioning attempts, the NotCursor uses the NDN index Cursor to generate candidates, then checks to make sure the Evaluator returns false for that candidate. NotCursors incur a full table scan on the NDN index since they must make sure the child filter does not match a candidate to accept the candidate. There is no way to build a Cursor on the child expression and figure out the negation.

At AND branch nodes, the visitor selects a child with the smallest scan count and uses that child AST node to build a candidate generating Cursor. The siblings of the selected child are used to build a list of Evaluators. The Cursor and list of Evaluators are used to construct an AndCursor. On advances and positioning operations the Cursor finds candidates that satisfy the AND node expression by using the child Cursor to generate candidates while using the list of Evaluators to verify a match. All sibling Evaluators must match the candidate for the AndCursor to consider a candidate as having matched the AND node's filter expression.
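A simplified sketch of the advance logic this yields is shown below. The types are stand-ins for the server's xdbm abstractions, and the real AndCursor also handles previous(), explicit positioning, and closing.

    // Simplified sketch of how an AND node's cursor accepts candidates: walk the
    // cheapest child's cursor and require every sibling Evaluator to match.
    boolean advanceToNextMatch( Cursor<IndexEntry<?, Long>> cheapestChildCursor,
                                List<Evaluator> siblingEvaluators ) throws Exception
    {
        while ( cheapestChildCursor.next() )
        {
            IndexEntry<?, Long> candidate = cheapestChildCursor.get();

            boolean accepted = true;
            for ( Evaluator evaluator : siblingEvaluators )
            {
                if ( !evaluator.evaluate( candidate ) )
                {
                    accepted = false;           // one sibling rejected this candidate
                    break;
                }
            }

            if ( accepted )
            {
                return true;                    // candidate satisfies the whole AND expression
            }
        }

        return false;                           // the child cursor is exhausted
    }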

At OR branch nodes, the visitor creates a list of Cursors for each child node. These Cursors are combined to build an OrCursor. The OrCursor walks these child Cursors one at a time. Since an OR operation takes the union of all candidates, each child Cursor's candidate is accepted by the OrCursor.

Both Evaluator-building and Cursor-building operations are recursive. Both stop recursing when they reach leaf nodes, which produce either a simple leaf expression Evaluator or a Cursor over the partition index associated with the leaf node's assertion attribute.

Comments on Search

Annotations only help optimize search when search expressions contain AND expressions. As noted above, the addition of the scope node automatically makes every AST fed into the optimizer a conjunction. Moreover, the scope node can efficiently produce scan count information thanks to the onelevel and sublevel indices. So whatever the filter, leveraging the optimizer will help.

We mentioned that some Cursors may not be optimal because they could generate more IO and require more calculations to determine candidacy on Cursor positioning and advances. To understand this let's consider the Cursors we could build on an example filter expression:

(& (ou=engineering) (l=Sunnyvale))

For simplicity we will not consider scope in this example at first. Also, assume for now that there are indices on the ou and l attributes. For this filter expression we can build two kinds of Cursors.

The first Cursor would generate candidates from the ou index using a Cursor for the ou=engineering leaf node. Each candidate returned represents an entry in the partition whose ou attribute equals the value 'engineering'. The Cursor would use the localityName (l) index to perform a lookup for each candidate returned from the ou=engineering Cursor. Each lookup checks that an IndexEntry exists with an id equal to the candidate's and a value equal to 'sunnyvale'.

The second Cursor would generate candidates from the localityName index using a Cursor for the l=sunnyvale leaf node. Each candidate returned represents an entry in the partition whose localityName equals the value 'sunnyvale'. The Cursor would use the ou index to perform a lookup for each candidate returned from the l=sunnyvale Cursor. Each lookup checks that an IndexEntry exists with an id equal to the candidate's and a value equal to 'engineering'.

If there are 100 people in the engineering department yet 1000 people in the company are located in Sunnyvale then the first Cursor will perform 100 iterations as it walks the ou index over Tuples with a key of 'engineering'. For each candidate (of which there are 100), this Cursor will perform a single lookup on the localityName index. To satisfy this search a total of 100 Cursor advances occur with 100 lookup operations. In the case of the second Cursor, 1000 iterations are performed as it walks the localityName index over Tuples with a key of 'sunnyvale'. For each candidate (of which there are 1000), this second Cursor will perform a single lookup on the ou index. To satisfy this search a total of 1000 Cursor advances occur with 1000 lookup operations.

Choosing to build the second, less efficient Cursor rather than the first costs 10 times as much in IO and processing. This is why the scan counts set by the optimizer are used to select the child with the smallest scan count at AND nodes. And as stated before, the addition of the scope node always introduces an AND node into the AST. Hence the optimizer should pay off in most search operations, which is why it is enabled by default in the JdbmPartition.

Composite Indices

This is a placeholder for future work on composite indices built for common search filters; such indices would directly represent filter matches.
