Motivation

Currently, when we process a query using secondary indexes, we first search the secondary indexes to get matching primary keys (pks), sort them, and then perform primary index lookups to fetch the records. During these lookups, each fetched pk is checked against all components of the primary index without any filtering. When the primary index contains a large number of disk components, primary index lookups can take a long time. However, all indexes of a dataset (and partition) are flushed together, which means there is a relationship between the components of the secondary indexes and the components of the primary index. This relationship can be exploited to accelerate primary index lookups after secondary index scans, and it is captured through component Ids. The basic idea is that the components of all indexes are correlated through component Ids. In addition to the matching pks, a secondary index also returns the Id of the component where each matching pk is found. Thus, when performing primary index lookups, we only need to search the subset of primary index components selected by the input Id, which greatly reduces the time of primary index lookups, and thus the total query processing time.

Component Id Management

Now every LSM component has a component Id (both memory and disk components). A component Id is represented as an interval of two timestamps. A memory component has a mutable Id, which is reset every time the memory component is recycled. Disk components have immutable Ids, which are persisted in their metadata.

Memory Component

Currently we use LSMIOOperationCallback to maintain component Ids, in the same way component LSNs are maintained. For all indexes of a dataset partition, we must guarantee that all memory components receive the same Id upon activation. To achieve this, we introduce an LSM component Id generator, which is shared by the dataset. The Id generator supports two operations, GetId and RefreshId. GetId always returns the same Id as long as RefreshId is not called; after RefreshId is called, GetId is guaranteed to return a new Id based on the current timestamp.
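The GetId/RefreshId contract can be sketched as follows. This is a minimal, hypothetical sketch: the class and field names are illustrative, not the actual AsterixDB API, and the clock handling simply guarantees monotonicity.

```java
// Hypothetical sketch of a per-dataset component Id generator.
// Names are illustrative, not AsterixDB's actual classes.
final class ComponentIdGenerator {
    // A component Id is an interval [min, max] of timestamps.
    public static final class ComponentId {
        public final long min;
        public final long max;
        public ComponentId(long min, long max) { this.min = min; this.max = max; }
        @Override public String toString() { return "[" + min + ", " + max + "]"; }
    }

    private ComponentId current;       // Id returned by getId() until the next refresh
    private long lastTimestamp = -1;   // ensures refreshId() always produces a new Id

    // Returns the same Id until refreshId() is called.
    public synchronized ComponentId getId() {
        if (current == null) {
            refreshId();
        }
        return current;
    }

    // Produces a fresh Id based on the current timestamp; strictly monotonic
    // even if the system clock does not advance between calls.
    public synchronized void refreshId() {
        long ts = Math.max(lastTimestamp + 1, System.currentTimeMillis());
        lastTimestamp = ts;
        // a fresh memory component starts as a single timestamp point
        current = new ComponentId(ts, ts);
    }
}
```

Because the generator is shared by all indexes of the dataset, every memory component activated between two refreshes observes the same Id.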

Here is the basic workflow of Id management for memory components during flush.
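The flush-time handoff can be sketched as follows, under the assumption that RefreshId is invoked exactly once per scheduled flush: the flushed disk component persists the Id of the memory component being flushed, while the recycled memory component picks up the freshly generated Id. The class and method names below are illustrative, not the actual AsterixDB code.

```java
// Hedged sketch of the Id handoff when a flush is scheduled.
// Generator state is simplified to deterministic counters for clarity.
final class FlushIdWorkflow {
    private static long[] currentId = {1, 1}; // shared generator state: [min, max]
    private static long nextTs = 2;

    static long[] getId() { return currentId; }
    static void refreshId() { currentId = new long[] {nextTs, nextTs}; nextTs++; }

    // Schedules a flush for all indexes of a dataset partition.
    // Returns {flushedDiskComponentId, recycledMemoryComponentId}.
    public static long[][] scheduleFlush() {
        long[] flushingId = getId(); // persisted into the new disk components' metadata
        refreshId();                 // refreshed once per flush, shared by all indexes
        long[] recycledId = getId(); // assigned to the recycled memory components
        return new long[][] {flushingId, recycledId};
    }
}
```

Since all indexes of the partition consult the same generator, the disk components produced by one flush carry the same Id across the primary and secondary indexes.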

Disk Component

A disk component has an immutable Id, which is persisted in the component metadata. A disk component can be created in the following cases:

Create Secondary Index

When we create a secondary index, entries are bulk loaded into one disk component of the secondary index. To ensure the correctness of component Id-based acceleration, we must flush the primary index first; otherwise, the newly created disk component of the secondary index would correspond to a partial memory component of the primary index. The new disk component of the secondary index then simply receives an Id that is the union of the Ids of all disk components of the primary index.
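Since Ids are timestamp intervals, the union is just the covering interval. A minimal sketch (the helper name and `long[]` interval encoding are illustrative):

```java
// Sketch: the bulk-loaded secondary component's Id is the covering interval
// of all primary-index disk component Ids. Encoding each Id as {min, max}
// is an illustrative simplification.
final class IdUnion {
    public static long[] union(long[][] primaryDiskIds) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long[] id : primaryDiskIds) {
            min = Math.min(min, id[0]);
            max = Math.max(max, id[1]);
        }
        return new long[] {min, max};
    }
}
```

For example, if the primary index has disk components with Ids [1, 2], [3, 5], and [6, 6], the bulk-loaded secondary component receives [1, 6], so it overlaps every primary disk component during later lookups.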

Load dataset

Currently we only allow an empty dataset to be loaded once. For simplicity, we simply assign loaded components the Id [0, 0].

External Dataset

Component Id acceleration is mainly designed for internal datasets. However, to be consistent in the codebase, components of external datasets always get the Id [0, 0].

Legacy Dataset

For legacy datasets, the component Id is missing from the disk component metadata. To remain compatible with such datasets, we simply treat a missing component Id as the special value [-1, -1]. This renders component Id acceleration useless, but searches still return correct results from the indexes.

Exploit Component Id for Query Processing

To exploit component Ids for query processing, for each matching pk the secondary index outputs the Id of the component where the pk is found. When intersecting pks returned from multiple secondary indexes, we select the best component Id (the one with the smallest interval length) by performing an extra aggregation on the component Id field. Finally, during primary index lookups, we only check each pk against primary index components whose Ids overlap the input Id. The input Id thus serves to prune unnecessary components during lookup, improving the performance of primary index lookups.
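The two operations above, picking the best Id when intersecting secondary index results and pruning primary components by interval overlap, can be sketched as follows. The class, method names, and `long[]` interval encoding are illustrative, not the actual AsterixDB operators.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Id-based pruning during primary index lookups.
final class ComponentPruning {
    // Two timestamp intervals overlap iff neither ends before the other begins.
    static boolean overlaps(long[] a, long[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    // When intersecting pks from multiple secondary indexes, keep the Id
    // with the smallest interval length (it prunes the most components).
    public static long[] best(long[] a, long[] b) {
        return (a[1] - a[0] <= b[1] - b[0]) ? a : b;
    }

    // Returns the positions of the primary disk components that must be
    // searched for a pk annotated with inputId; the rest are skipped.
    public static List<Integer> candidates(long[][] primaryIds, long[] inputId) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < primaryIds.length; i++) {
            if (overlaps(primaryIds[i], inputId)) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For instance, with primary disk components [1, 2], [3, 4], and [5, 8], a pk carrying the input Id [3, 3] only needs to be checked against the second component. Note that the legacy Id [-1, -1] overlaps nothing, so pks from legacy components fall back to searching without pruning.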