INTRODUCTION

Apache CarbonData has rich multi-level index support.

DESCRIPTION

Apache CarbonData uses multiple indexes at various levels to enable faster search and query processing.

Using indexes, CarbonData can efficiently locate the data a query requires while skipping the parts that need not be processed, which results in faster query processing.

Storing data along with its index significantly accelerates query performance and reduces I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices; a processing framework can leverage it to reduce the number of tasks it needs to schedule and process. On the task side, the framework can also skip-scan in finer-grained units, called blocklets, instead of scanning the whole file.

To retrieve data using the indexes, the following steps are performed:

  1. File Pruning.

  2. Blocklet Pruning.

  3. Binary search using Inverted Index.
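The three steps can be sketched in Python as follows. This is a minimal illustration, not CarbonData's implementation: it assumes a single integer filter column, keeps per-file and per-blocklet min/max values in memory, and uses the hypothetical classes `DataFile` and `Blocklet` and functions `make_blocklet` and `find_rows`. Each blocklet stores its values in sorted order together with their original row ids, which plays the role of the inverted index.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Blocklet:
    # Hypothetical, simplified blocklet holding one filter column.
    min_val: int
    max_val: int
    sorted_values: list  # column values in sorted order
    row_ids: list        # row_ids[i] is the row that holds sorted_values[i]


@dataclass
class DataFile:
    blocklets: list

    @property
    def min_val(self):
        # File-level range is the union of its blocklets' ranges.
        return min(b.min_val for b in self.blocklets)

    @property
    def max_val(self):
        return max(b.max_val for b in self.blocklets)


def make_blocklet(values, first_row):
    """Build a blocklet from raw values; first_row is the row id of values[0]."""
    order = sorted(range(len(values)), key=values.__getitem__)
    return Blocklet(min(values), max(values),
                    [values[i] for i in order],
                    [first_row + i for i in order])


def find_rows(files, target):
    """Return (matching row ids, number of blocklets actually searched)."""
    matches, searched = [], 0
    for f in files:
        if not f.min_val <= target <= f.max_val:      # 1. file pruning
            continue
        for b in f.blocklets:
            if not b.min_val <= target <= b.max_val:  # 2. blocklet pruning
                continue
            searched += 1
            lo = bisect_left(b.sorted_values, target)   # 3. binary search ...
            hi = bisect_right(b.sorted_values, target)
            matches.extend(b.row_ids[lo:hi])            # ... mapped back to row ids
    return sorted(matches), searched
```

For example, with two files built from `make_blocklet([5, 3, 9], 0)`, `make_blocklet([12, 10, 15], 3)` and `make_blocklet([20, 25, 21], 6)`, `make_blocklet([30, 27, 28], 9)`, a search for `10` skips the second file entirely and the first blocklet of the first file, so only one blocklet is searched.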

TYPES OF INDEXES

I) Index stored in the file footer (enables two levels of B+ tree indexing):

  1. Table level index: global B+ tree, efficient file level filtering.

The table level index is searched to find the files that may contain the required data.

The file level index of each of these files is then used to get the row groups (data blocks).

Figure 1 : Table Level Indexing

  2. File level index: local B+ tree, efficient blocklet level filtering.


Figure 2 : File Level Indexing

The global B+ tree index, built on the Multi-Dimensional Keys (MDK) of all non-measure columns, helps quickly locate the row groups (data blocks) that contain data matching the search/filter criteria.
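A minimal sketch of the MDK idea, under stated assumptions: each dimension value is replaced by an integer surrogate key from a hypothetical per-column dictionary, a row's MDK is the tuple of its surrogate keys (Python tuples compare element by element, standing in for a concatenated key), rows are sorted by MDK and cut into fixed-size blocklets, and a binary search over the blocklets' sorted start and end keys stands in for the B+ tree search. All function names are illustrative.

```python
from bisect import bisect_left, bisect_right


def build_dictionary(values):
    """Assign each distinct value an integer surrogate key in sorted order."""
    return {v: key for key, v in enumerate(sorted(set(values)))}


def build_blocklet_keys(rows, blocklet_size):
    """Encode rows as MDKs, sort them, and record (start_key, end_key) per blocklet."""
    dim_count = len(rows[0])
    dicts = [build_dictionary([r[d] for r in rows]) for d in range(dim_count)]
    # A tuple of surrogate keys orders rows by the first dimension, then the second, ...
    mdks = sorted(tuple(dicts[d][r[d]] for d in range(dim_count)) for r in rows)
    blocklets = [(mdks[i], mdks[min(i + blocklet_size, len(mdks)) - 1])
                 for i in range(0, len(mdks), blocklet_size)]
    return dicts, blocklets


def candidate_blocklets(blocklets, leading_key):
    """Indices of blocklets whose key range can hold rows with this first-dimension key."""
    first_end = [end[0] for _, end in blocklets]
    first_start = [start[0] for start, _ in blocklets]
    lo = bisect_left(first_end, leading_key)     # first blocklet ending at or after the key
    hi = bisect_right(first_start, leading_key)  # blocklets from here on start after the key
    return list(range(lo, hi))
```

With the rows `("IN", "2017")`, `("US", "2016")`, `("IN", "2016")`, `("UK", "2017")`, `("US", "2017")`, `("UK", "2016")` and a blocklet size of 2, a filter on the first dimension value `"UK"` selects only the middle blocklet. In this sketch, because rows are sorted by MDK, the blocklets that can match a leading-dimension value always form a contiguous range.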


Figure 3 : Blocklet Level Indexing

The min-max index for all columns helps quickly locate the row groups (data blocks) that contain data matching the search/filter criteria.
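The following sketch shows how per-blocklet min/max statistics can rule out blocklets for a filter that combines several columns, including a measure column. The names `build_stats`, `may_match`, and `prune` and the dictionary-based row layout are assumptions for illustration. Min-max pruning can only exclude a blocklet; a blocklet that is kept may still contain no matching rows.

```python
def build_stats(rows):
    """Compute (min, max) per column for one blocklet; rows are dicts of column -> value."""
    return {col: (min(r[col] for r in rows), max(r[col] for r in rows))
            for col in rows[0]}


def may_match(stats, predicates):
    """predicates maps column -> inclusive (low, high) range, combined with AND.
    Return False as soon as one range cannot overlap the blocklet's [min, max]."""
    for col, (low, high) in predicates.items():
        col_min, col_max = stats[col]
        if high < col_min or low > col_max:
            return False
    return True


def prune(blocklet_stats, predicates):
    """Indices of blocklets that still have to be scanned."""
    return [i for i, stats in enumerate(blocklet_stats) if may_match(stats, predicates)]
```

For three blocklets with `(city, sales)` ranges `(IN..UK, 10..40)`, `(UK..US, 5..8)` and `(US..US, 70..90)`, the filter `city = 'UK' AND sales >= 30` keeps only the first blocklet: the second fails on `sales` and the third on `city`.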

Figure 4 : Data Blocks

II) Column level index: inverted index used for efficient column chunk scan

Figure 5 : Data Containing Column Level Indexes

The data block level inverted index for all columns helps quickly locate, within a row group (data block), the rows that contain data matching the search/filter criteria.
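A sketch of an inverted index for one column chunk in a data block, assuming the chunk fits in memory as a Python list: the values are kept in sorted order together with their original row ids, so an equality or range filter becomes two binary searches instead of a full scan. `build_inverted_index` and `rows_in_range` are hypothetical names, and a real column chunk would typically hold encoded values (for example, dictionary surrogate keys) rather than raw strings.

```python
from bisect import bisect_left, bisect_right


def build_inverted_index(column_chunk):
    """Return (sorted_values, row_ids) where row_ids[i] is the original row of sorted_values[i]."""
    # sorted() is stable, so rows with equal values keep their original order.
    row_ids = sorted(range(len(column_chunk)), key=lambda r: column_chunk[r])
    sorted_values = [column_chunk[r] for r in row_ids]
    return sorted_values, row_ids


def rows_in_range(sorted_values, row_ids, low, high):
    """Row ids whose value lies in the inclusive range [low, high]."""
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return sorted(row_ids[start:end])
```

For the chunk `["US", "IN", "UK", "IN", "US", "UK"]`, the sorted values are `IN, IN, UK, UK, US, US` with row ids `1, 3, 2, 5, 0, 4`, so an equality filter on `"UK"` returns rows `2` and `5`.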


Figure 6 : RLE to allow Data Block Level Indexes
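One way to read Figure 6, sketched below as an assumption rather than CarbonData's actual encoding format: once a column chunk is sorted for the inverted index, equal values sit next to each other, so run-length encoding (RLE) can store each distinct value once with its run length. The runs stay sorted, so the position range of a value can be found by binary search without decoding the chunk. The function names are illustrative.

```python
from bisect import bisect_left
from itertools import groupby


def rle_encode(sorted_values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(sorted_values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full sorted column."""
    return [v for v, n in runs for _ in range(n)]


def positions_of(runs, target):
    """(start, end) positions of target in the decoded column, or None if absent."""
    values = [v for v, _ in runs]
    i = bisect_left(values, target)
    if i == len(values) or values[i] != target:
        return None
    start = sum(n for _, n in runs[:i])  # total length of the runs before this one
    return start, start + runs[i][1]
```

The returned positions index into the sorted chunk; combined with the row ids kept by the inverted index, they identify the matching rows. A real format might store cumulative offsets instead of summing run lengths on every lookup.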