Apache CarbonData stores data in a columnar format, with each data block sorted independently of the others to allow faster filtering and better compression.
Although CarbonData stores data in a columnar format, it differs from traditional columnar formats in that the columns within each row group (Data Block) are sorted independently of one another. This arrangement requires CarbonData to store a row-number mapping against each column's values, but it makes binary search feasible for faster filtering; and because the values are sorted, same or similar values come together, which yields better compression and offsets the storage overhead of the row-number mapping.
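To make the idea concrete, here is a minimal sketch (illustrative Python, not actual CarbonData code) of sorting one column independently while keeping a row-number mapping, as described above. The function name `sort_column` is a hypothetical helper, not a CarbonData API.

```python
# Hypothetical sketch: sort a single column independently and record the
# row-number mapping so the original rows can still be reconstructed.
def sort_column(values):
    """Return (sorted_values, row_ids), where row_ids[i] is the original
    row number of sorted_values[i]."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [values[i] for i in order], order

names = ["Doe", "Beck", "Smith"]
sorted_names, row_ids = sort_column(names)
print(sorted_names)  # ['Beck', 'Doe', 'Smith']
print(row_ids)       # [1, 0, 2]
```

Because `sorted_names` is ordered, a filter such as name == 'Doe' can use binary search, and similar values sitting together compress better; `row_ids` is the row-number mapping that pays for this.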
BRIEF INTRO ABOUT COLUMNAR STORAGE
In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, for instance customer name, to be accessed in columns as a group, rather than individually row-by-row.
Here is an example of a simple database table with 4 columns and 3 rows.
Table 1: Database Table with 4 columns and 3 rows
ID | Last Name | First Name | Salary
---|-----------|------------|-------
1  | Doe       | John       | 8000
2  | Smith     | Jane       | 4000
3  | Beck      | Sam        | 1000
Row-oriented storage : 1,Doe,John,8000; 2,Smith,Jane,4000; 3,Beck,Sam,1000;
Column-oriented storage : 1,2,3; Doe,Smith,Beck; John,Jane,Sam; 8000,4000,1000;
One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations, such as MIN, MAX, SUM, COUNT and AVG, to be performed very rapidly. Another benefit is that because column-based storage is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.
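The two layouts and the aggregate speed-up can be sketched in a few lines of illustrative Python (using the data from Table 1; this is an analogy, not CarbonData code):

```python
# Row-oriented: each tuple is one record, as in Table 1.
rows = [(1, "Doe", "John", 8000),
        (2, "Smith", "Jane", 4000),
        (3, "Beck", "Sam", 1000)]

# Column-oriented: transpose so each column's values are contiguous.
ids, last_names, first_names, salaries = map(list, zip(*rows))
print(salaries)       # [8000, 4000, 1000]

# Columnar aggregates touch only the one column they need:
print(sum(salaries))  # 13000
print(max(salaries))  # 8000
```

Note that the 2nd entry of every column list still belongs to the same input record, exactly as described above.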
CARBONDATA FILE FORMAT
An Apache CarbonData file contains groups of data called blocklets, along with all the required information such as schema, offsets and indices, in a file footer.
The file footer can be read once to build the indices in memory, which can then be utilised to optimise the scans and processing of all subsequent queries.
Each blocklet in the file is further divided into chunks of data called Data Chunks. Each data chunk is organised either in a columnar format or a row format, and stores the data of either a single column or a set of columns. All blocklets in one file contain the same number and type of Data Chunks.
Figure 1 : CarbonData File
Figure 2 : Detailed Description of CarbonData File Format
I) File Header : Contains information about
CarbonData file version number
List of column schema
Schema update timestamp
II) Blocklet : A set of rows in columnar format
Balance between efficient scan and compression
Data are sorted along MDK (multi-dimensional keys)
Default blocklet size: 64MB (but the size is configurable)
Minimum size for predicate filtering
Large size for efficient reading and compression
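The sorting along MDK mentioned above can be pictured as ordering rows by a tuple of the chosen sort columns, so that similar rows land in the same blocklet. A minimal sketch in Python (the columns and data here are invented for illustration):

```python
# Hypothetical rows of (country, city, count); the multi-dimensional key
# is the tuple (country, city).
rows = [("DE", "Berlin", 10), ("US", "Austin", 7),
        ("DE", "Munich", 3), ("US", "Boston", 5)]

# Sorting along the MDK groups similar rows together.
rows.sort(key=lambda r: (r[0], r[1]))
print(rows)
# [('DE', 'Berlin', 10), ('DE', 'Munich', 3),
#  ('US', 'Austin', 7), ('US', 'Boston', 5)]
```

After this sort, a blocklet holding a contiguous slice of `rows` contains runs of equal key values, which both compresses well and lets a predicate on the leading sort column skip whole blocklets.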
Figure 3 : Pictorial representation of Columnar encoding
Further, each Blocklet contains column page groups for each column, also known as Column Chunks.
A Column Chunk holds the data for one column in a Blocklet.
Column data can be stored as a sorted index
It is guaranteed to be contiguous in the file
Multiple columns can form a column group
stored as a single column chunk in row-based format
suitable for a set of columns frequently fetched together
saves the stitching cost of reconstructing rows
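A quick sketch of why a row-based column group saves stitching cost (illustrative Python with invented data, not the on-disk format):

```python
# Two columns that are frequently fetched together.
first_names = ["John", "Jane", "Sam"]
last_names = ["Doe", "Smith", "Beck"]

# As a column group, they are stored row-wise inside one chunk:
name_group = list(zip(last_names, first_names))
print(name_group)     # [('Doe', 'John'), ('Smith', 'Jane'), ('Beck', 'Sam')]

# Fetching both values for row 1 is a single contiguous read; with two
# separate column chunks the row would have to be stitched back together.
print(name_group[1])  # ('Smith', 'Jane')
```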
Each Data Chunk contains multiple groups of data called Pages.
A Page holds the data of one column, and the number of rows per page is fixed at 32,000. There are three types of pages.
Data Page: Contains the encoded data of a column/group of columns.
Row ID Page (optional): Contains the row id mappings used when the Data Page is stored as an inverted index.
suitable for low-cardinality columns
better compression & fast predicate filtering
Figure 4: Representation of Sort Columns within Column Chunks
The inverted index records the actual position of each column value in the column (i.e., the row number).
Example: the value ‘1’ in “column 2” is present only in rows 1-8, so the rest of the rows need not be considered, which allows fast filtering.
The inverted index also stores the values in sorted order, so binary search effectively improves the search time for a filter value.
It also helps to reconstruct rows, since with columnar storage the values may get jumbled up when they are sorted and stored column-wise.
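The Data Page / Row ID Page pair can be sketched as follows (an assumed in-memory layout for illustration, not CarbonData's actual on-disk format):

```python
from bisect import bisect_left, bisect_right

column = [1, 3, 1, 2, 1, 3, 2, 1]            # a low-cardinality column
order = sorted(range(len(column)), key=lambda i: column[i])
data_page = [column[i] for i in order]       # sorted values: [1,1,1,1,2,2,3,3]
row_id_page = order                          # maps sorted position -> row number

# Predicate "value == 1": binary search the sorted Data Page...
lo = bisect_left(data_page, 1)
hi = bisect_right(data_page, 1)
# ...then map the matching positions back to the original rows.
print(sorted(row_id_page[lo:hi]))  # [0, 2, 4, 7]
```

Only the rows returned by the Row ID Page need to be visited, and the sorted Data Page compresses well because equal values are adjacent.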
RLE Page (optional): Contains additional metadata used when the Data Page is RLE coded.
Figure 5: Run Length Encoding
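Run-length encoding itself is simple to sketch (a generic RLE illustration, not CarbonData's exact RLE page layout):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

page = [1, 1, 1, 1, 2, 2, 3, 3]
runs = rle_encode(page)
print(runs)                      # [(1, 4), (2, 2), (3, 2)]
assert rle_decode(runs) == page  # lossless round trip
```

On a sorted page, equal values always form contiguous runs, which is why RLE pairs so well with the sorted Data Pages described above.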
III) Footer : Metadata information
File-level metadata (number of rows, segment info, list of blocklet info and index) & statistics
Blocklet Index & Metadata