

Apache CarbonData stores data in a columnar format, with each data block sorted independently of the others to allow faster filtering and better compression.


Although CarbonData stores data in a columnar format, it differs from traditional columnar formats in that the columns in each row group (data block) are sorted independently of the other columns. This arrangement requires CarbonData to store a row-number mapping against each column value, but it makes binary search feasible for faster filtering. Because the values are sorted, identical or similar values end up adjacent, which yields better compression and helps offset the storage overhead of the row-number mapping.
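The idea of sorting a column on its own while keeping a row-number mapping can be sketched as follows. This is a minimal illustration in Python, not CarbonData's implementation: the function names are hypothetical, row numbers are 0-based, and the column is a plain in-memory list.

```python
from bisect import bisect_left, bisect_right

def build_inverted_index(column):
    """Sort a column's values and keep the original row number of each value."""
    order = sorted(range(len(column)), key=lambda r: column[r])
    sorted_values = [column[r] for r in order]
    return sorted_values, order  # order[i] is the row number of sorted_values[i]

def rows_matching(sorted_values, row_ids, target):
    """Binary-search the sorted values and return the matching row numbers."""
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return sorted(row_ids[lo:hi])

def restore_row_order(sorted_values, row_ids):
    """Undo the sort so that position r again holds the value of row r."""
    original = [None] * len(sorted_values)
    for value, row in zip(sorted_values, row_ids):
        original[row] = value
    return original

column = [3, 1, 2, 1, 3, 1]                # values in row order (rows 0..5)
values, row_ids = build_inverted_index(column)
print(values)                              # [1, 1, 1, 2, 3, 3]
print(row_ids)                             # [1, 3, 5, 2, 0, 4]
print(rows_matching(values, row_ids, 1))   # [1, 3, 5]
print(restore_row_order(values, row_ids) == column)  # True
```

The filter touches only the contiguous run of equal values found by the binary search, and the row-number list is what allows the original row to be stitched back together afterwards.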


In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, for instance customer name, to be accessed in columns as a group, rather than individually row-by-row.         

Here is an example of a simple database table with 4 columns and 3 rows.

Table 1: Database Table with 4 columns and 3 rows


Row-oriented storage :     1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

Column-oriented storage :  1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
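The two layouts above can be reproduced with a short Python sketch. The helper names are illustrative only; the data is the three-row example from the text.

```python
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]

def row_oriented(records):
    """Serialize record by record: all fields of row 1, then row 2, and so on."""
    return "".join(",".join(str(v) for v in rec) + ";" for rec in records)

def column_oriented(records):
    """Serialize column by column: all values of column 1, then column 2, and so on."""
    columns = zip(*records)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)

print(row_oriented(rows))     # 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;
print(column_oriented(rows))  # 1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
```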

One of the main benefits of a columnar database is that data can be highly compressed, because values of the same type are stored together. Working on compact, contiguous column data allows columnar operations such as MIN, MAX, SUM, COUNT and AVG to be performed very rapidly. Another benefit is that, because column-based storage is self-indexing, it uses less disk space than a row-oriented relational database management system (RDBMS) containing the same data.


An Apache CarbonData file contains groups of data called blocklets, along with the required information such as schema, offsets and indices, which is kept in a file footer.

The file footer can be read once to build the indices in memory, which then can be utilised for optimising the scans and processing of all the subsequent queries.
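How an in-memory index built from the footer might prune a scan can be sketched as follows. The text does not spell out what the indices contain, so this example assumes a per-blocklet minimum and maximum for the filter column; the class and function names are hypothetical and do not correspond to CarbonData's API.

```python
from dataclasses import dataclass

@dataclass
class BlockletIndexEntry:
    """Hypothetical in-memory index entry built once from the file footer."""
    offset: int      # byte offset of the blocklet in the file
    min_value: int   # smallest value of the filter column in the blocklet (assumed)
    max_value: int   # largest value of the filter column in the blocklet (assumed)

def blocklets_to_scan(index, lo, hi):
    """Return offsets of blocklets whose [min, max] range overlaps [lo, hi]."""
    return [e.offset for e in index if e.max_value >= lo and e.min_value <= hi]

index = [
    BlockletIndexEntry(offset=0, min_value=1, max_value=100),
    BlockletIndexEntry(offset=4096, min_value=101, max_value=250),
    BlockletIndexEntry(offset=8192, min_value=251, max_value=900),
]
print(blocklets_to_scan(index, 120, 300))  # [4096, 8192]
```

Because the index lives in memory after the first footer read, later queries can skip non-matching blocklets without touching the data section of the file.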

Each blocklet in the file is further divided into chunks of data called Data Chunks. Each data chunk is organised either in a columnar format or a row format, and stores the data of either a single column or a set of columns. All blocklets in one file contain the same number and type of Data Chunks.
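The nesting described above can be modelled with a few Python dataclasses. This is only a sketch for orientation: the class names, fields, and the `is_consistent` check are hypothetical and are not taken from the CarbonData code base.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataChunk:
    columns: List[str]  # one column, or several when columns are grouped
    layout: str         # "columnar" or "row"

@dataclass
class Blocklet:
    chunks: List[DataChunk] = field(default_factory=list)

@dataclass
class CarbonFileSketch:
    blocklets: List[Blocklet] = field(default_factory=list)

    def is_consistent(self):
        """True if every blocklet has the same sequence of chunk columns and layouts."""
        shapes = {
            tuple((tuple(c.columns), c.layout) for c in b.chunks)
            for b in self.blocklets
        }
        return len(shapes) <= 1

def make_blocklet():
    return Blocklet([
        DataChunk(["id"], "columnar"),
        DataChunk(["first_name", "last_name"], "row"),
    ])

good = CarbonFileSketch([make_blocklet(), make_blocklet()])
bad = CarbonFileSketch([make_blocklet(), Blocklet([DataChunk(["id"], "columnar")])])
print(good.is_consistent())  # True
print(bad.is_consistent())   # False
```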

Figure 1 : CarbonData File 


Figure 2 : Detailed Description of CarbonData File Format

I) File Header : Contains information about

  • CarbonData file version number

  • List of column schema

  • Schema update timestamp

II) Blocklet : A set of rows in columnar format

  • Balance between efficient scan and compression

  • Data are sorted along MDK (multi-dimensional keys)

  • Default blocklet size: 64MB (but the size is configurable)

  • Minimum size for predicate filtering

  • Large size for efficient reading and compression




Figure 3 : Pictorial representation of Columnar encoding 

Further, the Blocklet contains Column Page groups for each column, also known as Column chunks.

The Column chunk is data for one column in a Blocklet.

  • Column data can be stored as a sorted index

  • It is guaranteed to be contiguous in the file

  • Multiple columns can form a column group, which is:

    • stored as a single column chunk in row-based format

    • suitable for a set of columns that are frequently fetched together

    • cheaper to read back, because it saves the stitching cost of reconstructing rows

Each Data Chunk contains multiple groups of data called Pages.

A Page holds the data of one column, and the number of rows per page is fixed at 32000. There are three types of pages.

  • Data Page: Contains the encoded data of a column/group of columns.

  • Row ID Page (optional): Contains the row id mappings used when the Data Page is stored as an inverted index.

  • suitable for low-cardinality columns

  • gives better compression and fast predicate filtering


Figure 4: Representation of Sort Columns within Column Chunks 

The inverted index gives the actual position of each column value in the column (i.e., the row number).

Example: the value ‘1’ in “column 2” is present in rows 1-8, so the rest of the rows need not be considered, which allows fast filtering.

Also, the inverted index stores the values in sorted order, so a binary search can locate the filter value quickly.

It also helps to reconstruct the row: because the data is stored column-wise and each column is sorted, the values may be jumbled relative to their original row order.

  • RLE Page (optional): Contains additional metadata used when the Data Page is RLE coded.


Figure 5: Run Length Encoding
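Run-length encoding itself can be sketched in a few lines of Python. This is generic RLE for illustration; the text does not describe the exact layout of the RLE Page, so the pair representation below is an assumption.

```python
def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into the full list of values."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

unsorted_col = [3, 1, 2, 1, 3, 1]
sorted_col = sorted(unsorted_col)
print(len(rle_encode(unsorted_col)))  # 6
print(rle_encode(sorted_col))         # [[1, 3], [2, 1], [3, 2]]
print(rle_decode(rle_encode(sorted_col)) == sorted_col)  # True
```

The unsorted column produces one run per value, while the sorted column collapses to three runs, which is why sorting each column pairs well with RLE.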

III) Footer : Metadata information

  • File-level metadata (number of rows, segment info, list of blocklet info and index) and statistics

  • Schema

  • Blocklet Index & Metadata


Figure 6 : CarbonData File Footer 