INTRODUCTION

CarbonData supports update and delete operations over Big Data.

DESCRIPTION

Updates are applied in batches, for example daily update scenarios in OLAP, using a Base+Delta file-based design.

Because these systems are not OLTP systems, the data is updated offline.

Data in OLAP systems also changes infrequently, so updates are made in batches.

Updates maintain ACID properties.

FLOW SEQUENCE

Because the data in CarbonData files is immutable, updates and deletes are implemented by maintaining two delta files:

Insert Delta :

Stores newly added rows

CarbonData file format

Delete Delta :

Stores the RowId* of rows that are deleted

Bitmap file format

I) Update Flow Sequence

Figure 1 : Flow Sequence for Update

Update flow:

  1. Find all rows that need to be updated, by executing the subquery.

  2. Write the “Delete Delta” file with the RowIds of those rows.

  3. Write the “Insert Delta” file with the new versions of those rows.
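
The three update steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CarbonData's implementation: the base data is modelled as an in-memory dict keyed by an integer RowId, the subquery is a plain predicate, and `plan_update`, `matches` and `apply_change` are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]


def plan_update(
    base: Dict[int, Row],
    matches: Callable[[Row], bool],
    apply_change: Callable[[Row], Row],
) -> Tuple[Set[int], List[Row]]:
    """Return (delete_delta, insert_delta) without modifying `base`."""
    # Step 1: find all rows that need to be updated (stands in for the subquery).
    hit_ids = sorted(row_id for row_id, row in base.items() if matches(row))
    # Step 2: the Delete Delta records the RowIds of the old row versions.
    delete_delta = set(hit_ids)
    # Step 3: the Insert Delta stores the new row versions as newly added rows.
    insert_delta = [apply_change(dict(base[row_id])) for row_id in hit_ids]
    return delete_delta, insert_delta


# Hypothetical sample data: multiply qty by 10 where qty >= 2.
base = {
    0: {"name": "a", "qty": 1},
    1: {"name": "b", "qty": 2},
    2: {"name": "c", "qty": 3},
}
delete_delta, insert_delta = plan_update(
    base,
    matches=lambda r: r["qty"] >= 2,
    apply_change=lambda r: {**r, "qty": r["qty"] * 10},
)
```

The base rows are never touched; the update exists only as the pair of deltas, which matches the immutability of the base files.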

II) Delete Flow Sequence

Figure 2 : Flow Sequence for Delete

Delete flow:

  1. Find all rows that need to be deleted, by executing the subquery.

  2. Write the “Delete Delta” file with the RowIds of those rows.
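
As a rough sketch of the delete steps, the Delete Delta can be pictured as a bitmap with one bit per row position. The layout below is an assumption made for illustration: a Python integer is used as an in-memory bitset, and `build_delete_bitmap` and `is_deleted` are hypothetical helpers. The on-disk bitmap format is not described in this document.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def build_delete_bitmap(rows: List[Row], matches: Callable[[Row], bool]) -> int:
    """Return an int used as a bitset: bit i is set when rows[i] is deleted."""
    bitmap = 0
    for position, row in enumerate(rows):
        if matches(row):             # step 1: row selected by the delete condition
            bitmap |= 1 << position  # step 2: record its position in the Delete Delta
    return bitmap


def is_deleted(bitmap: int, position: int) -> bool:
    """Check whether the row at `position` is marked as deleted."""
    return ((bitmap >> position) & 1) == 1


# Hypothetical sample data: delete the rows whose id is odd.
rows = [{"id": 10}, {"id": 11}, {"id": 12}, {"id": 13}]
bitmap = build_delete_bitmap(rows, lambda r: r["id"] % 2 == 1)
```

No Insert Delta is produced here, since a delete only marks existing rows.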

EXAMPLE

I) Data Update :

Figure 3 : Data Update Process

II) Data Delete :

Figure 4 : Data Deletion Process

READ FLOW SEQUENCE

Because values are not physically deleted or updated, a read reconstructs the current data in the following manner.

i) Update Scenario

    Read “Base” file

    Read “Delete Delta” and exclude RowId in the file

    Read “Insert Delta” and merge the new rows

ii) Delete Scenario

    Read “Base” file

    Read “Delete Delta” and exclude RowId in the file
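
Both read scenarios can be sketched as one merge routine. This is a minimal illustration, assuming the Base file is a dict keyed by integer RowId, the Delete Delta is a set of RowIds, and the delta holding the new rows (the Insert Delta described earlier) is a list. `read_table` is a hypothetical name, not a CarbonData API.

```python
from typing import Dict, Iterable, List, Set

Row = Dict[str, object]


def read_table(
    base: Dict[int, Row],
    delete_delta: Set[int],
    insert_delta: Iterable[Row] = (),
) -> List[Row]:
    """Reconstruct the current rows from the Base file and its deltas."""
    # Read the Base file and exclude every RowId listed in the Delete Delta.
    live = [row for row_id, row in sorted(base.items()) if row_id not in delete_delta]
    # Merge the new rows; this delta is empty in the delete scenario.
    live.extend(insert_delta)
    return live


# Hypothetical sample data.
base = {0: {"name": "a"}, 1: {"name": "b"}, 2: {"name": "c"}}

# Update scenario: row 1 is replaced by a new version.
after_update = read_table(base, delete_delta={1}, insert_delta=[{"name": "b2"}])

# Delete scenario: row 2 is removed, nothing is added.
after_delete = read_table(base, delete_delta={2})
```

The delete scenario is the update scenario with an empty set of new rows, which is why a single routine covers both.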

  *RowId = segment -> block -> blocklet -> row, that is, a row is identified by its position within this hierarchy.
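
The hierarchical RowId in the footnote can be modelled as a four-part key. The sketch below is an assumption for illustration only: the integer components, the "/" separator and the `encode` and `decode` helpers are hypothetical and do not reflect CarbonData's actual encoding.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Position of a row: segment -> block -> blocklet -> row."""

    segment: int
    block: int
    blocklet: int
    row: int

    def encode(self) -> str:
        # Hypothetical text form, outermost level first.
        return "/".join(str(part) for part in self)

    @classmethod
    def decode(cls, text: str) -> "RowId":
        # Inverse of encode(): split on "/" and convert each part to int.
        return cls(*(int(part) for part in text.split("/")))


rid = RowId(segment=2, block=0, blocklet=1, row=57)
encoded = rid.encode()
round_trip = RowId.decode(encoded)
```

Because the tuple compares level by level, sorting RowIds groups entries by segment, then block, then blocklet.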