Apache Kylin : Analytical Data Warehouse for Big Data

1、Background

Starting with Kylin 2.3.0, users can use Cube Planner to prune and optimize a cube. Cube Planner works in two stages. Stage 1 recommends a cuboid list based on estimated cuboid sizes before the first segment is built; this stage only reduces the number of cuboids. Stage 2 recommends an optimized cuboid list for an existing cube based on statistics from the SQL queries that users have executed; this stage can both remove and add cuboids.

In Kylin 4.0, because of the new Spark build engine and query engine, Cube Planner is not yet supported in Kylin 4.0.0-alpha, but the first stage of Cube Planner is supported in Kylin 4.0.0-beta. Please refer to the document: How to use Cube Planner in Kylin 4.

The second stage of Cube Planner is divided into two steps. The first step derives the recommended cuboid list from the collected statistics. The second step updates the historical segments in the cube according to the recommended cuboid list, so that the cuboids in the historical segments are consistent with that list.

The job generated to update the cuboid list of a historical segment is called an Optimize Cube Job. Each historical segment has its own Optimize Cube Job. This job does not rebuild all the cuboids in the original segment. Instead, it builds the new cuboids from the existing cuboid data, removes the cuboids that need to be deleted, reuses the previous global dictionary and lookup table snapshot, and produces the updated segment. After all Optimize Cube Jobs are completed, an Optimize Checkpoint Job updates the cube metadata in one pass and cleans up garbage. The completion of the Optimize Checkpoint Job marks the completion of the whole cuboid list update. User queries are not affected during this process.

2、How to update cuboid list for a cube

Due to some limitations, Kylin 4.0 cannot support the first step of the second stage of Cube Planner, which automatically recommends a cuboid list. However, to let users adjust the cube more flexibly according to business scenarios, Kylin 4.0 supports the second step of the second stage and allows users to adjust the cuboid list manually. If you want to delete or add cuboids for a specified cube, you can update the cuboid list by calling the REST API (http://host:port/kylin/api/cubes/{CubeName}/optimize2). After the cuboid list is updated, Kylin 4.0 generates an Optimize Cube Job for each historical segment in the cube to update its cuboids. Finally, an Optimize Checkpoint Job updates the cube metadata in one pass and cleans up garbage.

step1、Calculate the cuboid id you want to add/delete according to the dimension

A cuboid ID is determined by the order of the dimensions in the rowkey. For example, for the following rowkey columns:

In the binary form of the base cuboid ID, the bit at the position of every rowkey column is 1, that is, 111111111111111111 (18 ones, one per rowkey column), and the decimal cuboid ID is 262143.

To build a cuboid of PART_DT and BUYER_ID, set the bits at the rowkey positions of PART_DT and BUYER_ID to 1 and all other bits to 0. The binary cuboid ID is 100100000000000000, and the decimal cuboid ID is 147456.
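The calculation above can be sketched in Python. This is a minimal illustration, not Kylin code: the function name cuboid_id and the placeholder column names COL_n are hypothetical, and the rowkey is assumed to have 18 columns with PART_DT first and BUYER_ID fourth, which is what the decimal IDs 262143 and 147456 imply.

```python
def cuboid_id(rowkey_columns, dimensions):
    """Return the decimal cuboid ID for `dimensions`, given the rowkey order.

    The first rowkey column maps to the most significant bit.
    """
    n = len(rowkey_columns)
    cid = 0
    for dim in dimensions:
        pos = rowkey_columns.index(dim)  # 0 = first rowkey column
        cid |= 1 << (n - 1 - pos)        # set the bit for this dimension
    return cid


# Assumed rowkey: only PART_DT and BUYER_ID come from the text;
# the COL_n names are placeholders for the other 16 columns.
ROWKEY = ["PART_DT", "COL_2", "COL_3", "BUYER_ID"] + [f"COL_{i}" for i in range(5, 19)]

base = cuboid_id(ROWKEY, ROWKEY)                     # 262143
target = cuboid_id(ROWKEY, ["PART_DT", "BUYER_ID"])  # 147456
print(base, target, format(target, "018b"))
```

Replace ROWKEY with the actual rowkey column order of your cube before relying on the result.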

step2、Call the REST API to pass in the cuboid id you want to add/delete

REST API
REST API: PUT http://host:port/kylin/api/cubes/{CubeName}/optimize2
Request Body:
{
"cuboidsAdd":["cuboidId1","cuboidId2"],
"cuboidsDelete":["cuboidId3","cuboidId4"]
}
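
As an illustration, the request above could be prepared with Python's standard library. This is a sketch under assumptions: the helper name build_optimize_request, the host, port, cube name, and credentials are hypothetical, HTTP Basic authentication is assumed, and the cuboid IDs are sent as strings to match the request body shown above.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_optimize_request(base_url, cube_name, add_ids, delete_ids, user, password):
    """Build (but do not send) the PUT request for the optimize2 endpoint."""
    body = json.dumps({
        "cuboidsAdd": [str(i) for i in add_ids],
        "cuboidsDelete": [str(i) for i in delete_ids],
    }).encode("utf-8")
    # Assumption: the server accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/cubes/{urllib.parse.quote(cube_name)}/optimize2",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json;charset=utf-8",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host, cube name, and credentials; 147456 is the cuboid ID from step 1.
req = build_optimize_request("http://localhost:7070", "my_cube", [147456], [], "user", "secret")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and body before any job is triggered on the cluster.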

After the cuboid IDs to be added or deleted are passed in through the REST API, Kylin 4.0 generates the corresponding recommended cuboid list, the Optimize Cube Jobs, and the Optimize Checkpoint Job, as shown in the following figure:

Because the cube in the example has two built segments, two Optimize Cube Jobs and one Optimize Checkpoint Job are generated.

OPTIMIZE CUBE JOB:

OPTIMIZE CHECKPOINT JOB:

After the Optimize Checkpoint Job is completed, the cuboid list update is complete.
