Apache Kylin : Analytical Data Warehouse for Big Data

...

ID | Date | Author | Comment
1 | 2020-11 | Zhichao Zhang | Tuning guide for 4.0.0-alpha
2 | 2022-5 | Shaofeng Shi | Update for 4.0.1

Background

...

Kylin 4 is a major architectural upgrade. As the picture below shows, both the cube building engine and the query engine use Spark as the computation engine, and cube data is stored in Parquet files instead of HBase.

...

Key | Description
spark.memory.offHeap.enabled | Set to 'true' to use off-heap memory, e.g. for Spark shuffle.
spark.memory.offHeap.size | The amount of off-heap memory; it must be positive when off-heap memory is enabled.
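As a rough illustration of how these two settings fit together, the following sketch renders them as Kylin properties and enforces the rule that spark.memory.offHeap.size must be positive once off-heap memory is enabled. It assumes the `kylin.query.spark-conf.` prefix, which Kylin 4 uses to forward Spark properties to the query engine; check it against the configuration reference for your version. The helpers `size_to_bytes` and `off_heap_properties` are hypothetical, and the size parser only approximates Spark's own (binary units, a bare number read as bytes).

```python
import re

# Binary multipliers for Spark-style size suffixes; "kb", "mb", "gb", ... are treated like "k", "m", "g", ...
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4, "p": 1024**5}

# Assumption: Kylin 4 forwards properties with this prefix to the query-side Spark session.
QUERY_PREFIX = "kylin.query.spark-conf."


def size_to_bytes(value: str) -> int:
    """Hypothetical helper: parse '2g', '512mb' or '1024' (bytes) into a byte count."""
    match = re.fullmatch(r"(\d+)\s*(b|[kmgtp]b?)?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, suffix = match.groups()
    # Use the first letter of the suffix as the unit; no suffix means bytes.
    return int(number) * _UNITS[suffix[0] if suffix else "b"]


def off_heap_properties(enabled: bool, size: str) -> dict:
    """Hypothetical helper: build the two off-heap settings as Kylin properties."""
    if enabled and size_to_bytes(size) <= 0:
        raise ValueError("spark.memory.offHeap.size must be positive when off-heap memory is enabled")
    return {
        QUERY_PREFIX + "spark.memory.offHeap.enabled": "true" if enabled else "false",
        QUERY_PREFIX + "spark.memory.offHeap.size": size,
    }


if __name__ == "__main__":
    # Print key=value lines in the format used by kylin.properties.
    for key, value in off_heap_properties(True, "2g").items():
        print(f"{key}={value}")
```

Run as a script, it prints `kylin.query.spark-conf.spark.memory.offHeap.enabled=true` and `kylin.query.spark-conf.spark.memory.offHeap.size=2g`. The 2g value is only an example, not a recommendation.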

Set different configurations for each query

Currently, all queries share one SparkSession, so they all run with the same configuration. Queries differ in their characteristics, however, and could benefit from different settings. We therefore plan to clone a thread-level SparkSession for each query, apply query-specific configuration to it, and then execute the query. For example, 'spark.sql.shuffle.partitions' could be set according to the amount of data each query reads, to achieve optimal query performance. This feature is planned for the 4.0 Beta release.
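The per-query idea can be sketched in plain Python: copy the shared configuration, then override 'spark.sql.shuffle.partitions' based on how much data the query reads. The sizing rule (about 64 MiB of scanned data per partition, clamped between 1 and 2000) and the names `shuffle_partitions_for` and `per_query_conf` are hypothetical choices for illustration, not Kylin's implementation. Only the starting value of 200 comes from Spark, where it is the default for this property.

```python
import math

# Shared configuration every query sees today; 200 is Spark's default shuffle partition count.
BASE_CONF = {"spark.sql.shuffle.partitions": "200"}

# Hypothetical tuning knobs for the heuristic below.
TARGET_BYTES_PER_PARTITION = 64 * 1024 * 1024  # aim for roughly 64 MiB of scanned data per partition
MIN_PARTITIONS = 1
MAX_PARTITIONS = 2000


def shuffle_partitions_for(scan_bytes: int) -> int:
    """Pick a shuffle partition count proportional to the data a query reads."""
    wanted = math.ceil(scan_bytes / TARGET_BYTES_PER_PARTITION)
    return max(MIN_PARTITIONS, min(MAX_PARTITIONS, wanted))


def per_query_conf(scan_bytes: int) -> dict:
    """Copy the shared configuration and override it for a single query."""
    conf = dict(BASE_CONF)  # copy, so the shared defaults stay untouched
    conf["spark.sql.shuffle.partitions"] = str(shuffle_partitions_for(scan_bytes))
    return conf


if __name__ == "__main__":
    for scan_bytes in (10 * 1024**2, 5 * 1024**3, 10 * 1024**4):
        print(scan_bytes, per_query_conf(scan_bytes))
```

In PySpark, a map like this could be applied to a session returned by `SparkSession.newSession()`, which shares the SparkContext but keeps its own SQL configuration, by calling `session.conf.set(key, value)` for each entry. `newSession()` is the closest public Spark API to the cloning described above; Kylin's planned implementation may work differently.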


Reference

...