Apache Kylin : Analytical Data Warehouse for Big Data
...
Spark resources automatic adjustment strategy (experimental feature)
| Property | Default | Description | Since |
|---|---|---|---|
| kylin.spark-conf.auto.prior | true | For a CubeBuildJob or CubeMergeJob, it is important to allocate sufficient and appropriate resources (CPU/memory), mainly through the following config entries: ... When `kylin.spark-conf.auto.prior` is set to true, Kylin tries to adjust these config entries according to: ... Users can still override individual settings at the Cube level in the form `kylin.engine.spark-conf.<key> = <value>`. A value configured by the user overrides the automatically adjusted value. For details, see "How to improve cube building and query performance". | 4.0.0 |
| kylin.engine.spark-conf.spark.master | yarn | The cluster manager to connect to. Kylin supports yarn, local, and standalone. | |
| kylin.engine.spark-conf.spark.submit.deployMode | client | The deploy mode of the Spark driver program, either "client" or "cluster": the driver is launched locally ("client") or remotely on one of the nodes inside the cluster ("cluster"). | |
| kylin.engine.spark-conf.spark.yarn.queue | default | | |
| kylin.engine.spark-conf.spark.shuffle.service.enabled | false | Enables the external shuffle service. This service preserves the shuffle files written by executors so the executors can be safely removed. The external shuffle service must be set up in order to enable it. | 4.0.0 |
| kylin.engine.spark-conf.spark.eventLog.enabled | true | Whether to log Spark events, useful for reconstructing the Web UI after the application has finished. | |
| kylin.engine.spark-conf.spark.eventLog.dir | hdfs\:///kylin/spark-history | Base directory in which Spark events are logged, if spark.eventLog.enabled is true. | |
| kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled | false | | |
| kylin.engine.spark-conf.spark.executor.extraJavaOptions | | | |
| kylin.engine.spark-conf.spark.yarn.jars | hdfs://localhost:9000/spark2_jars/* | Manually uploading the spark-assembly jar to HDFS and then setting this property avoids repeatedly uploading the jar at runtime. | |
| | null | Users can choose to set the Spark conf of a Cube/Merge Job at the Cube level. | 4.0.0 |
| kylin.engine.driver-memory-base | 1024 | Driver memory (spark.driver.memory) is automatically adjusted according to the cuboid count and configuration. kylin.engine.driver-memory-strategy defines the levels. For example, "2,20,100" translates into four cuboid count ranges, from low to high, as follows: ... A level can then be found for a specific cuboid count: 12 falls in level 2, and 230 falls in level 4. Driver memory is calculated by the following formula: ... | 4.0.0 |
| kylin.engine.driver-memory-maximum | 4096 | See above. | 4.0.0 |
| kylin.engine.driver-memory-strategy | 2,20,100 | See above. | 4.0.0 |
| kylin.engine.base-executor-instance | 5 | | 4.0.0 |
| kylin.engine.spark.required-cores | 1 | | 4.0.0 |
| kylin.engine.executor-instance-strategy | 100,2,500,3,1000,4 | | 4.0.0 |
| kylin.engine.retry-memory-gradient | | | 4.0.0 |
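
The level lookup described for kylin.engine.driver-memory-strategy can be sketched in Python. The formula itself is not reproduced in this section, so the sketch assumes driver memory = min(base × level, maximum), with values in the same unit as kylin.engine.driver-memory-base. The function names and the handling of counts exactly equal to a threshold are likewise assumptions made for illustration, not Kylin's API.

```python
from bisect import bisect_left


def cuboid_level(cuboid_count, strategy="2,20,100"):
    """Map a cuboid count to a 1-based level using the strategy thresholds.

    Assumption: a count equal to a threshold stays in the lower level.
    """
    thresholds = sorted(int(part) for part in strategy.split(","))
    # Number of thresholds strictly below the count, plus one.
    return bisect_left(thresholds, cuboid_count) + 1


def driver_memory(cuboid_count, base=1024, maximum=4096, strategy="2,20,100"):
    """Assumed formula: min(base * level, maximum)."""
    return min(base * cuboid_level(cuboid_count, strategy), maximum)


if __name__ == "__main__":
    for count in (1, 12, 230):
        print(count, cuboid_level(count), driver_memory(count))
```

With the default values, a cuboid count of 12 falls in level 2 and a count of 230 in level 4, matching the example in the table. Under the assumed formula these give 2048 and 4096 respectively.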
...
The following files are located under WORKING-DIR/$PROJECT/job_tmp/${JOB_ID}/share and are produced in the first step of the BuildJob. They serve the Spark resources automatic adjustment strategy (source code: ResourceDetectBeforeCubingJob).
| Resource Detect File | Data Type | Format | Description |
|---|---|---|---|
| count_distinct.json | Boolean | Binary | Whether the Cube contains a COUNT_DISTINCT (bitmap) measure. Sample: true |
| ${JOB_ID}_resource_path.json | Map<String, List<String>> | Binary | Key is the cuboid ID, and value is the partition paths of the cuboid's parent dataset. -1 means Flat Table. Sample: ... |
| ${JOB_ID}_cubing_detect_items.json | Map<String, Integer> | Binary | Key is the cuboid ID, and value is the partition count of the cuboid's parent dataset. Sample: ... |
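
As a rough illustration of how such a file might be inspected, the sketch below assumes that the content of ${JOB_ID}_cubing_detect_items.json decodes as a JSON object mapping cuboid ID to partition count, as the Data Type column suggests. The sample values and the helper names `load_detect_items` and `largest_parent` are hypothetical and are not taken from Kylin.

```python
import json

# Hypothetical content for illustration only; real files are written by
# ResourceDetectBeforeCubingJob and their exact encoding is assumed here.
SAMPLE = '{"255": 3, "127": 5, "15": 2}'


def load_detect_items(raw):
    """Parse the assumed JSON text into {cuboid_id: partition_count}."""
    return {str(key): int(value) for key, value in json.loads(raw).items()}


def largest_parent(items):
    """Return the (cuboid_id, partition_count) pair with the highest count."""
    cuboid_id = max(items, key=items.get)
    return cuboid_id, items[cuboid_id]


if __name__ == "__main__":
    items = load_detect_items(SAMPLE)
    print(largest_parent(items))
```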
Global dictionary
| Property | Default | Description | Since |
|---|---|---|---|
| kylin.dictionary.detect-data-skew-sample-enabled | | | |
| kylin.dictionary.detect-data-skew-sample-rate | | | |
| kylin.dictionary.detect-data-skew-percentage-threshold | | | |
...