Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
Spark Job Option
| Property | Required | Priority | Datatype | Default | Description | Version | Reference |
|---|---|---|---|---|---|---|---|
| kylin.engine.spark.build-class-name | no | low | String | org.apache.kylin.engine.spark.job.CubeBuildJob | For developers only. The class name used in spark-submit. | 4.0+ | |
| kylin.engine.spark.cluster-info-fetcher-class-name | no | | String | org.apache.kylin.cluster.YarnInfoFetcher | Fetches the YARN information of the Spark job. | | |
| kylin.engine.spark-conf.XXX | no | | String | | Spark configurations to override for the build job, e.g. "spark.driver.cores". If these Spark properties are not set, Kylin automatically adjusts them before submitting the build job. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.storage.provider | no | | String | org.apache.kylin.common.storage.DefaultStorageProvider | The content summary objects returned by different cloud vendors are not the same, so a targeted implementation must be provided. See org.apache.kylin.common.storage.IStorageProvider to learn more. | | |
| kylin.engine.spark.merge-class-name | no | | String | org.apache.kylin.engine.spark.job.CubeMergeJob | For developers only. The class name used in spark-submit. | | |
| kylin.engine.spark.task-impact-instance-enabled | no | | Boolean | true | See kylin.engine.spark.task-core-factor. If this option is set to true and kylin.engine.spark-conf.spark.executor.instances is not set, Kylin calculates spark.executor.instances for the Build Engine. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.engine.spark.task-core-factor | no | | Integer | 3 | | | |
| kylin.engine.driver-memory-base | no | | Integer | 1024 | Auto-adjusts spark.driver.memory for the Build Engine if kylin.engine.spark-conf.spark.driver.memory is not set. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.engine.driver-memory-strategy | no | | Array | {"2", "20", "100"} | | | |
| kylin.engine.driver-memory-maximum | no | | Integer | 4096 | | | |
| kylin.engine.persist-flattable-threshold | no | | Integer | 1 | If the number of cuboids to be built from the flat table is bigger than this threshold, the flat table is persisted into $HDFS_WORKING_DIR/job_tmp/flat_table to save memory. | 4.0+ | |
| kylin.snapshot.parallel-build-timeout-seconds | no | | Integer | 3600 | Improves the speed of snapshot build. | 4.0+ | |
| kylin.snapshot.parallel-build-enabled | no | | Boolean | true | | | |
| kylin.spark-conf.auto.prior | no | | Boolean | true | Whether to adjust Spark parameters adaptively. | 4.0+ | Adaptively-adjust-spark-parameters |
| kylin.engine.submit-hadoop-conf-dir | no | | String | /etc/hadoop/conf | Sets HADOOP_CONF_DIR for spark-submit. | | |
| kylin.storage.columnar.shard-size-mb | no | | Integer | 128 | The size of each Parquet partition file of a cuboid. | 4.0+ | ShardBy |
| kylin.storage.columnar.shard-rowcount | no | | Long | 2500000 | The number of rows of each Parquet partition file of a cuboid. | | |
| kylin.storage.columnar.shard-countdistinct-rowcount | no | | Long | 1000000 | The number of rows of each Parquet partition file of a cuboid when the shard column is a count-distinct column. | | |
| kylin.query.spark-engine.join-memory-fraction | no | | Double | 0.3 | Limits the memory used by broadcast join. | | |
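As an illustration, Spark settings for the build job can be overridden in kylin.properties through the kylin.engine.spark-conf.XXX prefix. The values below are examples only, not recommended defaults:

```properties
# Override Spark settings for the build job (example values).
kylin.engine.spark-conf.spark.driver.cores=2
kylin.engine.spark-conf.spark.driver.memory=4096m
kylin.engine.spark-conf.spark.executor.instances=4

# Let Kylin adaptively adjust any Spark parameters left unset (default: true).
kylin.spark-conf.auto.prior=true

# Point spark-submit at the Hadoop client configuration.
kylin.engine.submit-hadoop-conf-dir=/etc/hadoop/conf
```

Any property left out of the kylin.engine.spark-conf.* set remains eligible for Kylin's adaptive adjustment before the build job is submitted.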
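The interplay of kylin.engine.driver-memory-base, kylin.engine.driver-memory-strategy, and kylin.engine.driver-memory-maximum can be pictured with a small sketch. This is an illustrative model of a tiered sizing strategy, not necessarily Kylin's exact formula; the function name and the interpretation of the strategy values as cuboid-count thresholds are assumptions:

```python
def estimate_driver_memory(cuboid_count,
                           base_mb=1024,
                           strategy=(2, 20, 100),
                           maximum_mb=4096):
    """Illustrative tiered sizing (sketch only, not Kylin's exact code):
    each strategy threshold the cuboid count exceeds bumps the driver
    memory by one multiple of the base, capped at the maximum."""
    tier = sum(1 for threshold in strategy if cuboid_count > threshold)
    return min(base_mb * (tier + 1), maximum_mb)

print(estimate_driver_memory(1))    # below every threshold -> 1024
print(estimate_driver_memory(50))   # exceeds 2 and 20 -> 3072
print(estimate_driver_memory(500))  # exceeds all three, capped -> 4096
```

Whatever the exact formula, the three options play the same roles: the base sets the floor, the strategy array defines the escalation points, and the maximum caps the result.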
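Similarly, the shard options bound each cuboid's Parquet partition files by both size and row count. The sketch below shows one way those two caps could combine into a shard count; it is an assumption-laden illustration (Kylin's real ShardBy logic may use additional inputs), with the defaults taken from the table above:

```python
import math

def estimate_shard_count(total_size_mb, total_rows,
                         shard_size_mb=128,
                         shard_rowcount=2_500_000):
    """Illustrative only: split a cuboid into enough shards that each
    Parquet partition file stays under both the size cap and the
    row-count cap, whichever is the tighter constraint."""
    by_size = math.ceil(total_size_mb / shard_size_mb)
    by_rows = math.ceil(total_rows / shard_rowcount)
    return max(1, by_size, by_rows)

print(estimate_shard_count(1024, 1_000_000))   # size-bound -> 8 shards
print(estimate_shard_count(64, 10_000_000))    # row-bound  -> 4 shards
```

For count-distinct shard columns, the same reasoning applies with the lower kylin.storage.columnar.shard-countdistinct-rowcount cap of 1000000 rows.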