Apache Kylin : Analytical Data Warehouse for Big Data
Spark Job Option
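| Property | Required | Priority | Datatype | Default | Description | Version | Reference |
| kylin.engine.spark.build-class-name | no | low | String | org.apache.kylin.engine.spark.job.CubeBuildJob | For developers only. The class name used in spark-submit. | 4.0+ | |
| kylin.engine.spark.cluster-info-fetcher-class-name | yes | | String | org.apache.kylin.cluster.YarnInfoFetcher | Fetches YARN information for the Spark job. | | |
| kylin.engine.spark-conf.XXX | | | String | | Spark configurations to override for the build job, such as "spark.driver.cores". If these Spark properties are not set, Kylin adjusts them automatically. | | Adaptively-adjust-spark-parameters |
| kylin.storage.provider | no | | String | | Different cloud vendors return different ContentSummary objects, so each vendor needs its own implementation. | | See org.apache.kylin.common.storage.IStorageProvider |
| kylin.engine.spark.merge-class-name | no | | String | org.apache.kylin.engine.spark.job.CubeMergeJob | For developers only. The class name used in spark-submit. | | |
| kylin.engine.spark.task-impact-instance-enabled | no | | Boolean | | Affects spark.executor.instances for the build engine. See kylin.engine.spark.task-core-factor. | 4.0+ | |
| kylin.engine.spark.task-core-factor | no | | Integer | | Affects spark.executor.instances for the build engine (together with kylin.engine.spark.task-impact-instance-enabled). | | |
| kylin.engine.driver-memory-base | no | | Integer | | Affects spark.driver.memory for the build engine. | | |
| kylin.engine.driver-memory-strategy | no | | | | Affects spark.driver.memory for the build engine (see kylin.engine.driver-memory-base). | | |
| kylin.engine.driver-memory-maximum | no | | Integer | | Affects spark.driver.memory for the build engine (see kylin.engine.driver-memory-base). | | |
| kylin.engine.persist-flattable-threshold | no | | | | | | |
| kylin.snapshot.parallel-build-timeout-seconds | no | | | | See kylin.snapshot.parallel-build-enabled. | | |
| kylin.snapshot.parallel-build-enabled | no | | Boolean | | Set this to speed up snapshot building. | | |
| kylin.spark-conf.auto.prior | no | | Boolean | | Whether to set some SparkConf properties automatically. | | |
| kylin.engine.submit-hadoop-conf-dir | | | | | Sets HADOOP_CONF_DIR for spark-submit. | | |
| kylin.storage.columnar.shard-size-mb | | | | | One of a group of shard-related settings; not yet documented here (the author does not fully understand them yet). | | |
| kylin.storage.columnar.shard-rowcount | | | | | Shard-related; see kylin.storage.columnar.shard-size-mb. | | |
| kylin.storage.columnar.shard-countdistinct-rowcount | | | | | Shard-related; see kylin.storage.columnar.shard-size-mb. | | |
| kylin.query.spark-engine.join-memory-fraction | | | | | Limits the memory used by broadcast joins. Open question: is this name correct, and why does it start with "query"? | | |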
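The kylin.engine.spark-conf.XXX and kylin.spark-conf.auto.prior rows describe two sources of Spark settings. The first source is user overrides written with the kylin.engine.spark-conf. prefix. The second is values that Kylin adjusts automatically when the user has not set them.

The Python sketch below models that precedence only. The function names, the auto_adjusted input, and the auto_enabled flag are hypothetical and are not Kylin APIs. Treating auto_enabled=False as "apply no automatic values" is an assumption about kylin.spark-conf.auto.prior.

```python
from typing import Dict, Mapping

# Prefix taken from the table row kylin.engine.spark-conf.XXX.
SPARK_CONF_PREFIX = "kylin.engine.spark-conf."


def user_spark_overrides(kylin_props: Mapping[str, str]) -> Dict[str, str]:
    """Strip the prefix, e.g. kylin.engine.spark-conf.spark.driver.cores -> spark.driver.cores."""
    overrides: Dict[str, str] = {}
    for key, value in kylin_props.items():
        # Ignore unrelated properties and a bare prefix with no Spark key after it.
        if key.startswith(SPARK_CONF_PREFIX) and len(key) > len(SPARK_CONF_PREFIX):
            overrides[key[len(SPARK_CONF_PREFIX):]] = value
    return overrides


def effective_spark_conf(kylin_props: Mapping[str, str],
                         auto_adjusted: Mapping[str, str],
                         auto_enabled: bool = True) -> Dict[str, str]:
    """Hypothetical merge: automatically adjusted values first, user overrides win.

    auto_adjusted stands in for whatever values Kylin computes adaptively;
    auto_enabled loosely models kylin.spark-conf.auto.prior (an assumption).
    """
    conf: Dict[str, str] = dict(auto_adjusted) if auto_enabled else {}
    # Explicitly configured properties replace any automatically chosen value.
    conf.update(user_spark_overrides(kylin_props))
    return conf


if __name__ == "__main__":
    props = {
        "kylin.engine.spark-conf.spark.driver.cores": "2",
        "kylin.storage.provider": "not a Spark setting",
    }
    auto = {"spark.driver.cores": "1", "spark.executor.instances": "5"}  # made-up values
    print(effective_spark_conf(props, auto))
    # prints {'spark.driver.cores': '2', 'spark.executor.instances': '5'}
```

The design choice shown is simple last-writer-wins. A property the user sets explicitly is never replaced by an automatically chosen value, which matches the table's statement that Kylin adjusts only the properties the user leaves unset.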
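The table lists three kylin.engine.driver-memory-* properties that affect spark.driver.memory, but it does not say how they combine. The Python sketch below is one hypothetical model, not something this page confirms. It assumes the following:

- The base value is in MB.
- driver-memory-strategy is a comma-separated list of cuboid-count thresholds.
- Each threshold the cuboid count exceeds adds one more base unit.
- driver-memory-maximum caps the result.

All function names are illustrative, and "2,20,100" is only an example input, not a documented default.

```python
from typing import List, Sequence


def parse_strategy(value: str) -> List[int]:
    """Parse a comma-separated threshold list such as "2,20,100" (format is an assumption)."""
    return sorted(int(part) for part in value.split(",") if part.strip())


def estimate_driver_memory_mb(cuboid_count: int,
                              base_mb: int,
                              strategy: Sequence[int],
                              maximum_mb: int) -> int:
    """Hypothetical model of kylin.engine.driver-memory-base/-strategy/-maximum."""
    # Count the thresholds that lie strictly below the cuboid count.
    steps = sum(1 for threshold in strategy if threshold < cuboid_count)
    # Start from the base and add one base unit per crossed threshold.
    memory_mb = base_mb * (steps + 1)
    # The maximum caps the result.
    return min(memory_mb, maximum_mb)


if __name__ == "__main__":
    strategy = parse_strategy("2,20,100")  # illustrative input only
    for cuboids in (1, 10, 50, 500):
        print(cuboids, estimate_driver_memory_mb(cuboids, 1024, strategy, 4096))
```

Under these assumptions, with a 1024 MB base and a 4096 MB maximum, the driver memory steps through 1024, 2048, 3072, and 4096 MB as the cuboid count crosses each threshold. Check the actual rule against the Kylin source before relying on it.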