Apache Kylin : Analytical Data Warehouse for Big Data

Background

Starting with Kylin 4.0.0, Kylin provides two binary packages, each verified on different Hadoop environments. We chose several popular Hadoop distributions, such as Cloudera (CDH), HDP, and AWS EMR.

We also include a custom Hadoop installation combination, which may be helpful for users who prefer to assemble their own Hadoop stack.

On each Hadoop platform/environment we tested, we did NOT use the Spark provided by the environment (HDP, CDH, or AWS EMR); instead, we downloaded a specific version of Apache Spark.

Kylin 4.0.0 Support Matrix


| Kylin Binary | Hadoop Distribution | Spark | Hadoop | Hive | Cluster Manager | Distributed Filesystem | Verified? | Comment |
|---|---|---|---|---|---|---|---|---|
| Kylin 4.0.0-spark2 | CDH 5.7 | 2.4.7 | 2.6.0-cdh5.7.6 | 1.1.0-cdh5.7.6 | YARN | HDFS | verified | |
| Kylin 4.0.0-spark2 | HDP 2.4 | 2.4.7 | 2.7.1.2.4.0.0-16 | 1.2.1000.2.4.0.0-16 | YARN | HDFS | verified | |
| Kylin 4.0.0-spark2 | AWS EMR 5.33.0 | 2.4.7 | 2.10.1-amzn-1 | 2.3.7-amzn-4 | YARN | HDFS/S3 | verified | Deploy Kylin 4 on AWS EMR |
| Kylin 4.0.0-spark2 | CDH 6.2.0 | 2.4.7 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | verified | Deploy Kylin 4 on CDH 6 |
| Kylin 4.0.0-spark3 | AWS EMR 6.3.0 | 3.1.1 | 3.2.1-amzn-3 | 3.1.2-amzn-4 | YARN | HDFS/S3 | verified | Deploy Kylin 4 on AWS EMR |
| Kylin 4.0.0-spark3 | CDH 6.2.0 | 3.1.1 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | verified | Deploy Kylin 4 on CDH 6 |
| Kylin 4.0.0-spark3 | Apache | 3.1.1 | 3.2.0 | 2.3.9 | YARN, Standalone | S3 | verified | http://kylin.apache.org/docs40/install/deploy_without_hadoop.html |


Note:

  1. Object storage such as S3 is not well tested and is tagged as an experimental feature; its performance is not as good as that of HDFS. It is therefore not recommended in production environments without a storage cache layer (such as Alluxio).
  2. When using Standalone as the cluster manager, Kylin 4.0.0 only supports client as the deployMode.
  3. Please configure kylin.engine.spark-conf.spark.sql.hive.metastore.version, kylin.engine.spark-conf.spark.sql.hive.metastore.jars, and kylin.query.spark-conf.spark.sql.hive.metastore.jars properly; see http://spark.apache.org/docs/latest/configuration.html or http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html for details on how Spark connects to Hive.
  4. On some Hadoop platforms or custom Hadoop (version) combinations, you may still face class conflict issues, some of which are related to Hive libs/jars. Please report them to the user mailing list to find a solution.
  5. In a Hadoop 3.X environment, you may find that Kylin does not print logger output into 'kylin.log' and that only part of it appears in 'kylin.out'. This is usually caused by SLF4J not working as expected; in that case 'kylin.out' contains output like 'SLF4J: Class path contains multiple SLF4J bindings.'. We suggest copying 'log4j-1.2.17.jar' and 'slf4j-log4j12-1.7.25.jar' (these two jars may be found under $SPARK_HOME/jars) into $KYLIN_HOME/ext and restarting the Kylin instance.
  6. Class conflicts may happen on Hadoop platforms we have not tested. Some users have reported them; see the related issues KYLIN-5073 and KYLIN-5069. If you face these troubles, please check the comments under these issues before opening a new JIRA issue.
  7. If you face "java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(Lorg/apache/hadoop/hive/conf/HiveConf;)V" when using Hive 3.X, please check issue KYLIN-5084 for a solution.
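
The logging workaround in note 5 can be sketched as a small Python helper. This is only an illustration, not an official Kylin tool: the function names are hypothetical, and the location of 'kylin.out' under $KYLIN_HOME/logs is an assumption that may differ in your installation. The jar names and the SLF4J message are the ones quoted in the note.

```python
import os
import shutil
from pathlib import Path

# Jar names and warning text as quoted in note 5.
LOGGING_JARS = ("log4j-1.2.17.jar", "slf4j-log4j12-1.7.25.jar")
SLF4J_WARNING = "SLF4J: Class path contains multiple SLF4J bindings."


def has_slf4j_conflict(kylin_out: Path) -> bool:
    """Return True if the given kylin.out contains the multiple-bindings warning."""
    if not kylin_out.is_file():
        return False
    with kylin_out.open(errors="replace") as fh:
        return any(SLF4J_WARNING in line for line in fh)


def copy_logging_jars(spark_home: Path, kylin_home: Path) -> list:
    """Copy the logging jars from <spark_home>/jars into <kylin_home>/ext.

    Returns the names of the jars that were copied; jars that are not
    present under <spark_home>/jars are skipped.
    """
    ext_dir = kylin_home / "ext"
    ext_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in LOGGING_JARS:
        src = spark_home / "jars" / name
        if src.is_file():
            shutil.copy2(src, ext_dir / name)
            copied.append(name)
    return copied


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    kylin_home = os.environ.get("KYLIN_HOME")
    if spark_home and kylin_home:
        # Assumption: kylin.out is written to $KYLIN_HOME/logs.
        if has_slf4j_conflict(Path(kylin_home) / "logs" / "kylin.out"):
            print("copied:", copy_logging_jars(Path(spark_home), Path(kylin_home)))
```

The helper only copies the jars; the Kylin instance still has to be restarted afterwards, as the note says.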

Kylin 4.0.1 Support Matrix

| Kylin Binary | Hadoop Distribution | Spark | Hadoop | Hive | Cluster Manager | Distributed Filesystem | Verified? | Comment |
|---|---|---|---|---|---|---|---|---|
| Kylin 4.0.0-spark2 | CDH 5.7 | 2.4.7 | 2.6.0-cdh5.7.6 | 1.1.0-cdh5.7.6 | YARN | HDFS | verified | |
| Kylin 4.0.0-spark2 | HDP 2.4 | 2.4.7 | 2.7.1.2.4.0.0-16 | 1.2.1000.2.4.0.0-16 | YARN | HDFS | verified | |
| Kylin 4.0.0-spark2 | AWS EMR 5.33.0 | 2.4.7 | 2.10.1-amzn-1 | 2.3.7-amzn-4 | YARN | HDFS/S3 | verified | Deploy Kylin 4 on AWS EMR |
| Kylin 4.0.0-spark2 | CDH 6.2.0 | 2.4.7 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | verified | Deploy Kylin 4 on CDH 6 |
| Kylin 4.0.0-spark3 | AWS EMR 6.3.0 | 3.1.1 | 3.2.1-amzn-3 | 3.1.2-amzn-4 | YARN | HDFS/S3 | verified | Deploy Kylin 4 on AWS EMR |
| Kylin 4.0.0-spark3 | CDH 6.2.0 | 3.1.1 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | verified | Deploy Kylin 4 on CDH 6 |
| Kylin 4.0.0-spark3 | Apache | 3.1.1 | 3.2.0 | 2.3.9 | YARN, Standalone | S3 | verified | http://kylin.apache.org/docs40/install/deploy_without_hadoop.html |