Apache Kylin : Analytical Data Warehouse for Big Data
Background
Starting with Kylin 4.0.0, Kylin provides two binary packages, each verified on different Hadoop environments. We chose several popular Hadoop distributions, such as Cloudera (CDH), HDP, and AWS EMR.
We also include a custom Hadoop installation combination, which may be helpful if you prefer to assemble your own Hadoop stack.
On each Hadoop platform we tested, we did NOT use the Spark provided by the environment (HDP, CDH, or AWS EMR); instead, we downloaded a specific version of Apache Spark.
Kylin 4.0.0 Support Matrix
Kylin Binary | Hadoop Distribution | Spark | Hadoop | Hive | Cluster Manager | Distributed Filesystem | Verified ? | Comment |
---|---|---|---|---|---|---|---|---|
Kylin 4.0.0-spark2 | CDH 5.7 | 2.4.7 | 2.6.0-cdh5.7.6 | 1.1.0-cdh5.7.6 | YARN | HDFS | | |
Kylin 4.0.0-spark2 | HDP 2.4 | 2.4.7 | 2.7.1.2.4.0.0-16 | 1.2.1000.2.4.0.0-16 | YARN | HDFS | | |
Kylin 4.0.0-spark2 | AWS EMR 5.33.0 | 2.4.7 | 2.10.1-amzn-1 | 2.3.7-amzn-4 | YARN | HDFS/S3 | | Deploy Kylin 4 on AWS EMR |
Kylin 4.0.0-spark2 | CDH 6.2.0 | 2.4.7 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | | Deploy Kylin 4 on CDH 6 |
Kylin 4.0.0-spark3 | AWS EMR 6.3.0 | 3.1.1 | 3.2.1-amzn-3 | 3.1.2-amzn-4 | YARN | HDFS/S3 | | Deploy Kylin 4 on AWS EMR |
Kylin 4.0.0-spark3 | CDH 6.2.0 | 3.1.1 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | | Deploy Kylin 4 on CDH 6 |
Kylin 4.0.0-spark3 | Apache | 3.1.1 | 3.2.0 | 2.3.9 | YARN, Standalone | S3 | | http://kylin.apache.org/docs40/install/deploy_without_hadoop.html |
Note:
- Object storage such as S3 is not well tested and is tagged as an experimental feature; its performance is not as good as HDFS. It is therefore not recommended in production environments without a storage cache layer (such as Alluxio).
- When using Standalone as the cluster manager, Kylin 4.0.0 only supports client as the deployMode.
- Please configure kylin.engine.spark-conf.spark.sql.hive.metastore.version, kylin.engine.spark-conf.spark.sql.hive.metastore.jars, kylin.query.spark-conf.spark.sql.hive.metastore.version, and kylin.query.spark-conf.spark.sql.hive.metastore.jars properly. For details on how Spark connects to Hive, see http://spark.apache.org/docs/latest/configuration.html or http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html.
- On some Hadoop platforms or custom Hadoop version combinations, you may still face class conflict issues, some of which are related to Hive libs/jars. Please report them to the user mailing list to find a solution.
- In a Hadoop 3.X environment, you may find that Kylin does not print logger output to 'kylin.log' and that only part of it appears in 'kylin.out'. This is usually caused by SLF4J not working as expected. We suggest copying 'log4j-1.2.17.jar' and 'slf4j-log4j12-1.7.25.jar' (these two jars may be found under $SPARK_HOME/jars) into $KYLIN_HOME/ext and restarting the Kylin instance. You can find output like 'SLF4J: Class path contains multiple SLF4J bindings.' in 'kylin.out'.
- Class conflicts may happen on Hadoop platforms we have not tested. Some users have reported them; related issues include KYLIN-5073 and KYLIN-5069. If you face these problems, please check the comments under these issues before opening a new JIRA issue.
- If you encounter "java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(Lorg/apache/hadoop/hive/conf/HiveConf;)V" when using Hive 3.X, please check KYLIN-5084 for a solution.
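The Hive metastore settings named in the notes above live in `$KYLIN_HOME/conf/kylin.properties`. The following is a minimal shell sketch of adding them for both the build engine and the query engine. The function name `append_hive_metastore_conf` and its two arguments are hypothetical, and this page does not state which version string or jar location fits a given platform, so take the values from the Spark pages linked above.

```shell
#!/usr/bin/env bash
# Sketch: append Hive metastore settings for the build engine and the query
# engine to kylin.properties. The function name and arguments are
# illustrative; the caller supplies both values.

append_hive_metastore_conf() {
  local hive_version="$1"   # value for spark.sql.hive.metastore.version
  local hive_jars="$2"      # value for spark.sql.hive.metastore.jars
  local conf_file="${KYLIN_HOME:?KYLIN_HOME is not set}/conf/kylin.properties"

  local prefix
  for prefix in kylin.engine.spark-conf kylin.query.spark-conf; do
    printf '%s.spark.sql.hive.metastore.version=%s\n' "$prefix" "$hive_version" >> "$conf_file"
    printf '%s.spark.sql.hive.metastore.jars=%s\n' "$prefix" "$hive_jars" >> "$conf_file"
  done
}
```

For example, `append_hive_metastore_conf 2.1.1 '/path/to/hive/lib/*'` would append four lines, two per engine. Both values in this call are placeholders, not verified settings.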
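The logging workaround for Hadoop 3.X described in the notes above can be scripted roughly as follows. This is a sketch under the assumption that `SPARK_HOME` and `KYLIN_HOME` are already exported; the function name `copy_logging_jars` is hypothetical, and the jar names are the ones given in the note.

```shell
#!/usr/bin/env bash
# Sketch of the logging workaround: copy the two jars named in the note from
# $SPARK_HOME/jars into $KYLIN_HOME/ext. Restarting Kylin is left to the caller.

copy_logging_jars() {
  local src="${SPARK_HOME:?SPARK_HOME is not set}/jars"
  local dst="${KYLIN_HOME:?KYLIN_HOME is not set}/ext"
  local jar missing=0

  mkdir -p "$dst"
  for jar in log4j-1.2.17.jar slf4j-log4j12-1.7.25.jar; do
    if [ -f "$src/$jar" ]; then
      cp "$src/$jar" "$dst/"
    else
      # The note only says the jars "may" be found under $SPARK_HOME/jars.
      echo "not found: $src/$jar" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After `copy_logging_jars` returns successfully, restart the Kylin instance, for example with `$KYLIN_HOME/bin/kylin.sh stop` followed by `$KYLIN_HOME/bin/kylin.sh start`. A non-zero return means at least one jar was not found and has to be obtained elsewhere.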
Kylin 4.0.1 Support Matrix
Kylin Binary | Hadoop Distribution | Spark | Hadoop | Hive | Cluster Manager | Distributed Filesystem | Verified ? | Comment |
---|---|---|---|---|---|---|---|---|
Kylin 4.0.0-spark2 | CDH 5.7 | 2.4.7 | 2.6.0-cdh5.7.6 | 1.1.0-cdh5.7.6 | YARN | HDFS | | |
Kylin 4.0.0-spark2 | HDP 2.4 | 2.4.7 | 2.7.1.2.4.0.0-16 | 1.2.1000.2.4.0.0-16 | YARN | HDFS | | |
Kylin 4.0.0-spark2 | AWS EMR 5.33.0 | 2.4.7 | 2.10.1-amzn-1 | 2.3.7-amzn-4 | YARN | HDFS/S3 | | Deploy Kylin 4 on AWS EMR |
Kylin 4.0.0-spark2 | CDH 6.2.0 | 2.4.7 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | | Deploy Kylin 4 on CDH 6 |
Kylin 4.0.0-spark3 | AWS EMR 6.3.0 | 3.1.1 | 3.2.1-amzn-3 | 3.1.2-amzn-4 | YARN | HDFS/S3 | | Deploy Kylin 4 on AWS EMR |
Kylin 4.0.0-spark3 | CDH 6.2.0 | 3.1.1 | 3.0.0-cdh6.2.0 | 2.1.1-cdh6.2.0 | YARN | HDFS | | Deploy Kylin 4 on CDH 6 |
Kylin 4.0.0-spark3 | Apache | 3.1.1 | 3.2.0 | 2.3.9 | YARN, Standalone | S3 | | http://kylin.apache.org/docs40/install/deploy_without_hadoop.html |