Apache Kylin : Analytical Data Warehouse for Big Data
Kylin needs to build cubes before they can be queried. If build jobs and query jobs run in the same cluster, they compete for resources, and this resource preemption can make the service unstable.
Kylin 4.0 can run build tasks and query tasks on two different Hadoop clusters, which we call the build cluster and the query cluster. The build cluster handles the write-heavy build workload, while the query cluster serves read-only queries. Build tasks are sent to the build cluster; when a build task finishes, the resulting data is sent to the query cluster so that queries can be executed there.
With a read/write separation deployment, build and query workloads are completely isolated from each other.
Architecture
Prepare
- Make sure the Hadoop version (HDP or CDH) is supported by Kylin.
- Check that commands such as hdfs and hive work properly and can access cluster resources.
- If HDFS NameNode HA is enabled on both clusters, make sure their HDFS nameservice names are different. If they are the same, rename one of them to avoid conflicts.
- Make sure the network latency between the two clusters is low, because a large amount of data is moved back and forth between them during the model build process.
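Part of this checklist can be scripted. The following is a minimal Python sketch, assuming the hdfs-site.xml of each cluster has been copied to the machine running it; the helper names are hypothetical and not part of Kylin. It reads the standard HDFS property dfs.nameservices from each file and reports any nameservice name the two clusters share. It also reports which of the hdfs and hive commands are missing from the PATH, which only covers the first half of the command check: whether those commands can actually reach cluster resources still has to be verified by running them.

```python
from __future__ import annotations

import shutil
import xml.etree.ElementTree as ET


def parse_nameservices(xml_data: bytes | str) -> set[str]:
    """Return the nameservice IDs declared in hdfs-site.xml content.

    dfs.nameservices holds a comma-separated list of IDs. If the property
    is absent, the cluster declares no nameservice and an empty set is returned.
    """
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "dfs.nameservices":
            value = prop.findtext("value") or ""
            return {ns.strip() for ns in value.split(",") if ns.strip()}
    return set()


def read_nameservices(hdfs_site_path: str) -> set[str]:
    """Read an hdfs-site.xml file from disk and parse its nameservices."""
    # Read bytes so that the file's own XML encoding declaration is honoured.
    with open(hdfs_site_path, "rb") as f:
        return parse_nameservices(f.read())


def shared_nameservices(build_ns: set[str], query_ns: set[str]) -> set[str]:
    """Nameservice names declared by both clusters; this should be empty."""
    return build_ns & query_ns


def missing_commands(commands: tuple[str, ...] = ("hdfs", "hive")) -> list[str]:
    """Commands that cannot be found on the PATH (cluster access is not checked)."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]
```

A typical call is shared_nameservices(read_nameservices(build_path), read_nameservices(query_path)), where build_path and query_path are wherever you placed the two copies; a non-empty result means one of the nameservices needs to be renamed.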
Configuration
- Install Kylin 4.0 on the Kylin server by following the installation guide.
- Prepare the Hadoop configuration files of both clusters and copy them to the Kylin server.
- Open $KYLIN_HOME/conf/kylin.properties
- Set kylin.env.hadoop-conf-dir to the directory containing the query cluster's Hadoop configuration files.
- Set kylin.engine.submit-hadoop-conf-dir to the directory containing the build cluster's Hadoop configuration files.
- Copy the build cluster's hive-site.xml into the query cluster's Hadoop configuration directory.
The read/write separation deployment is now configured.
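To make the mapping in the steps above concrete, here is a minimal Python sketch of the two configuration actions: rendering the two kylin.properties entries and copying the build cluster's hive-site.xml into the query cluster's configuration directory. The helper names and any directory paths you pass in are hypothetical; only the property keys and the hive-site.xml file name come from the steps above.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def render_conf_dir_properties(query_conf_dir: str, build_conf_dir: str) -> str:
    """Return the two kylin.properties entries for read/write separation.

    The query cluster's Hadoop configuration goes to kylin.env.hadoop-conf-dir,
    the build cluster's to kylin.engine.submit-hadoop-conf-dir.
    """
    return (
        f"kylin.env.hadoop-conf-dir={query_conf_dir}\n"
        f"kylin.engine.submit-hadoop-conf-dir={build_conf_dir}\n"
    )


def copy_build_hive_site(build_conf_dir: str, query_conf_dir: str) -> Path:
    """Copy the build cluster's hive-site.xml into the query cluster's conf directory.

    shutil.copy2 overwrites a hive-site.xml already present at the destination.
    """
    source = Path(build_conf_dir) / "hive-site.xml"
    target = Path(query_conf_dir) / "hive-site.xml"
    shutil.copy2(source, target)
    return target
```

For example, with the hypothetical directories /opt/hadoop-conf/query and /opt/hadoop-conf/build, render_conf_dir_properties returns the lines kylin.env.hadoop-conf-dir=/opt/hadoop-conf/query and kylin.engine.submit-hadoop-conf-dir=/opt/hadoop-conf/build, which go into $KYLIN_HOME/conf/kylin.properties.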
Note
$KYLIN_HOME/bin/check-env.sh and $KYLIN_HOME/bin/sample.sh are not available in this deployment mode. In this mode, kylin.engine.spark-conf.spark.yarn.queue in kylin.properties should be set to a YARN queue on the build cluster.
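Since check-env.sh cannot be used in this mode, a small pre-flight check of kylin.properties may help catch missing settings. The Python sketch below is an assumption, not a Kylin tool: it parses simple key=value lines (skipping blank lines and # or ! comments, and deliberately ignoring the rest of the Java properties syntax such as ':' separators and line continuations), then lists which of the settings named on this page are absent or empty.

```python
from __future__ import annotations

# Settings this page asks you to configure for read/write separation.
REQUIRED_KEYS = (
    "kylin.env.hadoop-conf-dir",                 # query cluster Hadoop configuration
    "kylin.engine.submit-hadoop-conf-dir",       # build cluster Hadoop configuration
    "kylin.engine.spark-conf.spark.yarn.queue",  # YARN queue on the build cluster
)


def parse_simple_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and comments.

    Simplified on purpose: no ':' separators, escapes or line continuations.
    A later duplicate key overrides an earlier one.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(properties_text: str) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    props = parse_simple_properties(properties_text)
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

Passing the contents of $KYLIN_HOME/conf/kylin.properties to missing_settings returns an empty list when all three settings have values; it does not verify that the directories or the queue actually exist.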