Apache Kylin : Analytical Data Warehouse for Big Data

Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

DateAuthorComment
2021-01-06xxyu@apache.orgCreate for Kylin 4.0.0-beta.



NOTES: If your $SPARK_HOME points to $KYLIN_HOME/spark, then you can ignore this document. Kylin will help you do the jar package replacement described in this document. You don't need to do this steps to start kylin. However, kylin's automatic replacement of jar packages may fail. If you encounter problems such as ClassNotFound during use Kylin, you still need to refer to this document to manually replace jar packages.

Kylin on EMR 5.31 

Create a EMR cluster

Code Block
languagebash
themeEmacs
titleCreate Cluster
linenumberstrue

# Create a EMR cluster
$ aws emr create-cluster --applications Name=Hadoop Name=Hive Name=Pig Name=Spark Name=Sqoop Name=Tez Name=ZooKeeper \
	--release-label emr-5.31.0  \
	--ec2-attributes '{"KeyName":"XiaoxiangYu","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-XXX","EmrManagedSlaveSecurityGroup":"XXX","EmrManagedMasterSecurityGroup":"XXX"}'  \
	--log-uri 's3n://aws-logs-XXX/elasticmapreduce/xiaoxiangyu' \
	--instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":100,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"MASTER","InstanceType":"m5.xlarge","Configurations":[{"Classification":"hive-site","Properties":{"hive.optimize.sort.dynamic.partition":"false"}}],"Name":"Master Node"},{"InstanceCount":2,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":50,"VolumeType":"gp2"},"VolumesPerInstance":1}]},"InstanceGroupType":"CORE","InstanceType":"m5.xlarge","Configurations":[{"Classification":"hive-site","Properties":{"hive.optimize.sort.dynamic.partition":"false"}}],"Name":"Worker Node"}]' \
	--configurations '[{"Classification":"mapred-site","Properties":{"mapreduce.map.memory.mb":"3072","mapreduce.reduce.memory.mb":"6144","mapreduce.map.java.opts":"-Xmx2458m","mapreduce.reduce.java.opts":"-Xmx4916m"}},{"Classification":"yarn-site","Properties":{"yarn.nodemanager.resource.cpu-vcores":"4","yarn.nodemanager.resource.memory-mb":"12288","yarn.scheduler.maximum-allocation-mb":"12288","yarn.app.mapreduce.am.resource.mb":"6144"}},{"Classification":"emrfs-site","Properties":{"fs.s3.consistent":"false"}}]' --auto-scaling-role EMR_AutoScaling_DefaultRole \
	--ebs-root-volume-size 50 --service-role EMR_DefaultRole --enable-debugging --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
	--name 'OSS-Dev-Cluster' \
	--region XXX

# Login into master node
$ ssh -i ~/XXX.pem hadoop@ec2-XXX.compute.amazonaws.com.cn 

...