See also How to install Hadoop distribution from Bigtop 0.6.0 for more details on where to obtain repo files, etc.
This was done on CentOS 6.5.
Note for alternative file systems (HCFS implementations such as S3FileSystem, GlusterFileSystem, and so on):
- You will want to disable the HDFS-specific modules and set up your file system by hand.
- The other YARN-related puppet modules will still work if you've set up your file system correctly. For specifics, consult your file system provider and/or the Bigtop mailing list.
- One simple way to do this is to set up Bigtop Hadoop with HDFS, make sure it runs, and then disable and yum remove HDFS and swap in your file system underneath.
Okay! So let's get started setting up your Hadoop cluster.
0) Install all the basics in case you have a super raw machine. Most (or at least some) of these are probably already there.
yum install -y git cmake git-core git-svn subversion checkinstall build-essential dh-make debhelper ant ant-optional autoconf automake liblzo2-dev libzip-dev sharutils libfuse-dev reprepro libtool libssl-dev asciidoc xmlto ssh curl gcc gcc-c++ make fuse protobuf-compiler lzo-devel zlib-devel fuse-devel openssl-devel python-devel libxml2-devel libxslt-devel cyrus-sasl-devel sqlite-devel mysql-devel openldap-devel rpm-build createrepo redhat-rpm-config wget
1) Install puppet with yum (you have to use a 2.7.x version):
sudo rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm
yum install puppet-2.7.19
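If you're not sure which Puppet ended up on the box, a quick check like the one below can save a confusing failure later. This is a minimal sketch, assuming a POSIX shell; is_puppet_27 is a hypothetical helper name, not something shipped with Puppet or Bigtop.

```shell
# Succeeds (exit status 0) only for version strings of the form 2.7.<something>.
is_puppet_27() {
  case "$1" in
    2.7.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Only query Puppet if it is actually installed on this machine.
if command -v puppet >/dev/null 2>&1; then
  if is_puppet_27 "$(puppet --version)"; then
    echo "puppet version OK"
  else
    echo "puppet is installed, but it is not a 2.7.x version" >&2
  fi
fi
```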
2) Now we git clone bigtop into /opt/
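Step 2 could look roughly like this. The GitHub mirror URL and the clone_bigtop helper are assumptions made for illustration; point BIGTOP_REPO at whichever Bigtop repository (and afterwards check out whichever release) you actually intend to deploy.

```shell
# Assumed defaults; override them in the environment if yours differ.
BIGTOP_REPO="${BIGTOP_REPO:-https://github.com/apache/bigtop.git}"
BIGTOP_DIR="${BIGTOP_DIR:-/opt/bigtop}"

# Clone Bigtop unless a checkout already exists in BIGTOP_DIR.
clone_bigtop() {
  if [ -d "$BIGTOP_DIR/.git" ]; then
    echo "bigtop already cloned in $BIGTOP_DIR"
  else
    git clone "$BIGTOP_REPO" "$BIGTOP_DIR"
  fi
}
```

Run clone_bigtop as root, since /opt is normally writable only by root.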
3) cd into /opt/bigtop/bigtop-deploy/puppet and create a file like this under site.csv
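As a rough illustration of step 3, a site.csv might contain something like the lines below. The key names are the ones I would expect Bigtop's Puppet recipes of this era to read (hadoop_storage_dirs is the one step 4 relies on); every hostname, path and URI is a placeholder, and write_site_csv is a hypothetical helper used only to keep the example self-contained.

```shell
# Write a sample site.csv to the path given as $1.
# All values are placeholders: substitute your own head node, data dirs and repo URI.
write_site_csv() {
  cat > "$1" <<'EOF'
hadoop_head_node,hadoopmaster.example.com
hadoop_storage_dirs,/data/1,/data/2,/data/3,/data/4
bigtop_yumrepo_uri,http://mirror.example.com/path/to/mirror/
EOF
}
```

Call it with the path where your site.csv should live, then edit the values by hand before running Puppet.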
4) Make the data dirs (you can have fewer data directories as long as they match the hadoop_storage_dirs parameter in site.csv).
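One way to keep the directories and the hadoop_storage_dirs line from drifting apart is to create them straight from site.csv. The sketch below assumes the comma-separated layout "hadoop_storage_dirs,/dir1,/dir2,..."; storage_dirs and make_storage_dirs are hypothetical helper names, and the optional prefix argument exists only so you can try it somewhere harmless first.

```shell
# Print each directory listed on the hadoop_storage_dirs line, one per line.
storage_dirs() {
  grep '^hadoop_storage_dirs,' "$1" | cut -d, -f2- | tr ',' '\n'
}

# Create every listed directory. $1 is the site.csv path; $2 is an optional
# prefix prepended to each directory (leave it empty for the real thing).
make_storage_dirs() {
  site_csv="$1"
  prefix="${2:-}"
  storage_dirs "$site_csv" | while read -r dir; do
    mkdir -p "${prefix}${dir}"
  done
}
```

For example, make_storage_dirs site.csv /tmp/dryrun creates the tree under /tmp/dryrun, while make_storage_dirs site.csv (as root) creates the real directories.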
5) From the /opt/bigtop/bigtop-deploy/puppet directory, run this:
[root@localhost puppet]# puppet -d --modulepath=/opt/bigtop/bigtop-deploy/puppet/modules --confdir=/opt/bigtop/bigtop-deploy/puppet/ /opt/bigtop/bigtop-deploy/puppet/manifests/site.pp
Note: If you plan to use mapreduce, you must also install hadoop-mapreduce.
6) In yarn-site.xml, change the value of yarn.nodemanager.aux-services from "mapreduce_shuffle" to "mapreduce.shuffle".
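If you'd rather not hand-edit the XML, the change in step 6 can be scripted. This is only a sketch: it assumes the value sits on one line as <value>mapreduce_shuffle</value>, and fix_shuffle_name is a hypothetical helper. The usual location for the file with Bigtop packages would be /etc/hadoop/conf/yarn-site.xml, but check where yours actually lives.

```shell
# Rewrite the aux-service value in the yarn-site.xml given as $1.
# A copy of the original file is kept next to it with a .bak suffix.
fix_shuffle_name() {
  sed -i.bak 's#<value>mapreduce_shuffle</value>#<value>mapreduce.shuffle</value>#' "$1"
}
```

It only touches <value> elements, so property names that happen to contain mapreduce_shuffle are left alone. Restart the nodemanager afterwards so it picks up the new value.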
Bringing the cluster up and down:
To bring the cluster up for the first time (disclaimer: independent execution of the Puppet recipes on the cluster's nodes will automatically create the HDFS structures and bring up the services if all dependencies are satisfied, e.g. configs are created, packages are installed, etc. If Puppet reports errors, you might need to do the manual startup):
1) As root, run
# /etc/init.d/hadoop-hdfs-namenode init (omit unless you want to start with nothing in your HDFS)
# /etc/init.d/hadoop-hdfs-namenode start
# /etc/init.d/hadoop-hdfs-datanode start
# /usr/lib/hadoop/libexec/init-hdfs.sh (not needed after the first run)
# /etc/init.d/hadoop-yarn-resourcemanager start
# /etc/init.d/hadoop-yarn-proxyserver start
# /etc/init.d/hadoop-yarn-nodemanager start
on the master node.
2) On each of the slave nodes, run
# /etc/init.d/hadoop-hdfs-datanode start
# /etc/init.d/hadoop-yarn-nodemanager start
To bring the cluster down cleanly:
1) On each of the slave nodes, run
# /etc/init.d/hadoop-yarn-nodemanager stop
# /etc/init.d/hadoop-hdfs-datanode stop
2) On the master, run
# /etc/init.d/hadoop-yarn-nodemanager stop
# /etc/init.d/hadoop-yarn-proxyserver stop
# /etc/init.d/hadoop-yarn-resourcemanager stop
# /etc/init.d/hadoop-hdfs-datanode stop
# /etc/init.d/hadoop-hdfs-namenode stop
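The start and stop sequences above mirror each other: services stop in the reverse of the order they start. A small helper can make that ordering hard to get wrong. This is a sketch for sh or bash; cluster_cmds and reverse_words are hypothetical names, and it deliberately leaves out the first-run-only steps (the namenode init and init-hdfs.sh). It only prints the commands, so nothing runs until you pipe the output to a shell.

```shell
# Service names in start order, taken from the lists above.
MASTER_SERVICES="hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-proxyserver hadoop-yarn-nodemanager"
SLAVE_SERVICES="hadoop-hdfs-datanode hadoop-yarn-nodemanager"

# Print the arguments in reverse order, separated by spaces.
reverse_words() {
  out=""
  for w in "$@"; do
    out="$w${out:+ $out}"
  done
  echo "$out"
}

# Usage: cluster_cmds start|stop master|slave
# Prints one init script invocation per line, in the right order.
cluster_cmds() {
  action="$1"
  role="$2"
  case "$role" in
    master) services="$MASTER_SERVICES" ;;
    slave)  services="$SLAVE_SERVICES" ;;
    *)      echo "unknown role: $role" >&2; return 1 ;;
  esac
  if [ "$action" = "stop" ]; then
    # Word splitting is intentional: each service name is one argument.
    services="$(reverse_words $services)"
  fi
  for s in $services; do
    echo "/etc/init.d/$s $action"
  done
}
```

For example, cluster_cmds stop slave shows what would run on a slave node, and cluster_cmds stop slave | sh (as root) actually runs it.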