Getting Started with Whirr
See also http://incubator.apache.org/whirr/quick-start-guide.html
Whirr CLI
Pre-requisites
You need Java 6 installed on your machine, and an account with a cloud provider such as Amazon EC2.
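You can check that a suitable Java is installed by running:
% java -version
The reported version should be 1.6 or later.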
Install Whirr
Download or build Whirr. We'll refer to the directory containing the Whirr JAR files as WHIRR_HOME
(you might like to define this as an environment variable, as shown below).
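For example, if you unpacked Whirr in your home directory (the exact path below is illustrative and depends on where you put it):
% export WHIRR_HOME=~/whirr-0.1.0-SNAPSHOT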
You can test that Whirr is working by running:
% java -jar $WHIRR_HOME/whirr-cli-0.1.0-SNAPSHOT.jar
It is handy to create an alias for whirr, and another that includes your cloud credentials:
% alias whirr='java -jar $WHIRR_HOME/whirr-cli-0.1.0-SNAPSHOT.jar'
% alias whirr-ec2='whirr --identity=$AWS_ACCESS_KEY_ID --credential=$AWS_SECRET_ACCESS_KEY'
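The whirr-ec2 alias expects your Amazon credentials in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, so export them first (the values below are placeholders for your own keys):
% export AWS_ACCESS_KEY_ID=...
% export AWS_SECRET_ACCESS_KEY=...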
Launch a cluster
The following will launch a Hadoop cluster with a single machine for the namenode and jobtracker, and a further machine for a datanode and tasktracker.
% whirr-ec2 launch-cluster --service-name=hadoop --cluster-name=tomhadoopcluster \
    --instance-templates='1 nn+jt 1 dn+tt'
Once the cluster has launched you can browse it by connecting to the JobTracker web UI at http://master-host:50030 (substitute the public hostname of the master node for master-host).
Log in to the remote master node
Log in to the master node to run Hadoop jobs against HBase data; from there you can run your Hadoop code integrated with HBase.
The user name on the remote node is your local login (here, jongwook):
jongwook@localhost:~/whirr$ ssh -i /home/jongwook/.ssh/id_rsa jongwook@ec2-75-xx-xx-xx.compute-1.amazonaws.com
Set up PATH and CLASSPATH to run HBase and Hadoop code
export HADOOP_HOME=/usr/local/hadoop-0.20.2
export HBASE_HOME=/usr/local/hbase-0.89.20100924
export PATH=$HADOOP_HOME/bin:$HBASE_HOME/bin:$PATH

# CLASSPATH for HADOOP
export CLASSPATH=$HADOOP_HOME/hadoop-0.20.2-core.jar:$HADOOP_HOME/hadoop-0.20.2-ant.jar:$CLASSPATH
export CLASSPATH=$HADOOP_HOME/hadoop-0.20.2-examples.jar:$HADOOP_HOME/hadoop-0.20.2-test.jar:$CLASSPATH
export CLASSPATH=$HADOOP_HOME/hadoop-0.20.2-tools.jar:$CLASSPATH
#export CLASSPATH=$HADOOP_HOME/commons-logging-1.0.4.jar:$HADOOP_HOME/commons-logging-api-1.0.4.jar:$CLASSPATH

# CLASSPATH for HBASE
export CLASSPATH=$HBASE_HOME/hbase-0.89.20100924.jar:$HBASE_HOME/lib/zookeeper-3.3.1.jar:$CLASSPATH
export CLASSPATH=$HBASE_HOME/lib/commons-logging-1.1.1.jar:$HBASE_HOME/lib/avro-1.3.2.jar:$CLASSPATH
export CLASSPATH=$HBASE_HOME/lib/log4j-1.2.15.jar:$HBASE_HOME/lib/commons-cli-1.2.jar:$CLASSPATH
export CLASSPATH=$HBASE_HOME/lib/jackson-core-asl-1.5.2.jar:$HBASE_HOME/lib/jackson-mapper-asl-1.5.2.jar:$CLASSPATH
export CLASSPATH=$HBASE_HOME/lib/commons-httpclient-3.1.jar:$HBASE_HOME/lib/jetty-6.1.24.jar:$CLASSPATH
export CLASSPATH=$HBASE_HOME/lib/jetty-util-6.1.24.jar:$HBASE_HOME/lib/hadoop-core-0.20.3-append-r964955-1240.jar:$CLASSPATH
export CLASSPATH=$HBASE_HOME/lib/hbase-0.89.20100924.jar:$HBASE_HOME/lib/hsqldb-1.8.0.10.jar:$CLASSPATH
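With PATH and CLASSPATH in place, you can compile and run your own HBase-integrated Hadoop code directly on the node; javac and java pick the jars up from the exported CLASSPATH. For example (MyHBaseJob is a hypothetical class standing in for your own code):
jongwook@ip-10-xx-xx-xx:~$ javac MyHBaseJob.java
jongwook@ip-10-xx-xx-xx:~$ java MyHBaseJob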
First, run the Hadoop pi demo on the remote node:
[jongwook@ip-10-xx-xx-xx ~]# cd /usr/local/hadoop-0.20.2/
[jongwook@ip-10-xx-xx-xx hadoop-0.20.2]# bin/hadoop jar hadoop-0.20.2-examples.jar pi 20 1000
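The two arguments are the number of map tasks (20) and the number of samples per map (1000); increasing either gives a better estimate of pi at the cost of a longer run. For example:
[jongwook@ip-10-xx-xx-xx hadoop-0.20.2]# bin/hadoop jar hadoop-0.20.2-examples.jar pi 50 10000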
Second, run the HBase demo:
jongwook@ip-10-xx-xx-xx:/usr/local$ cd hbase-0.89.20100924/
jongwook@ip-10-xx-xx-xx:/usr/local/hbase-0.89.20100924$ ls
bin  CHANGES.txt  conf  docs  hbase-0.89.20100924.jar  hbase-webapps  lib  LICENSE.txt  NOTICE.txt  README.txt
jongwook@ip-10-xx-xx-xx:/usr/local/hbase-0.89.20100924$ bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version: 0.89.20100924, r1001068, Tue Oct 5 12:12:44 PDT 2010

hbase(main):001:0> status 'simple'
5 live servers
    ip-10-xx-xx-xx.ec2.internal:60020 1308520337148
        requests=0, regions=1, usedHeap=158, maxHeap=1974
    domU-12-31-39-0F-B5-21.compute-1.internal:60020 1308520337138
        requests=0, regions=0, usedHeap=104, maxHeap=1974
    domU-12-31-39-0B-90-11.compute-1.internal:60020 1308520336780
        requests=0, regions=0, usedHeap=104, maxHeap=1974
    domU-12-31-39-0B-C1-91.compute-1.internal:60020 1308520336747
        requests=0, regions=1, usedHeap=158, maxHeap=1974
    ip-10-108-250-193.ec2.internal:60020 1308520336863
        requests=0, regions=0, usedHeap=102, maxHeap=1974
0 dead servers
Aggregate load: 0, regions: 2
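From the same shell you can exercise the cluster end to end by creating a small table, writing a row, and scanning it back. The table and column family names below are made up for illustration:
hbase(main):002:0> create 'testtable', 'cf'
hbase(main):003:0> put 'testtable', 'row1', 'cf:greeting', 'hello'
hbase(main):004:0> scan 'testtable'
hbase(main):005:0> disable 'testtable'
hbase(main):006:0> drop 'testtable'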
Configuration
Whirr is configured using a properties file, and optionally using command line arguments when using the CLI. Command line arguments take precedence over properties specified in a properties file.
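As a sketch, a properties file for the cluster launched above might look like the following; the property names here mirror the CLI options with a whirr. prefix, which is an assumption to verify against the Configuration Guide for your release (the credential values are placeholders):
whirr.service-name=hadoop
whirr.cluster-name=tomhadoopcluster
whirr.instance-templates=1 nn+jt 1 dn+tt
whirr.identity=YOUR_AWS_ACCESS_KEY_ID
whirr.credential=YOUR_AWS_SECRET_ACCESS_KEY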
See Configuration Guide for more on configuration.
Destroy a cluster
When you've finished using a cluster, you can terminate the instances and clean up resources with:
% whirr-ec2 destroy-cluster --service-name=hadoop --cluster-name=tomhadoopcluster
Whirr API
Whirr provides a Java API for stopping and starting clusters. Please see the unit test source code for how to achieve this.
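As a rough sketch only, launching and destroying a cluster looks something like the following. The class and method names below follow the pattern exercised in the unit tests and are assumptions that may differ between releases, so treat the tests as the authority:

import org.apache.whirr.service.Cluster;
import org.apache.whirr.service.ClusterSpec;
import org.apache.whirr.service.Service;
import org.apache.whirr.service.ServiceFactory;

public class LaunchAndDestroy {
  public static void main(String[] args) throws Exception {
    // Describe the cluster to launch; in practice the spec is populated
    // from the same properties the CLI uses (names/signatures assumed).
    ClusterSpec spec = new ClusterSpec();

    // Look up the service implementation by name and launch the cluster.
    Service service = new ServiceFactory().create("hadoop");
    Cluster cluster = service.launchCluster(spec);

    // ... use the cluster ...

    // Terminate the instances and clean up resources.
    service.destroyCluster(spec);
  }
}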
There's also some example code at http://github.com/hammer/whirr-demo.