Introduction
This page describes how to set up Kerberos, Hadoop, ZooKeeper, and Giraph so that all components work together with Hadoop's security features enabled.
Disclaimer
This is intended for development only: it is not a best-practices guide to secure Hadoop deployment in a production setting. An actual production deployment will require additional, site-specific changes to enhance security.
Kerberos
Installation
sudo yum -y install krb5-server
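The client tools used later in this guide (kinit, klist) may be packaged separately from the server; on Red Hat-style systems (an assumption about your distribution) they can be installed with:
sudo yum -y install krb5-workstation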
Configuration
Edit /etc/krb5.conf so that it looks like the following:
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = HADOOP.LOCALDOMAIN
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 1d
 renew_lifetime = 7d
 forwardable = yes
 proxiable = yes
 udp_preference_limit = 1
 extra_addresses = 127.0.0.1
 kdc_timesync = 1
 ccache_type = 4
 allow_weak_crypto = true

[realms]
 HADOOP.LOCALDOMAIN = {
  kdc = localhost:88
  admin_server = localhost:749
 }

[domain_realm]
 localhost = HADOOP.LOCALDOMAIN
 .compute-1.internal = HADOOP.LOCALDOMAIN
 .internal = HADOOP.LOCALDOMAIN
 internal = HADOOP.LOCALDOMAIN

[appdefaults]
 pam = {
  debug = false
  ticket_lifetime = 36000
  renew_lifetime = 36000
  forwardable = true
  krb4_convert = false
 }

[login]
 krb4_convert = true
 krb4_get_tickets = false
Initialize Kerberos KDC service
$ sudo kdb5_util create -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'HADOOP.LOCALDOMAIN',
master key name 'K/M@HADOOP.LOCALDOMAIN'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:
$
Startup
sudo service krb5kdc restart
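As a quick sanity check (assuming netstat is available on your system), confirm that the KDC came up and is listening on the standard Kerberos port:
sudo service krb5kdc status
sudo netstat -plnut | grep :88   # the KDC should be listening on port 88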
Set up principals
Use the principals.sh script from https://github.com/ekoontz/kerb-setup. Run it as a normal user with sudo privileges: it will call sudo as needed. Choose a password for your own (ordinary user) principal and pass it as the script's first argument:
./principals.sh mypassword
This script saves the service keytabs to a single file called services.keytab in the current working directory. We'll assume this file lives in $HOME/kerb-setup/ and will use the full path $HOME/kerb-setup/services.keytab in the Hadoop configuration files below.
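Before wiring this keytab into the Hadoop configuration, it is worth verifying its contents; klist can read a keytab directly:
klist -kt $HOME/kerb-setup/services.keytab
You should see entries for the service principals referenced below, such as hdfs/<your-host>@HADOOP.LOCALDOMAIN and mapred/<your-host>@HADOOP.LOCALDOMAIN.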
Hadoop
Build
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout origin/branch-1.0.2
Remove dependency on java5
Open build.xml in an editor and remove the package target's dependencies on docs and cn-docs (which require Java 5), so that it looks like:
<target name="package" depends="compile, jar, javadoc, api-report, examples, tools-jar, jar-test, ant-tasks, package-librecordio"
description="assembles multi-platform artifacts for distribution">
Run build
ant -Dcompile.native=true clean jsvc package
This produces a working Hadoop runtime in the directory $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT, but we still need to configure it to enable the security-related features.
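To confirm the build produced a usable runtime before moving on (a quick check, not part of the original build steps):
build/hadoop-1.0.3-SNAPSHOT/bin/hadoop version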
Configuration
In the files below, replace $HOST with the output of `hostname -f` and $HOME with the output of `echo $HOME`. First, edit conf/core-site.xml:
<configuration>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>giraph.zkList</name>
<value>localhost:2181</value>
</property>
</configuration>
Next, edit conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://$HOST:8020/</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>$HOME/kerb-setup/services.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.LOCALDOMAIN</value>
</property>
<property>
<name>dfs.https.enable</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.user.name</name>
<value>hdfs</value>
</property>
<property>
<name>dfs.http.address</name>
<value>$HOST:8070</value>
</property>
<!-- NOTE: this is still needed even though https is not enabled. -->
<property>
<name>dfs.https.port</name>
<value>8090</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>$HOME/kerb-setup/services.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/_HOST@HADOOP.LOCALDOMAIN</value>
</property>
</configuration>
Finally, edit conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>10</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>$HOST:8030</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:8040</value>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:8050</value>
</property>
<property>
<name>mapreduce.jobtracker.keytab.file</name>
<value>$HOME/kerb-setup/services.keytab</value>
</property>
<property>
<name>mapreduce.jobtracker.kerberos.principal</name>
<value>mapred/_HOST@HADOOP.LOCALDOMAIN</value>
</property>
<property>
<name>mapreduce.tasktracker.keytab.file</name>
<value>$HOME/kerb-setup/services.keytab</value>
</property>
<property>
<name>mapreduce.tasktracker.kerberos.principal</name>
<value>mapred/_HOST@HADOOP.LOCALDOMAIN</value>
</property>
</configuration>
Add the following to $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT/conf/hadoop-env.sh, immediately below the line # Extra Java CLASSPATH elements. Optional.:
Note that the jars in the following HADOOP_CLASSPATH will only be present after Maven fetches them when you build Giraph (below). Therefore, wait to start your Hadoop daemons until you have built Giraph.
export HADOOP_CLASSPATH=$HOME/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:\
$HOME/.m2/repository/commons-io/commons-io/1.3.2/commons-io-1.3.2.jar:\
$HOME/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:\
$HOME/.m2/repository/org/json/json/20090211/json-20090211.jar:\
$HOME/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar
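Because these jars only exist after the Giraph build, a small loop like the following (a convenience sketch, not part of the original instructions) can confirm nothing is missing before you start the daemons. Run it in a shell where HADOOP_CLASSPATH has been exported as above:
for jar in $(echo $HADOOP_CLASSPATH | tr ':' ' '); do
  [ -f "$jar" ] || echo "missing: $jar"
done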
Giraph
Build
git clone git://git.apache.org/giraph.git
cd giraph
mvn -DskipTests -Phadoop_1.0 clean package
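The build should leave in place the self-contained ("munged") jar that the final step runs; confirm it exists:
ls ~/giraph/target/munged/giraph-0.2-SNAPSHOT.jar
This Maven run is also what populates $HOME/.m2/repository with the jars listed in HADOOP_CLASSPATH above, so the Hadoop daemons can be started once it completes.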
Configuration
Note the giraph.zkList property in core-site.xml above: it points Giraph at the externally managed ZooKeeper instance (started below) rather than having Giraph manage its own.
Hadoop Daemon Startup
cd $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT
rm -rf /tmp/hadoop-`whoami`    # start from a clean slate
bin/hadoop namenode -format
bin/hadoop namenode &
sleep 2
# The secure datanode binds privileged ports (1004/1006 above), so it must
# start as root (via jsvc) and then drop to $HADOOP_SECURE_DN_USER.
export HADOOP_SECURE_DN_USER=`whoami`
sudo -E bin/hadoop datanode &
bin/hadoop jobtracker &
sleep 2
bin/hadoop tasktracker &
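To verify the daemons are running (a quick check; exact process listings may vary), list the Java processes:
jps                         # should show NameNode, JobTracker and TaskTracker
sudo ps aux | grep datanode # the secure datanode runs under jsvc as root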
ZooKeeper
Build
git clone git://git.apache.org/zookeeper.git
cd zookeeper
ant clean jar
Configuration
Create a conf/zoo.cfg file in your zookeeper directory:
dataDir=/tmp/zkdata
clientPort=2181
Startup
bin/zkServer.sh start-foreground
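From another terminal, you can confirm ZooKeeper is answering on the port that giraph.zkList points at, using its built-in "ruok" command (assuming nc is installed):
echo ruok | nc localhost 2181   # prints "imok" if the server is healthy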
Initialize your principal
kinit
You'll be asked for a password; use the same one you chose when you ran principals.sh in the Set up principals section above.
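Verify that you now hold a ticket-granting ticket:
klist
You should see a krbtgt/HADOOP.LOCALDOMAIN@HADOOP.LOCALDOMAIN entry valid for the ticket_lifetime configured above (1 day).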
Run your job!
cd $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT
bin/hadoop jar ~/giraph/target/munged/giraph-0.2-SNAPSHOT.jar \
  org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 2
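While the benchmark runs, you can watch its progress with the standard Hadoop 1.x job tool, or via the JobTracker web UI on port 8040 configured above:
bin/hadoop job -list   # shows the running PageRankBenchmark job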