Introduction

This page describes how to set up Kerberos, Hadoop, Zookeeper, and Giraph so that all components work together with Hadoop's security features enabled.

Disclaimer

This guide is intended for development only; it is not a best-practices guide to deploying secure Hadoop in production. An actual production deployment will require additional, site-specific changes to enhance security.

Kerberos

Installation

Code Block
sudo yum -y install krb5-server
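
On RHEL-style systems, the client tools used later in this guide (kinit, klist) ship in a separate package; install it if it is not already present:

Code Block
sudo yum -y install krb5-workstation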

Configuration

Code Block
title/etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = HADOOP.LOCALDOMAIN
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 1d
 renew_lifetime = 7d
 forwardable = yes
 proxiable = yes
 udp_preference_limit = 1
 extra_addresses = 127.0.0.1
 kdc_timesync = 1
 ccache_type = 4
 allow_weak_crypto = true

[realms]
 HADOOP.LOCALDOMAIN = {
  kdc =  localhost:88
  admin_server =  localhost:749
 }

[domain_realm]
 localhost = HADOOP.LOCALDOMAIN
 .compute-1.internal = HADOOP.LOCALDOMAIN
 .internal = HADOOP.LOCALDOMAIN
 internal = HADOOP.LOCALDOMAIN

[appdefaults]
 pam = {
  debug = false
  ticket_lifetime = 36000
  renew_lifetime = 36000
  forwardable = true
  krb4_convert = false
 }

[login]
 krb4_convert = true
 krb4_get_tickets = false

Initialize Kerberos KDC service

Code Block
$ sudo kdb5_util create -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'HADOOP.LOCALDOMAIN',
master key name 'K/M@HADOOP.LOCALDOMAIN'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key: 

Re-enter KDC database master key to verify: 

$

Startup

Code Block
sudo service krb5kdc restart
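
To confirm the KDC came up cleanly, check the service status and the KDC log (the path configured in the [logging] section above):

Code Block
sudo service krb5kdc status
sudo tail /var/log/krb5kdc.log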

Set up principals

Use the principals.sh script from https://github.com/ekoontz/kerb-setup. Run it as a normal user who has sudo privileges; it will call sudo as needed. Choose a password that you will use for your own (ordinary user) principal, and pass this password as the first argument of the script:

Code Block
./principals.sh mypassword

The script saves the service keytabs in a single file called services.keytab in the current working directory. We'll assume this file is in the directory $HOME/kerb-setup/ and will use the full path $HOME/kerb-setup/services.keytab in the Hadoop configuration files below.
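
For reference, the core of such a script amounts to creating service and user principals and exporting the service keys to a keytab. A minimal sketch using MIT Kerberos' kadmin.local follows; this is a hypothetical reconstruction, so consult the repository for the real script:

Code Block
# Hypothetical sketch of what principals.sh does; the actual script may differ.
HOST=$(hostname -f)
PASSWORD=$1
# Create service principals with random keys and export them to a keytab.
sudo kadmin.local -q "addprinc -randkey hdfs/$HOST@HADOOP.LOCALDOMAIN"
sudo kadmin.local -q "addprinc -randkey mapred/$HOST@HADOOP.LOCALDOMAIN"
sudo kadmin.local -q "ktadd -k services.keytab hdfs/$HOST@HADOOP.LOCALDOMAIN mapred/$HOST@HADOOP.LOCALDOMAIN"
# ktadd runs as root, so hand the keytab back to the ordinary user.
sudo chown $(whoami) services.keytab
# Create a principal for the ordinary user, protected by the supplied password.
sudo kadmin.local -q "addprinc -pw $PASSWORD $(whoami)@HADOOP.LOCALDOMAIN"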

Hadoop

Build

Code Block
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout origin/branch-1.0.2

Remove dependency on java5

Open build.xml in an editor and remove the package target's dependency on docs and cn-docs, so that it looks like:

Code Block
  <target name="package" depends="compile, jar, javadoc, api-report, examples, tools-jar, jar-test, ant-tasks, package-librecordio"
          description="assembles multi-platform artifacts for distribution">

Run build

Code Block
ant -Dcompile.native=true clean jsvc package

This produces a working Hadoop runtime in the directory $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT, but we still need to configure it to enable the security-related features.
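
Before configuring it, you can sanity-check that the freshly built runtime works:

Code Block
cd $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT
bin/hadoop version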

Configuration

In the configuration files below, replace $HOST with the output of `hostname -f` and $HOME with the output of `echo $HOME` (a scripted substitution is sketched after the mapred-site.xml listing).

Code Block
title$HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT/conf/core-site.xml
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>giraph.zkList</name>
    <value>localhost:2181</value>
  </property>
</configuration>
Code Block
title$HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$HOST:8020/</value>
  </property>
  <property>
    <name>dfs.namenode.keytab.file</name>
    <value>$HOME/kerb-setup/services.keytab</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@HADOOP.LOCALDOMAIN</value>
  </property>
  <property>
    <name>dfs.https.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.user.name</name>
    <value>hdfs</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>$HOST:8070</value>
  </property>
  <!-- NOTE: this is still needed even though https is not enabled. -->
  <property>
    <name>dfs.https.port</name>
    <value>8090</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:1004</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:1006</value>
  </property>
  <property>
    <name>dfs.datanode.keytab.file</name>
    <value>$HOME/kerb-setup/services.keytab</value>
  </property>
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>hdfs/_HOST@HADOOP.LOCALDOMAIN</value>
  </property>
</configuration>
Code Block
title$HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT/conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>$HOST:8030</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:8040</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:8050</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.keytab.file</name>
    <value>$HOME/kerb-setup/services.keytab</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.kerberos.principal</name>
    <value>mapred/_HOST@HADOOP.LOCALDOMAIN</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.keytab.file</name>
    <value>$HOME/kerb-setup/services.keytab</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.kerberos.principal</name>
    <value>mapred/_HOST@HADOOP.LOCALDOMAIN</value>
  </property>
</configuration>
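
If you saved the files above with the literal $HOST and $HOME placeholders, a substitution along these lines can fill them in (a sketch, assuming GNU sed; only hdfs-site.xml and mapred-site.xml contain the tokens):

Code Block
cd $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT/conf
# Replace the literal tokens $HOST and $HOME in place.
sed -i "s|\$HOST|$(hostname -f)|g; s|\$HOME|$HOME|g" hdfs-site.xml mapred-site.xml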

Add the following to your $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT/conf/hadoop-env.sh:

(place it immediately below the # Extra Java CLASSPATH elements. Optional. line).

Note that the jars in the following HADOOP_CLASSPATH will only be present after they are fetched by Maven when you build Giraph (below). Therefore, you should wait to start your Hadoop daemons until you've built Giraph.

Code Block
title"hadoop-env.sh"
export HADOOP_CLASSPATH=$HOME/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:$HOME/.m2/repository/commons-io/commons-io/1.3.2/commons-io-1.3.2.jar:$HOME/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:$HOME/.m2/repository/org/json/json/20090211/json-20090211.jar:$HOME/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar
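
Once Giraph has been built, an optional check like the following (a small hypothetical helper) will report any HADOOP_CLASSPATH entries that are still missing:

Code Block
# Hypothetical check: report any HADOOP_CLASSPATH entries that do not exist.
for jar in $(echo "$HADOOP_CLASSPATH" | tr ':' ' '); do
  [ -f "$jar" ] || echo "missing: $jar"
done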

Giraph

Build

Code Block
git clone git://git.apache.org/giraph.git
cd giraph
mvn -DskipTests -Phadoop_1.0 clean package
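
The build produces a self-contained ("munged") jar under target/munged/; the job-submission command at the end of this guide expects it, so confirm it is there:

Code Block
ls target/munged/giraph-0.2-SNAPSHOT.jar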

Configuration

Note the giraph.zkList property in core-site.xml above; it points Giraph at the Zookeeper server set up below.

Hadoop Daemon Startup

Code Block
title"hadoop-startup.sh"
cd $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT
# Start from a clean slate: remove any old HDFS data and reformat the namenode.
rm -rf /tmp/hadoop-`whoami`
bin/hadoop namenode -format
bin/hadoop namenode &
sleep 2
# A secure datanode binds the privileged ports configured above (1004/1006),
# so it must be started via sudo; HADOOP_SECURE_DN_USER is the user it runs
# as after dropping privileges.
export HADOOP_SECURE_DN_USER=`whoami`
sudo -E bin/hadoop datanode &
bin/hadoop jobtracker &
sleep 2
bin/hadoop tasktracker &
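
Once the daemons are up and you have a Kerberos ticket (see Initialize your principal below), a quick check that secure HDFS is answering:

Code Block
bin/hadoop fs -ls /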

Zookeeper

Build

Code Block
git clone git://git.apache.org/zookeeper.git 
cd zookeeper
ant clean jar

Configuration

Create a conf/zoo.cfg file in your zookeeper directory:

Code Block
dataDir=/tmp/zkdata
clientPort=2181

Startup

Code Block
bin/zkServer.sh start-foreground
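
start-foreground keeps the server attached to your terminal. From another terminal, you can verify it is healthy with Zookeeper's ruok four-letter-word command; a live server replies imok:

Code Block
echo ruok | nc localhost 2181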

Initialize your principal

Code Block
kinit

You'll be asked for a password; use the same password that you chose when you ran principals.sh in the Set up principals section.
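
klist shows the ticket-granting ticket now in your credential cache:

Code Block
klist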

Run your job!

Code Block
cd $HOME/hadoop-common/build/hadoop-1.0.3-SNAPSHOT
bin/hadoop jar ~/giraph/target/munged/giraph-0.2-SNAPSHOT.jar \
  org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 2