Eurosys 2011 Tutorial
Tutorial slides
Code and library
To complete this tutorial you will need the ZooKeeper 3.3.3 "fat" jar. This has all you need to run zookeeper and develop against it.
Example skeleton code:
Running the ZooKeeper server
The easiest way to run ZooKeeper, and the way that is often used for development is to use it in "standalone" mode. This starts up a single ZooKeeper server, so it is not fault tolerant. For example:
Code Block |
---|
java -jar zookeeper-3.3.3-fatjar.jar server 2181 /tmp/zkdata |
starts a ZooKeeper server running on port 2181. It is storing its data in /tmp, so this is obviously not a way you want to run in production.
If you want to startup a cluster of server, you must do a bit more. To do this we need to create a configuration file for each server. Generally, we can share a configuration file across server, but since we are starting all three on the same machine, we need to have a configuration for each server:
Code Block | ||
---|---|---|
| ||
dataDir=/tmp/1 clientPort=2181 initLimit=3 syncLimit=3 server.1:127.0.0.1:2221:3331 server.2:127.0.0.1:2222:3332 server.3:127.0.0.1:2223:3333 |
Code Block | ||
---|---|---|
| ||
dataDir=/tmp/2 clientPort=2182 initLimit=3 syncLimit=3 server.1:127.0.0.1:2221:3331 server.2:127.0.0.1:2222:3332 server.3:127.0.0.1:2223:3333 |
Code Block | ||
---|---|---|
| ||
dataDir=/tmp/3 clientPort=2183 initLimit=3 syncLimit=3 server.1:127.0.0.1:2221:3331 server.2:127.0.0.1:2222:3332 server.3:127.0.0.1:2223:3333 |
A ZooKeeper server figures out which server it is by looking in the dataDir for a file called myid that contains its identity, so lets set those up:
Code Block |
---|
mkdir /tmp/1 echo 1 > /tmp/1/myid mkdir /tmp/2 echo 2 > /tmp/2/myid mkdir /tmp/3 echo 3 > /tmp/3/myid |
now lets startup the 3 servers:
Code Block |
---|
java -jar zookeeper-3.3.3-fatjar.jar server server1.cfg & java -jar zookeeper-3.3.3-fatjar.jar server server2.cfg & java -jar zookeeper-3.3.3-fatjar.jar server server3.cfg & |
you will see a bunch of messages which will eventually stop with something along the lines of Snapshotting: 100000000
.
Setting things up for the example application
We need to create two znodes on our server: /assign and /tasks. Conveniently, the client is also included in the "fat" jar. It even has a bit of a shell. We start up the client with:
Code Block |
---|
java -jar zookeeper-3.3.3-fatjar.jar client -server 127.0.0.1:2181 |
_if you started up a cluster of servers, you would use 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183 instead of just 127.0.0.1:2181.
Now lets create the znodes. We have to supply initial data for them even though it isn't used:
Code Block |
---|
[zk: 127.0.0.1:2181(CONNECTED) 0] create /assign "" Created /assign [zk: 127.0.0.1:2181(CONNECTED) 1] create /tasks "" Created /tasks [zk: 127.0.0.1:2181(CONNECTED) 2] ls / [assign, tasks, zookeeper] |
Extra credit (testing)
The code you have written works and is dynamic, but doesn't handle every error correctly. Check what happens when errors occur: change the TaskQueueWorker.java to use FailFirstZooKeeper.java and see what happens. (When submitting multiple tasks you will find that one gets assigned to a phantom worker.) How do you fix it?