Tech Note 4

ZooKeeper makes a very bad Queue source.

The ZooKeeper recipes page lists Queues as a possible use-case for ZooKeeper. Curator includes several Queue recipes. In our experience, however, it is a bad idea to use ZooKeeper as a Queue:

  • ZooKeeper has a 1MB transport limitation. In practice this means that ZNodes must be relatively small. Queues, however, typically contain many thousands of messages.
  • ZooKeeper can slow down considerably on startup if there are many large ZNodes. This will be common if you are using ZooKeeper for queues. You will need to significantly increase initLimit and syncLimit.
  • If a ZNode gets too big it can be extremely difficult to clean. getChildren() will fail on the node. At Netflix we had to create a special-purpose program that had a huge value for jute.maxbuffer in order to get the nodes and delete them.
  • ZooKeeper can start to perform badly if there are many nodes with thousands of children.
  • The ZooKeeper database is kept entirely in memory. So, you can never have more messages than can fit in memory.
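To give a rough sense of scale for the 1MB limit, here is a back-of-the-envelope sketch in Java. It is not a measurement and uses no ZooKeeper API. It assumes that the list of child names returned by getChildren() must fit within the 1MB limit mentioned above, that each name costs a 4-byte length prefix plus its UTF-8 bytes, and that the reply carries about 20 bytes of fixed overhead. The class name, the constants, and the sample item name "queue-0000000001" are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

/**
 * Rough estimate of how many children a queue node can hold before the
 * list of child names alone outgrows an assumed 1MB transport limit.
 * Every constant here is an assumption made for illustration.
 */
class GetChildrenSizeEstimate {
    // The 1MB limit quoted in this note (assumed to apply to the whole reply).
    static final int ASSUMED_LIMIT_BYTES = 1024 * 1024;

    // Assumed fixed overhead per reply: header fields plus the child count.
    static final int ASSUMED_REPLY_OVERHEAD = 20;

    // Assumed wire cost of one child name: 4-byte length prefix + UTF-8 bytes.
    static int bytesPerChild(String childName) {
        return 4 + childName.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of identically sized child names that fit under the limit.
    static int maxChildren(String sampleChildName, int limitBytes) {
        return (limitBytes - ASSUMED_REPLY_OVERHEAD) / bytesPerChild(sampleChildName);
    }

    public static void main(String[] args) {
        // Hypothetical sequential item name: a prefix plus a 10-digit sequence.
        String sample = "queue-0000000001";
        System.out.println("bytes per child: " + bytesPerChild(sample));
        System.out.println("children that fit under the limit: "
                + maxChildren(sample, ASSUMED_LIMIT_BYTES));
    }
}
```

Under these assumptions, a queue with 16-character item names reaches the limit at roughly 52,000 outstanding messages, regardless of how small each message payload is. Longer item names lower that number further.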

3 Comments

  1. This problem is given some loose coverage, and there is a QueueSharder to break the queue into smaller pieces: http://curator.apache.org/utilities.html

  2. Anonymous

    It would be great to quantify these remarks, so there is some indication of the magnitude of the issue. To pick one example: when you say "slow down considerably on startup if there are many large ZNodes", how large a slowdown is this, what quantity counts as "many", what does "large" mean, and on what configuration?

    1. I don't have the data. This is anecdotal. At Netflix, when we used ZK for Queues, it would take 10-15 minutes to restart ZK instances. In particular, this happened when devs abused the queues. But, once in that situation, it was very hard to recover without deleting the ZK database. As this was a global, multi-tenant resource, that was very disruptive.