Current state: "Under Discussion"

Discussion thread: N/A


Released: 4.6.0

This BP focuses only on removing direct metadata storage access from the BookKeeper client, to make it a thin client. It does not introduce a distributed metadata store or eliminate ZooKeeper.

A subsequent BP will discuss eliminating ZooKeeper.


BookKeeper uses ZooKeeper for service discovery (discovering the available bookies in the cluster) and metadata management (storing all the metadata for ledgers). However, it exposes the metadata storage directly to the clients, which makes the BookKeeper client a very thick client. This causes several problems:

  • The client talks to the metadata storage directly, so it has to handle metadata upgrades, which makes the client very complicated.
  • The client has to deal with ZooKeeper's connection state, e.g. session expiry, which should ideally be managed on the server side.
  • The number of clients can be much larger than the number of bookies in the cluster. If every client talks to ZooKeeper directly, expired client connections can cause a reconnection storm that puts ZooKeeper into a bad state.


This BP explores the possibility of eliminating ZooKeeper completely from the client side, to produce a thin BookKeeper client.

Public Interfaces

No public interface changes.

Proposed Changes

ZooKeeper is used for two purposes: bookie service discovery and metadata storage. The metadata storage is well abstracted behind the LedgerManager interface, while bookie service discovery is not. This BP therefore covers the following two parts:

Metadata Management

  • Add an RPC service on Bookie
    • The RPC service wraps a configured LedgerManagerFactory. When it receives a metadata request, it delegates the request to the actual LedgerManagerFactory.
    • The RPC service hides the details of the metadata storage implementation.
  • The RPC service will be implemented in gRPC.
    • Using gRPC
      • It is based on protobuf and Netty, so no additional dependencies are introduced.
      • gRPC is good for RPC calls, which makes it well suited for metadata operations.
      • gRPC hides the details of connection management, which makes the implementation much simpler.
  • On the client side, implement a LedgerManagerFactory that uses the RPC service.
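
The delegation described above can be sketched in plain Java. This is only an illustration under stated assumptions: `MetadataStore`, `InMemoryMetadataStore`, `BookieMetadataService`, and `RpcMetadataStore` are hypothetical names, not BookKeeper classes, and a direct method call stands in for the gRPC stub.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the metadata abstraction (not the real LedgerManager API).
interface MetadataStore {
    Optional<String> read(long ledgerId);
    void write(long ledgerId, String metadata);
}

// Hypothetical backend; on a real bookie this would be the configured metadata store.
final class InMemoryMetadataStore implements MetadataStore {
    private final Map<Long, String> data = new ConcurrentHashMap<>();

    @Override
    public Optional<String> read(long ledgerId) {
        return Optional.ofNullable(data.get(ledgerId));
    }

    @Override
    public void write(long ledgerId, String metadata) {
        data.put(ledgerId, metadata);
    }
}

// Bookie side: the RPC handler wraps the configured store and delegates every request to it.
final class BookieMetadataService {
    private final MetadataStore delegate;

    BookieMetadataService(MetadataStore delegate) {
        this.delegate = delegate;
    }

    Optional<String> handleRead(long ledgerId) {
        return delegate.read(ledgerId);
    }

    void handleWrite(long ledgerId, String metadata) {
        delegate.write(ledgerId, metadata);
    }
}

// Client side: implements the same interface by calling the bookie, so callers
// never see which metadata backend the bookie uses.
final class RpcMetadataStore implements MetadataStore {
    private final BookieMetadataService remote; // stands in for a generated gRPC stub

    RpcMetadataStore(BookieMetadataService remote) {
        this.remote = remote;
    }

    @Override
    public Optional<String> read(long ledgerId) {
        return remote.handleRead(ledgerId);
    }

    @Override
    public void write(long ledgerId, String metadata) {
        remote.handleWrite(ledgerId, metadata);
    }
}

final class MetadataDelegationDemo {
    public static void main(String[] args) {
        BookieMetadataService bookie = new BookieMetadataService(new InMemoryMetadataStore());
        MetadataStore client = new RpcMetadataStore(bookie);
        client.write(7L, "ensemble=3,quorum=2");
        System.out.println(client.read(7L).orElse("<missing>"));
    }
}
```

Because the client-side class implements the same interface as the backend, switching between direct access and RPC access would be a matter of which implementation is configured.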


The proposed metadata rpc is:


enum StatusCode {
    SUCCESS                     = 0;
    // 4xx: client errors
    BAD_REQUEST                 = 400;
    // 5xx: server errors
    INTERNAL_SERVER_ERROR       = 500;
    NOT_IMPLEMENTED             = 501;
    // 6xx: unexpected
    UNEXPECTED                  = 600;
    // 7xx: ledger metadata
    LEDGER_METADATA_ERROR       = 700;
    LEDGER_EXISTS               = 701;
    LEDGER_NOT_FOUND            = 702;
    // 9xx: revisions, versions
    BAD_VERSION                 = 900;
}

message LedgerMetadataRequest {
    // if ledger id is omitted, a new ledger id will be generated and returned.
    int64 ledger_id                     = 1;
    // ledger metadata
    LedgerMetadataFormat metadata       = 2;
    // expected version
    int64 expected_version              = 3;
}

message LedgerMetadataResponse {
    // status code
    common.StatusCode code              = 1;
    // ledger id
    int64 ledger_id                     = 2;
    // the version of the metadata (we can add another field if we want to support non-integer version)
    int64 version                       = 3;
    // ledger metadata
    LedgerMetadataFormat metadata       = 4;
}

message GetLedgerRangesRequest {
    // limit is a limit on the number of ledgers returned per response
    int32 limit_per_response = 1;
}

message GetLedgerRangesResponse {
    // status code
    common.StatusCode code              = 1;
    // count is set to the number of keys returned
    int32 count                         = 2;
    bytes serialized_ranges             = 3;
}

service LedgerMetadataService {
    rpc Create(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Remove(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Read(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Write(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Watch(stream LedgerMetadataRequest) returns (stream LedgerMetadataResponse);
    rpc Iterate(GetLedgerRangesRequest) returns (stream GetLedgerRangesResponse);
}
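
The proposal does not spell out how `expected_version` and the status codes interact. The following Java sketch shows one plausible reading, a compare-and-set on the version. Everything in it is an assumption made for illustration: the class names are hypothetical, metadata is a plain string, versions start at 0 and increase by 1 on each successful write, and the in-memory map stands in for the real metadata store.

```java
import java.util.HashMap;
import java.util.Map;

// Subset of the status codes from the proposed proto, with the same numbers.
enum StatusCode {
    SUCCESS(0),
    LEDGER_EXISTS(701),
    LEDGER_NOT_FOUND(702),
    BAD_VERSION(900);

    final int number;

    StatusCode(int number) {
        this.number = number;
    }
}

// Hypothetical mirror of LedgerMetadataResponse.
final class MetadataResponse {
    final StatusCode code;
    final long ledgerId;
    final long version;    // -1 when the call failed
    final String metadata; // null when there is nothing to return

    MetadataResponse(StatusCode code, long ledgerId, long version, String metadata) {
        this.code = code;
        this.ledgerId = ledgerId;
        this.version = version;
        this.metadata = metadata;
    }

    static MetadataResponse error(StatusCode code, long ledgerId) {
        return new MetadataResponse(code, ledgerId, -1L, null);
    }
}

// Hypothetical bookie-side handler treating expected_version as a compare-and-set guard.
final class VersionedMetadataHandler {
    private final Map<Long, MetadataResponse> ledgers = new HashMap<>();

    synchronized MetadataResponse create(long ledgerId, String metadata) {
        if (ledgers.containsKey(ledgerId)) {
            return MetadataResponse.error(StatusCode.LEDGER_EXISTS, ledgerId);
        }
        MetadataResponse stored = new MetadataResponse(StatusCode.SUCCESS, ledgerId, 0L, metadata);
        ledgers.put(ledgerId, stored);
        return stored;
    }

    synchronized MetadataResponse read(long ledgerId) {
        MetadataResponse stored = ledgers.get(ledgerId);
        return stored != null ? stored : MetadataResponse.error(StatusCode.LEDGER_NOT_FOUND, ledgerId);
    }

    synchronized MetadataResponse write(long ledgerId, String metadata, long expectedVersion) {
        MetadataResponse stored = ledgers.get(ledgerId);
        if (stored == null) {
            return MetadataResponse.error(StatusCode.LEDGER_NOT_FOUND, ledgerId);
        }
        if (stored.version != expectedVersion) {
            // Another writer updated the metadata first; the caller must re-read and retry.
            return MetadataResponse.error(StatusCode.BAD_VERSION, ledgerId);
        }
        MetadataResponse updated =
                new MetadataResponse(StatusCode.SUCCESS, ledgerId, stored.version + 1, metadata);
        ledgers.put(ledgerId, updated);
        return updated;
    }

    synchronized MetadataResponse remove(long ledgerId, long expectedVersion) {
        MetadataResponse stored = ledgers.get(ledgerId);
        if (stored == null) {
            return MetadataResponse.error(StatusCode.LEDGER_NOT_FOUND, ledgerId);
        }
        if (stored.version != expectedVersion) {
            return MetadataResponse.error(StatusCode.BAD_VERSION, ledgerId);
        }
        ledgers.remove(ledgerId);
        return new MetadataResponse(StatusCode.SUCCESS, ledgerId, stored.version, null);
    }
}

final class VersionedMetadataDemo {
    public static void main(String[] args) {
        VersionedMetadataHandler handler = new VersionedMetadataHandler();
        System.out.println(handler.create(1L, "v0").code);
        System.out.println(handler.write(1L, "v1", 5L).code); // stale version
        System.out.println(handler.write(1L, "v1", 0L).code);
    }
}
```

Under this reading, a client that receives BAD_VERSION would re-read the ledger metadata, reapply its change, and retry with the version it just read.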

Service Discovery

  • Provide a ServiceDiscovery interface for discovering bookies.
  • ZooKeeper-based service discovery will be one implementation of this interface.
  • As with metadata management, RPC calls for service discovery will be added to bookies.
  • The RPC calls delegate all service-discovery requests to the configured ServiceDiscovery implementation.
  • On the client side, implement an RPC-based ServiceDiscovery.
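
A possible shape for this interface, sketched in Java. All names here are hypothetical. The proposal also does not say which bookie a client should contact for discovery, so the seed list with fall-through on failure is purely an assumption of this sketch. `StaticServiceDiscovery` stands in for the ZooKeeper-based implementation, and a direct call stands in for the RPC.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical discovery abstraction used by the client.
interface ServiceDiscovery {
    Set<String> getAvailableBookies();
}

// Stand-in for the ZooKeeper-based implementation: returns a fixed set of bookie addresses.
final class StaticServiceDiscovery implements ServiceDiscovery {
    private final Set<String> bookies;

    StaticServiceDiscovery(String... bookies) {
        this.bookies = new TreeSet<>(Arrays.asList(bookies));
    }

    @Override
    public Set<String> getAvailableBookies() {
        return new TreeSet<>(bookies);
    }
}

// Hypothetical view of the discovery RPC exposed by one bookie.
interface BookieDiscoveryEndpoint {
    Set<String> listBookies(); // throws UncheckedIOException if the bookie is unreachable
}

// Bookie side: answers discovery RPCs by delegating to its configured implementation.
final class BookieDiscoveryService implements BookieDiscoveryEndpoint {
    private final ServiceDiscovery delegate;

    BookieDiscoveryService(ServiceDiscovery delegate) {
        this.delegate = delegate;
    }

    @Override
    public Set<String> listBookies() {
        return delegate.getAvailableBookies();
    }
}

// Client side: RPC-based discovery that tries each seed bookie in order (an assumption).
final class RpcServiceDiscovery implements ServiceDiscovery {
    private final List<BookieDiscoveryEndpoint> seedBookies;

    RpcServiceDiscovery(List<BookieDiscoveryEndpoint> seedBookies) {
        this.seedBookies = seedBookies;
    }

    @Override
    public Set<String> getAvailableBookies() {
        RuntimeException lastFailure = new IllegalStateException("no seed bookies configured");
        for (BookieDiscoveryEndpoint seed : seedBookies) {
            try {
                return seed.listBookies();
            } catch (UncheckedIOException e) {
                lastFailure = e; // remember the failure and try the next seed
            }
        }
        throw lastFailure;
    }
}

final class ServiceDiscoveryDemo {
    public static void main(String[] args) {
        BookieDiscoveryEndpoint unreachable = () -> {
            throw new UncheckedIOException(new IOException("bookie unreachable"));
        };
        BookieDiscoveryEndpoint healthy =
                new BookieDiscoveryService(new StaticServiceDiscovery("bookie-1:3181", "bookie-2:3181"));
        ServiceDiscovery client = new RpcServiceDiscovery(Arrays.asList(unreachable, healthy));
        System.out.println(client.getAvailableBookies());
    }
}
```

In this sketch the client still needs at least one reachable seed address to bootstrap. How those seeds would be obtained is outside what the proposal specifies.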


It is worth pointing out that this proposal WILL NOT change any existing workflow. It will be a modular RPC implementation added to the bookie service. Applications that prefer the ZooKeeper-based solution can keep using it without changing anything.

Compatibility, Deprecation, and Migration Plan

This change does not introduce any incompatibilities, so no special migration instructions are needed. To adopt it:

  • Upgrade bookies to enable the metadata RPC services.
  • Upgrade clients to use the LedgerManagerFactory implementation based on the metadata RPC service.

Rollback is also straightforward.

  • Configure clients to use the original LedgerManagerFactory.

Test Plan

The test plan will cover:

  • normal unit tests
  • end-to-end integration tests
  • backward (upgrade) tests
  • performance/load tests

Rejected Alternatives




  1. Jia Zhai glad to see you working on this topic. I had already started thinking about it.

    Are you going to start an email thread for discussion ?

    I have some ideas to share, maybe the mailing list is the best place

  2. While I appreciate the fact that we are thinking of ZK and issues associated with it, I guess I don't understand the full scope of this proposal.

    As the description says - ZK is used for service discovery and also as metadata store.

    I believe this proposal still needs clients to depend on ZK for the service discovery.

    So the number of client connections to ZK is not going to change much; they all still need to have connections to ZK.

    Clients create, update and query/get the metadata. Routing it through bookies brings a lot more complexity to the protocol. Which bookie needs to be contacted? What if that bookie is not reachable? How does the client/writer go through an ensemble change? Going through bookies introduces another level of indirection and would open up many race conditions we need to deal with. One more hop introduces latency issues. Also, this proposal is not going to make the client thin, because the client (writer) is still the one making decisions on updating the metadata, isn't it? Moving from a thick client to a thin client and letting the "server" deal with metadata operations needs a total reboot of the stack.

    Instead, what I would like to focus on is having a separate metadata server instead of ZK. ZK doesn't scale for bigger clusters, and as we make metadata richer, it increases the footprint and makes the entire cluster slow. We already have InMemoryMetaStore, and I want to expand on that idea, or completely move away from ZK and find something else which can serve both functions well.

  3. Thanks for your comments JV. We are coming up with a BP covering the overall picture.