Status

Current state"Under Discussion"

Discussion thread: N/A

JIRA: https://github.com/apache/bookkeeper/issues/491

Released: 4.6.0

This BP only focuses on removing direct metadata storage access from the bookkeeper client, to make the bookkeeper client a thin client. This BP does not introduce any distributed metadata store or eliminate zookeeper.

There will be a subsequent BP discussing eliminating zookeeper.

Motivation

BookKeeper uses zookeeper for service discovery (discovering the available bookies in the cluster) and metadata management (storing all the metadata for ledgers). However, it exposes the metadata storage directly to clients, making the bookkeeper client a very thick client. This causes several problems:

  • The client talks to the metadata storage directly, so it has to handle metadata upgrades itself, which makes the client very complicated.
  • The client has to deal with zookeeper's connection state, e.g. session expiry, which should ideally be managed at the server side.
  • The number of clients can be much larger than the number of bookies in the cluster. If every client talks to zookeeper directly, expired client connections can cause a reconnection storm that puts zookeeper into a bad state.

 

This BP explores the possibility of eliminating zookeeper completely from the client side, to produce a thin bookkeeper client.

Public Interfaces

No public interface changes.

Proposed Changes

ZooKeeper is used for two purposes: bookie service discovery and metadata storage. The metadata storage is well abstracted behind the LedgerManager interface, while bookie service discovery isn't. So this BP covers the following two parts:

Metadata Management

  • Add an RPC service on Bookie
    • The RPC service wraps a configured LedgerManagerFactory. When it receives a metadata request, it delegates the request to the actual LedgerManagerFactory.
    • The RPC service hides the details of the metadata storage implementation.
  • The RPC service will be implemented in gRPC.
    • Using gRPC:
      • It is protobuf and netty based, so no additional dependencies are introduced.
      • gRPC is designed for rpc calls, which suits metadata operations well.
      • gRPC hides the details of connection management, which makes the implementation much simpler.
  • On the client side, implement a LedgerManagerFactory that uses the RPC service (a client-side sketch follows the proto definition below).

 

The proposed metadata RPC service is:

 

enum StatusCode {
    SUCCESS                     = 0;
    // 4xx: client errors
    BAD_REQUEST                 = 400;
    // 5xx: server errors
    INTERNAL_SERVER_ERROR       = 500;
    NOT_IMPLEMENTED             = 501;
    // 6xx: unexpected
    UNEXPECTED                  = 600;
    // 7xx: ledger metadata
    LEDGER_METADATA_ERROR       = 700;
    LEDGER_EXISTS               = 701;
    LEDGER_NOT_FOUND            = 702;
    // 9xx: revisions, versions
    BAD_VERSION                 = 900;
}
 
message LedgerMetadataRequest {
    // if ledger id is omitted, a new ledger id will be generated and returned.
    int64 ledger_id                     = 1;
    // ledger metadata
    LedgerMetadataFormat metadata       = 2;
    // expected version
    int64 expected_version              = 3;
}
message LedgerMetadataResponse {
    // status code
    common.StatusCode code              = 1;
    // ledger id
    int64 ledger_id                     = 2;
    // the version of the metadata (we can add another field if we want to support non-integer version)
    int64 version                       = 3;
    // ledger metadata
    LedgerMetadataFormat metadata       = 4;
}
message GetLedgerRangesRequest {
    // limit is a limit on the number of ledgers returned per response
    int32 limit_per_response = 1;
}
message GetLedgerRangesResponse {
    // status code
    common.StatusCode code              = 1;
    // count is set to the number of keys returned
    int32 count                         = 2;
    bytes serialized_ranges             = 3;
}
 
service LedgerMetadataService {
    rpc Create(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Remove(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Read(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Write(LedgerMetadataRequest) returns (LedgerMetadataResponse);
    rpc Watch(stream LedgerMetadataRequest) returns (stream LedgerMetadataResponse);
    rpc Iterate(GetLedgerRangesRequest) returns (stream GetLedgerRangesResponse);
}
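
To make the client side concrete, here is a minimal sketch of how a client could call this service through the gRPC-generated classes. The generated names (LedgerMetadataServiceGrpc, LedgerMetadataRequest, LedgerMetadataResponse) follow from the proto above, while the class name and the bookie endpoint are illustrative assumptions, not part of an existing implementation.

// A minimal sketch, assuming the classes generated by gRPC from the proto above.
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class RpcLedgerMetadataClient implements AutoCloseable {

    private final ManagedChannel channel;
    private final LedgerMetadataServiceGrpc.LedgerMetadataServiceBlockingStub stub;

    public RpcLedgerMetadataClient(String bookieHost, int rpcPort) {
        // The client only needs a bookie RPC endpoint; it never talks to the
        // metadata store (e.g. zookeeper) directly.
        this.channel = ManagedChannelBuilder.forAddress(bookieHost, rpcPort)
                .usePlaintext()
                .build();
        this.stub = LedgerMetadataServiceGrpc.newBlockingStub(channel);
    }

    /** Read the metadata of a ledger through the bookie RPC service. */
    public LedgerMetadataResponse readLedgerMetadata(long ledgerId) {
        LedgerMetadataRequest request = LedgerMetadataRequest.newBuilder()
                .setLedgerId(ledgerId)
                .build();
        return stub.read(request);
    }

    @Override
    public void close() {
        channel.shutdown();
    }
}

The RPC-based LedgerManagerFactory on the client would wrap a helper like this and map LedgerManager operations onto the Create/Read/Write/Remove/Watch calls, retrying against another bookie if the chosen endpoint is unreachable.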
 


Service Discovery

  • Provide a ServiceDiscovery interface for discovering bookies.
  • ZooKeeper based service discovery will be one implementation of this interface.
  • Following the same approach as in Metadata Management, we will add RPC calls for service discovery on the bookies.
  • These RPC calls delegate all service-discovery related requests.
  • On the client side, implement an RPC-based ServiceDiscovery (a possible interface shape is sketched below).
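
To make the first bullet more concrete, below is one possible shape for the ServiceDiscovery interface. The method names and listener style are illustrative assumptions made here; only BookieSocketAddress is an existing BookKeeper class.

// A possible shape for the proposed ServiceDiscovery abstraction (illustrative only).
// Both the zookeeper-based implementation and the RPC-based implementation
// would implement this interface.
import java.util.Set;
import java.util.function.Consumer;
import org.apache.bookkeeper.net.BookieSocketAddress;

public interface ServiceDiscovery extends AutoCloseable {

    /** Return the bookies currently available for writes. */
    Set<BookieSocketAddress> getWritableBookies() throws Exception;

    /** Return the bookies that are up but only serving reads. */
    Set<BookieSocketAddress> getReadOnlyBookies() throws Exception;

    /** Register a listener invoked whenever the set of writable bookies changes. */
    void watchWritableBookies(Consumer<Set<BookieSocketAddress>> listener);
}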

 

It is worth pointing out that this proposal WILL NOT change any existing workflow. It will be a modular RPC implementation added to the bookie service. Applications that want to keep the zookeeper based solution can continue using the existing solution without changing anything.

Compatibility, Deprecation, and Migration Plan

This change doesn't introduce any incompatible changes, so no special migration steps are required.

  • Upgrade bookies to enable the metadata RPC service.
  • Upgrade clients to use the LedgerManagerFactory implementation based on the metadata RPC service.

Rollback is also pretty straightforward.

  • Configure clients to use the original LedgerManagerFactory, as in the configuration sketch below.
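
For illustration, the switch could look like the following client configuration sketch. The setLedgerManagerFactoryClass and setZkServers settings exist on today's client configuration; MetadataRpcLedgerManagerFactory is a hypothetical name for the RPC-backed factory this BP would add.

// A hedged configuration sketch for upgrade and rollback.
import org.apache.bookkeeper.conf.ClientConfiguration;
import org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory;

public class MetadataFactorySwitchExample {
    public static void main(String[] args) {
        // Upgrade: point the client at the RPC-backed factory.
        // (MetadataRpcLedgerManagerFactory is hypothetical; it does not exist yet.)
        ClientConfiguration upgraded = new ClientConfiguration();
        // upgraded.setLedgerManagerFactoryClass(MetadataRpcLedgerManagerFactory.class);

        // Rollback: keep (or restore) the existing zookeeper-backed factory.
        ClientConfiguration rollback = new ClientConfiguration();
        rollback.setZkServers("zk1:2181,zk2:2181,zk3:2181");
        rollback.setLedgerManagerFactoryClass(HierarchicalLedgerManagerFactory.class);
    }
}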

Test Plan

The test plan will cover:

  • normal unit tests
  • end-to-end integration tests
  • backward (upgrade) tests
  • performance/load tests

Rejected Alternatives

N/A


3 Comments

  1. Jia Zhai glad to see you working on this topic. I had already started thinking about it.

    Are you going to start an email thread for discussion ?

    I have some ideas to share, maybe the mailing list is the best place

  2. While I appreciate the fact that we are thinking of ZK and issues associated with it, I guess I don't understand the full scope of this proposal.

    As the description says - ZK is used for service discovery and also as metadata store.

    I believe this proposal still needs clients to depend on ZK for the service discovery.

    So the number of client connections to ZK is not going to change much; they all still need to have connections to ZK.

    Clients create, update and query/get the metadata. Routing it through bookies brings in a lot more complexity to the protocol. Which bookie needs to be contacted? What if that bookie is not reachable? How does the client/writer go through ensemble changes? Going through bookies introduces another level of indirection and would open up many race conditions we need to deal with. One more hop introduces latency issues. Also, this proposal is not going to make the client thin, because the client (writer) is the one who is still going to make decisions on updating the metadata, isn't it? Going away from a thick client to a thin client and letting the "server" deal with metadata operations needs a total reboot of the stack.

    Instead, what I would like to focus on is having a separate metadata server instead of ZK. ZK doesn't scale for bigger clusters, and as we make the metadata richer, it increases the footprint and makes the entire cluster slow. We already have InMemoryMetaStore, and I want to expand on that idea, or completely move away from ZK and find something else which can serve both functions well.

  3. Thanks for your comments JV. We are coming up with a BP covering the overall picture.