Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Modified segment completion protocol:

1) Persist a segment

Currently the ‘commit-leader’ makes a synchronous ‘commit’ call to the Pinot Controller to upload the segment being committed to a deep storage system. If this upload succeeds then the segment is deemed as committed and can be brought online.

Our new approach relies on the fact that "segmentLocation" is now a list of different HTTP end points. Any one location should be sufficient to download this segment (needless to say if one location is unavailable, we can retry using the next location). Exactly how this segment is persisted and when can it transition to ONLINE state can be done in 2 different waysas follows:

a) Persisting a ) No Quorumsegment to both Pinot server (synchronously) and a configured deep storage (asynchronously and optional):

  • The commit leader becomes the owner of the segment being committed. It will make this segment downloadable using a new HTTP end point. After the leader commits the segment metadata, it uploads the segment to the configured deep storage as a backup. The deep storage uri will be added to the segment location list in controller.
  • Similarly other replicas will keep updating this 'segment location' via the Controller.
  • When the controller asks a server to download the segment for whatever reason, the server will check the segment location list. It will first try to download it from a deep storage uri to minimize the impact on servers. If there is no such uri, it will go down the list of servers to download the segment.
  • All existing error handling routines remain unchanged. The following diagrams show the critical scenarios of (1) happy path, (2) controller failure and (3) slow server downloading from deep storage or Peer server. 

                          


Pros:

  • Simple to implementNo coordination needed.
  • Minimal changes to the current controller/server FSMs used in the LLC segment protocols: some noticeable changes would be in the controller FSM – the Committer Uploading state now becomes obsolete for obvious reason and should be generalized to a state which reflects the segment is being persisted.  

Cons:

  • If the only server that saves a copy of this segment is destroyed and at the same time the other replicas are unable to catch up, then we may completely lose this segment. In this case, we need to do a segment recovery (either from Kafka or from an offline source).

b) Quorum:

We can further improve the availability of this design by requiring the controller to wait for a quorum of servers to finish ingesting the current segment before updating segment metadata and transitioning on an ONLINE state. In this case, the Segment metadata (segment location) will point to a list of HTTP endpoints (for the different replicas that successfully ingested this segment).

Pros:

  • Robust in the presence of server failures

Cons:

  • Needs coordination work to be done by Pinot Controller.

...


Alternative designs:

1) No coordination between Pinot servers

We can simplify the coordination required for selecting a segment commit leader by making a small change:

...