...

  1. The server no longer uploads segment data. Instead, during the commit step, only segment metadata is uploaded to the controller (to be written into ZooKeeper).
  2. The server can submit an asynchronous task to upload the built segment to a configured deep storage.
  3. When a slow server is asked to download a segment, it goes through the segmentLocation list for the segment. It first tries to download from the deep storage; if the segment is not available there, it downloads from the servers.
  4. To facilitate downloads from servers, Pinot servers will have a new REST API for segment download.
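
The download order in step 3 can be sketched as a simple ordered fallback. This is a minimal illustration, not Pinot code: the class name, the method name, and the `fetcher` callback are hypothetical, and the sketch assumes the segmentLocation list is already ordered with the deep storage URI first and server URIs after it.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of Pinot.
final class SegmentDownloadSketch {
    private SegmentDownloadSketch() {}

    /**
     * Tries each location in order and returns the first one that succeeds.
     * `fetcher` stands in for the real download call and returns true when
     * the segment was fetched from the given location.
     */
    static Optional<String> downloadFromFirstAvailable(
            List<String> orderedLocations, Predicate<String> fetcher) {
        for (String location : orderedLocations) {
            try {
                if (fetcher.test(location)) {
                    return Optional.of(location);
                }
            } catch (RuntimeException e) {
                // Treat a failed attempt as "not available" and try the next location.
            }
        }
        // No location could serve the segment.
        return Optional.empty();
    }
}
```

With a list such as a deep storage URI followed by a server download URL, a fetcher that fails on the first entry falls through to the second, which matches the "deep storage first, then servers" behavior described above.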

Pros:

  • Simple to implement
  • Minimal changes to the current controller/server FSMs used in the LLC segment protocols. The most noticeable change is in the controller FSM: the Committer Uploading state becomes obsolete, because the committer no longer uploads segment data to the controller, and should be generalized to a state that reflects that the segment is being persisted.

...

Implementation Tasks (listed in rough phase order)

  1. Interface and API change
    1. Segment location is changed from a single URI string to a string containing a list of URIs.
    2. Server adds a new API to allow for segment download.
  2. Server changes
    1. Segment completion protocol change: during segment commit, the commit server asynchronously and optionally uploads built segments to a configured external store (for example, HDFS). It will wait a short amount of time for the upload to finish. Regardless of the upload result, it will move on to send the segment metadata (with the segment URI location if available) to the controller.
    2. Consuming to online Helix state transition: refactor the server download procedure so that it (1) first tries to download segments from the configured segment store if available; otherwise (2) discovers the servers hosting the segment via the external view of the table and then downloads from a hosting server.
  3. Server FSM change
  4. Controller FSM change
  5. Controller changes
    1. Segment completion protocol: controller skips uploading segments.
  6. RealtimeValidationManager
    1. During segment validation, for any segment without an external storage URI, ask one server to upload it to the store and update the segment store URI in Helix.
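
As an illustration of task 1.1 above (segment location as a string containing a list of URIs), the following sketch encodes and decodes such a string. The class name and the comma delimiter are assumptions made for this example; the design does not specify the actual encoding.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical codec, not part of Pinot. The delimiter is an assumption.
final class SegmentLocationCodec {
    static final String DELIMITER = ",";

    private SegmentLocationCodec() {}

    /** Joins the URIs into a single segment location string. */
    static String encode(List<URI> uris) {
        return uris.stream().map(URI::toString).collect(Collectors.joining(DELIMITER));
    }

    /**
     * Splits a segment location string into URIs. A legacy value holding a
     * single URI decodes to a one-element list; null or blank input decodes
     * to an empty list.
     */
    static List<URI> decode(String segmentLocation) {
        List<URI> uris = new ArrayList<>();
        if (segmentLocation == null || segmentLocation.trim().isEmpty()) {
            return uris;
        }
        for (String part : segmentLocation.split(DELIMITER)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                uris.add(URI.create(trimmed));
            }
        }
        return uris;
    }
}
```

One property worth noting in this sketch is that an existing single-URI value still parses, so metadata written before the change would not need to be rewritten. A comma can legally appear inside a URI, so a real implementation would need a delimiter or escaping scheme that accounts for that.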

Production (Seamless) Transition

  1. Let servers download segments directly from external storage instead of from the controller.
    1. This requires Pinot servers to send the external storage segment URI to the controller.
  2. Enable splitCommit in both controller and server. 
  3. In LLRealtimeSegmentDataManager.doSplitCommit(),
    1. The server skips uploading the segment file (based on a config flag).
    2. Segment commitStart() and commitEnd() remain the same. A further optimization could combine the two calls, but they are left as they are for now.
    3. With the above changes, the controller no longer uploads segments because LLCSegmentCompletionHandlers.segmentUpload() is no longer called. Interestingly, skipping this call requires no change on the controller side because the Pinot controller does not change its internal state during segmentUpload().
  4. RealtimeValidationManager modification
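
The doSplitCommit() change described in step 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual LLRealtimeSegmentDataManager code: the `ControllerClient` interface, its method signatures, and the `skipControllerUpload` flag name are all hypothetical.

```java
// Hypothetical sketch, not part of Pinot.
final class SplitCommitSketch {
    private SplitCommitSketch() {}

    /** Hypothetical stand-in for the server-to-controller commit calls. */
    interface ControllerClient {
        boolean commitStart(String segmentName);
        /** Returns the uploaded segment location, or null on failure. */
        String uploadSegment(String segmentName);
        boolean commitEnd(String segmentName, String segmentLocation);
    }

    /**
     * Split commit with an optional upload step. When skipControllerUpload
     * is true, the upload call is skipped and the deep store location (which
     * may be null if the asynchronous upload has not finished) is sent in
     * commitEnd instead.
     */
    static boolean doSplitCommit(ControllerClient controller, String segmentName,
                                 String deepStoreLocation, boolean skipControllerUpload) {
        if (!controller.commitStart(segmentName)) {
            return false;
        }
        String location;
        if (skipControllerUpload) {
            location = deepStoreLocation;
        } else {
            location = controller.uploadSegment(segmentName);
            if (location == null) {
                return false;
            }
        }
        return controller.commitEnd(segmentName, location);
    }
}
```

In this sketch commitStart() and commitEnd() are called in both modes, and only the middle upload step depends on the flag, which mirrors the point that the controller needs no state change when the upload call is skipped.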

Appendix 

Alternative designs:

...