...

In our new design, the segment upload to deep storage is asynchronous and done in a best-effort manner. As a result, the server cannot depend solely on deep storage: even if deep storage is unavailable, we allow the server to download the segment from peer servers (although this is not the preferred path). To discover the servers hosting a given segment, we use the external view of the table to find the latest servers hosting the segment. This keeps the server list in sync with the table state for events like table rebalance, without extra write/read cost. The complete segment download process works as follows (a hedged download sketch follows this list):

  1. Download the segment from the deep store URI found (if any) in the property store on the ZK server.
    1. Note: the download is done with retries and exponential backoff to allow for the time the committing server needs to upload the segment.
  2. If (1) fails, get the external view of the table (from the Controller) to find the ONLINE servers hosting the segment.
  3. Download the segment from a randomly chosen ONLINE server. If that fails, move to the next server in the list. The download format is a gzip'd file of the segment on the source server.
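
Below is a minimal, hedged sketch of this download sequence; it is not the actual Pinot server code. The fetcher and discovery interfaces, the retry count, and the backoff values are assumptions made for illustration.

```java
// Sketch only: illustrates the deep-store-first, peer-fallback download order described above.
package org.example.pinot.sketch;

import java.io.File;
import java.util.List;

public class SegmentDownloadFlowSketch {

  interface DeepStoreFetcher {
    // Downloads from the deep store URI recorded in the segment's ZK metadata, if any.
    boolean fetch(String deepStoreUri, File dest) throws Exception;
  }

  interface PeerDiscovery {
    // Reads the table's external view and returns servers in ONLINE state for the segment.
    List<String> onlineServersFor(String tableNameWithType, String segmentName);
  }

  interface PeerFetcher {
    // Downloads the gzip'd copy of the segment from a peer server.
    boolean fetchFromPeer(String serverEndpoint, String segmentName, File dest);
  }

  // Assumed retry/backoff parameters; the real values would come from server config.
  private static final int MAX_DEEP_STORE_ATTEMPTS = 3;
  private static final long INITIAL_BACKOFF_MS = 5_000L;

  public static boolean downloadSegment(String tableNameWithType, String segmentName, String deepStoreUri,
      File dest, DeepStoreFetcher deepStoreFetcher, PeerDiscovery discovery, PeerFetcher peerFetcher)
      throws InterruptedException {
    // Step 1: deep store, with retries and exponential backoff so the committing
    // server has time to finish its asynchronous upload.
    if (deepStoreUri != null && !deepStoreUri.isEmpty()) {
      long backoffMs = INITIAL_BACKOFF_MS;
      for (int attempt = 0; attempt < MAX_DEEP_STORE_ATTEMPTS; attempt++) {
        try {
          if (deepStoreFetcher.fetch(deepStoreUri, dest)) {
            return true;
          }
        } catch (Exception e) {
          // Fall through to backoff and retry.
        }
        Thread.sleep(backoffMs);
        backoffMs *= 2;
      }
    }
    // Step 2: find ONLINE replicas from the table's external view.
    List<String> onlineServers = discovery.onlineServersFor(tableNameWithType, segmentName);
    // Step 3: try peers one by one until a download succeeds.
    for (String server : onlineServers) {
      if (peerFetcher.fetchFromPeer(server, segmentName, dest)) {
        return true;
      }
    }
    return false;
  }
}
```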

...

  1. Interface and API change
    1. A new Pinot Server API in TablesResource.java for segment download (PR 4914). 
      1. Start with a straightforward approach: gzip the segment directory for each request and send it back (a hedged endpoint sketch appears after this list).
      2. A performance test is needed here to measure the impact on server query performance during segment download.
    2. A new Pinot Server API in TablesResource.java for uploading a segment to a configured deep store. (PR Pending).
      1. Used by RealtimeValidationManager to ask servers to upload a segment to the deep store and update its segment store URI.
  2. Server changes for segment completion (PR 4914)
    1. Add a new config parameter _enableSegmentUploadToController to IndexLoadingConfig.java (default true for backward compatibility) to control whether committing servers need to upload segments to the controller.
      1. When set to true, the segment completion behavior is exactly the same as in the current code.
      2. When set to false, the server will not upload the built segment to the controller; it completes the commit via the controller endpoint with the new query parameter (see the controller changes below).
    2. Add a new config segment.store.root.dir to HelixInstanceDataManagerConfig.java to configure the deep store root directory if necessary.
    3. In SplitSegmentCommitter.java, if _enableSegmentUploadToController is set to false, skip the synchronous segmentCommitUpload of the segment to the controller during split commit.
    4. In LLRealtimeSegmentDataManager.java, after the segment commit is done with a successful metadata upload in (c), which implies _enableSegmentUploadToController is false:
      1. If there is a configured deep store (checked by whether segment.store.root.dir is set), the server does a best-effort upload of the built segment to the configured external store, i.e., no retry even if the upload fails (see the upload sketch after this list).
      2. Otherwise just continue.
    5. Consuming-to-online Helix state transition: refactor the server download procedure so that it
      1. first tries to download the segment from the configured segment store if available (with retries and backoff to avoid a race with the uploading server); otherwise
      2. uses a new PeerServerSegmentFetcher (configured in Pinot servers via SegmentFetcherFactory) to discover the servers hosting the segment via the external view of the table and then download from a hosting server.
  3. Controller changes for segment completion (PR 4914)
    1. Add a new query parameter enableSegmentUpload to LLCSegmentCompletionHandlers.java's segmentCommitEndWithMetadata() to indicate whether the server uploads the segment to the controller during segment completion (default true to maintain backward compatibility).

      1. If the value is false, there is no need for the controller to move the segment file to its permanent location in a split commit.
      2. The change in (a.i) implies that the controller now accepts segment commit metadata with an empty (but not null) deep storage URI.
    2. Related to (a), add a new param _uploadToControllerEnabled to SegmentCompletionProtocol.java's Params class.
    3. In PinotLLCRealtimeSegmentManager.java's updateCommittingSegmentZKMetadata(), when calling setDownloadUrl for the segment, use the descriptor's segment location instead of the VIP address.
      1. A related change here is to update the segment location field of the committing segment's descriptor after the segment is moved during a split commit.
  4. RealtimeValidationManager (PR Pending)
    1. During segment validation, for any segment without an external storage URI, ask one server to upload it to the deep store and update the segment store URI in Helix.
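
To make the gzip-and-stream segment download API in TablesResource.java concrete, here is a hedged JAX-RS sketch; it is not the code from PR 4914. The resource path, the way the segment directory is resolved, and the packaging helper are assumptions, and since a directory cannot be gzip'd directly, the sketch tars it before compressing; the actual wire format on the server may differ.

```java
// Sketch only: tar+gzip the on-disk segment directory per request and stream it back.
package org.example.pinot.sketch;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.StreamingOutput;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.IOUtils;

@Path("/segments")
public class SegmentDownloadResourceSketch {

  // Assumption: some server component can map (table, segment) to its local directory.
  private File resolveSegmentDir(String tableNameWithType, String segmentName) {
    return new File("/var/pinot/server/index/" + tableNameWithType, segmentName); // placeholder
  }

  @GET
  @Path("/{tableNameWithType}/{segmentName}")
  @Produces(MediaType.APPLICATION_OCTET_STREAM)
  public Response downloadSegment(@PathParam("tableNameWithType") String tableNameWithType,
      @PathParam("segmentName") String segmentName) {
    File segmentDir = resolveSegmentDir(tableNameWithType, segmentName);
    if (!segmentDir.isDirectory()) {
      throw new WebApplicationException(Response.Status.NOT_FOUND);
    }
    // Stream the archive as it is built instead of materializing it in memory.
    StreamingOutput stream = (OutputStream out) -> tarGzDirectory(segmentDir, out);
    return Response.ok(stream)
        .header("Content-Disposition", "attachment; filename=" + segmentName + ".tar.gz")
        .build();
  }

  // Minimal tar.gz of a flat segment directory (sub-directories omitted for brevity).
  private static void tarGzDirectory(File dir, OutputStream out) throws IOException {
    try (TarArchiveOutputStream tarOut = new TarArchiveOutputStream(new GzipCompressorOutputStream(out))) {
      for (File f : dir.listFiles()) {
        if (!f.isFile()) {
          continue;
        }
        tarOut.putArchiveEntry(new TarArchiveEntry(f, dir.getName() + "/" + f.getName()));
        try (FileInputStream in = new FileInputStream(f)) {
          IOUtils.copy(in, tarOut);
        }
        tarOut.closeArchiveEntry();
      }
    }
  }
}
```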
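
For the best-effort deep-store upload described in the LLRealtimeSegmentDataManager change above, here is a hedged sketch of the upload after a metadata-only commit. The DeepStoreClient interface and the destination URI layout under segment.store.root.dir are illustrative assumptions, not the actual PR code.

```java
// Sketch only: best-effort copy of the built segment tarball to the configured deep store.
package org.example.pinot.sketch;

import java.io.File;
import java.net.URI;

public class BestEffortDeepStoreUploadSketch {

  interface DeepStoreClient {
    // Could be backed by a filesystem plugin for the configured segment.store.root.dir scheme.
    void copyFromLocal(File localSegmentTar, URI dest) throws Exception;
  }

  /**
   * Returns the deep store URI on success, or null if no deep store is configured or the
   * upload failed. A null result is acceptable: RealtimeValidationManager can later ask a
   * server to upload the segment and fix up the segment store URI in Helix.
   */
  public static URI uploadBestEffort(String segmentStoreRootDir, String tableNameWithType,
      String segmentName, File localSegmentTar, DeepStoreClient client) {
    if (segmentStoreRootDir == null || segmentStoreRootDir.isEmpty()) {
      // No deep store configured (segment.store.root.dir unset): nothing to do.
      return null;
    }
    // Assumed layout: <root>/<table>/<segment>.
    URI dest = URI.create(segmentStoreRootDir + "/" + tableNameWithType + "/" + segmentName);
    try {
      client.copyFromLocal(localSegmentTar, dest);
      return dest;
    } catch (Exception e) {
      // Best effort: do not retry and do not fail the already-completed commit.
      return null;
    }
  }
}
```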

...

  1. Let servers download segments directly from external storage instead of from the controller.
    1. This requires Pinot servers to send the external storage segment URI to the controller.
  2. Enable splitCommit in both controller and server. 
  3. In LLRealtimeSegmentDataManager.doSplitCommit() (a hedged sketch of the resulting commit flow follows this list),
    1. the server skips uploading the segment file (based on a config flag);
    2. segment commitStart() and commitEnd() remain the same; we could further optimize by combining the two calls, but leave them as they are for now.
    3. With the above changes, the controller will no longer store the segment because LLCSegmentCompletionHandlers.segmentUpload() is not called any more. Interestingly, skipping this call does not need any change on the controller side because the Pinot controller does not change its internal state during segmentUpload().
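
A hedged sketch of the resulting split-commit flow: commitStart() and commitEnd() stay as they are, and only the synchronous segment upload to the controller is skipped when the config flag is off. The ControllerClient interface and its method shapes are simplified stand-ins, not the real SegmentCompletionProtocol.

```java
// Sketch only: split commit that conditionally skips the segment upload to the controller.
package org.example.pinot.sketch;

import java.io.File;

public class SplitCommitSketch {

  interface ControllerClient {
    boolean commitStart(String segmentName);
    // Uploads the segment tarball to the controller; only used when the flag is on.
    String segmentCommitUpload(String segmentName, File segmentTar);
    boolean commitEndWithMetadata(String segmentName, String segmentLocation, boolean uploadedToController);
  }

  private final boolean _enableSegmentUploadToController;
  private final ControllerClient _controller;

  public SplitCommitSketch(boolean enableSegmentUploadToController, ControllerClient controller) {
    _enableSegmentUploadToController = enableSegmentUploadToController;
    _controller = controller;
  }

  public boolean commit(String segmentName, File segmentTar) {
    if (!_controller.commitStart(segmentName)) {
      return false;
    }
    String segmentLocation = "";
    if (_enableSegmentUploadToController) {
      // Existing behavior: synchronous upload to the controller during split commit.
      segmentLocation = _controller.segmentCommitUpload(segmentName, segmentTar);
      if (segmentLocation == null) {
        return false;
      }
    }
    // commitEnd is unchanged; when the upload was skipped, the location is empty (not null),
    // and the controller accepts the empty deep store URI as described above.
    return _controller.commitEndWithMetadata(segmentName, segmentLocation, _enableSegmentUploadToController);
  }
}
```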

Appendix 

Alternative designs for Segment Download

Segment Download

In Pinot low-level consumer data ingestion, a server may need to download a segment instead of building it on its own. Currently the server finds the download location by reading the segment's download URL from ZK: the download URL can point either to the controller (for local FS) or to deep storage.

In our new design, the segment upload to deep storage is asynchronous and best effort. As a result, the server cannot depend solely on deep storage. Instead we allow the server to download the segment from peer servers (although this is not preferred). There are two options to discover the servers hosting a given segment:

  1. Store the server list in segment zk metadata for the server to read.
  2. Use the external view of the table to find out the latest servers hosting the segment (proposed by @jackie-jiang and also mentioned by Kishore G).

While (1) seems straightforward, (2) has the advantage of keeping the server list in sync with the table state for events like table rebalance, without extra write/read cost. The complete segment download process works as follows (a Helix external-view lookup sketch follows the list):

  1. Get the external view of the table (from Controller) to find out the ONLINE servers hosting the segments.
  2. Download the segment from the server. If that fails, move to the next server in the list. The download format is a gzip'd file of the segment on the source server.
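
A hedged sketch of option (2): use the Helix admin API to read the table's external view and collect the servers currently in ONLINE state for a segment. getResourceExternalView and getStateMap are standard Helix calls; the cluster and table naming below is illustrative.

```java
// Sketch only: discover ONLINE replicas of a segment from the table's external view.
package org.example.pinot.sketch;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.helix.HelixAdmin;
import org.apache.helix.model.ExternalView;

public class ExternalViewLookupSketch {

  public static List<String> onlineServersForSegment(HelixAdmin helixAdmin, String clusterName,
      String tableNameWithType, String segmentName) {
    List<String> onlineServers = new ArrayList<>();
    // The external view of the table resource reflects the latest assignment, so the
    // list stays in sync with events like table rebalance without extra bookkeeping.
    ExternalView externalView = helixAdmin.getResourceExternalView(clusterName, tableNameWithType);
    if (externalView == null) {
      return onlineServers;
    }
    Map<String, String> instanceStateMap = externalView.getStateMap(segmentName);
    if (instanceStateMap == null) {
      return onlineServers;
    }
    for (Map.Entry<String, String> entry : instanceStateMap.entrySet()) {
      if ("ONLINE".equals(entry.getValue())) {
        onlineServers.add(entry.getKey());
      }
    }
    return onlineServers;
  }
}
```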