...

New server-side segment completion steps

  1. The commit server sends the segment metadata to the controller, with the segment URI set to the configured deep storage location (if any) or to an empty string if none is configured.
    1. Note: Although the segment URI is set, the segment upload in step (2) has not happened yet. This uses a technique of Presumed Success: deep storage is up most of the time, so the upload in step (2) is very likely to succeed. Even if the upload fails, there are backup download mechanisms (i.e., download from a peer server). Committing the metadata before uploading the segment has the advantage that if the metadata commit fails, no segment data is left behind in deep storage.
  2. If the metadata commit in step (1) succeeds, the server performs a best-effort upload of the segment to deep storage. In particular, the server does not need to wait for the upload to succeed before proceeding.
  3. If the metadata commit fails, the present commit failure handling kicks in.
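The ordering above (metadata first, then a non-blocking upload) can be sketched as follows. This is a minimal illustration, not Pinot's SplitSegmentCommitter: commit_metadata and upload are hypothetical callbacks, and the "<dir>/<segment>" deep-store layout is an assumption made only for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional, Tuple

# Hypothetical callbacks, not real Pinot APIs:
#   commit_metadata(segment_name, segment_uri) -> True if the controller accepted it
#   upload(segment_name, target_uri) -> None, raising an exception on failure
CommitMetadataFn = Callable[[str, str], bool]
UploadFn = Callable[[str, str], None]


def _best_effort_upload(upload: UploadFn, segment_name: str, target_uri: str) -> bool:
    """Run the upload and report success instead of propagating errors."""
    try:
        upload(segment_name, target_uri)
        return True
    except Exception:
        # Tolerated: peer servers can still serve the segment.
        return False


def commit_segment(
    segment_name: str,
    deep_store_dir: Optional[str],
    commit_metadata: CommitMetadataFn,
    upload: UploadFn,
    executor: ThreadPoolExecutor,
) -> Tuple[bool, Optional["Future[bool]"]]:
    # Step 1 (presumed success): the metadata already names the final
    # deep-store URI, or "" when no deep store is configured.
    # The "<dir>/<segment>" layout is an assumption for this sketch.
    segment_uri = f"{deep_store_dir.rstrip('/')}/{segment_name}" if deep_store_dir else ""
    if not commit_metadata(segment_name, segment_uri):
        # Step 3: nothing was uploaded, so deep storage holds no orphaned data;
        # the existing commit failure handling takes over from here.
        return False, None
    if not segment_uri:
        # No deep store: peers are the only download source.
        return True, None
    # Step 2: upload in the background; the caller does not wait on the future.
    return True, executor.submit(_best_effort_upload, upload, segment_name, segment_uri)
```

Returning the future lets a caller log the eventual upload outcome without blocking the commit on it.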

...

  1. A new API for downloading a segment from a Pinot server (via the server admin API port)
    • URI path: /tables/{tableName}/segments/{segmentName}
    • Usage: Download a realtime or offline table segment as a zipped tar file.
    • Code location: TablesResource
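A client of this endpoint only needs a scheme, the peer's host and admin port, and the two path parameters. The helper below is a hypothetical sketch of building that URL; the host name and port 8097 are example values, not a statement of Pinot's defaults.

```python
from urllib.parse import quote, urlunsplit

ALLOWED_SCHEMES = ("http", "https")


def peer_segment_download_url(
    scheme: str, host: str, admin_port: int, table_name: str, segment_name: str
) -> str:
    """Build the download URL for /tables/{tableName}/segments/{segmentName}."""
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Percent-encode each path component so unusual characters in names
    # cannot break the path structure.
    path = "/tables/{}/segments/{}".format(
        quote(table_name, safe=""), quote(segment_name, safe="")
    )
    return urlunsplit((scheme, f"{host}:{admin_port}", path, "", ""))


# Example values only; the response body would be the zipped tar file.
url = peer_segment_download_url(
    "http", "server-1", 8097, "myTable_REALTIME", "myTable__0__12__20200101T0000Z"
)
```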

Config change

  1. Enable best-effort segment upload in SplitSegmentCommitter and segment download from peer servers.

Option 1: New table config. Add a new optional string field peerSegmentDownloadScheme to the SegmentsValidationAndRetentionConfig in the TableConfig. The value can be http or https.
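A sketch of how a reader of the table config might interpret the new field. It assumes that SegmentsValidationAndRetentionConfig serializes under the JSON key "segmentsConfig"; the example config omits every other field, and the parsing function is hypothetical.

```python
import json
from typing import Optional

ALLOWED_PEER_SCHEMES = {"http", "https"}

# Example table config; "segmentsConfig" is assumed to be the JSON key for
# SegmentsValidationAndRetentionConfig, and all other fields are omitted.
EXAMPLE_TABLE_CONFIG = """
{
  "tableName": "myTable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "peerSegmentDownloadScheme": "https"
  }
}
"""


def peer_segment_download_scheme(table_config_json: str) -> Optional[str]:
    """Return the configured peer download scheme, or None when absent."""
    config = json.loads(table_config_json)
    scheme = config.get("segmentsConfig", {}).get("peerSegmentDownloadScheme")
    if scheme is None:
        # Field not set: keep the existing behavior (segment store only).
        return None
    if scheme not in ALLOWED_PEER_SCHEMES:
        raise ValueError(f"peerSegmentDownloadScheme must be http or https, got {scheme!r}")
    return scheme
```

Treating an absent field as "disabled" keeps existing tables unaffected, which matches the optional nature of the field.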

During the segment completion phase:

  • SplitSegmentCommitter checks this value. If it exists, the segment committer can finish the segment commit successfully even if the upload to the segment store fails. The committer then reports to the controller that the segment is available on peer servers.
  • When Pinot servers in LLRealtimeSegmentDataManager fail to download segments from the segment store during the goOnlineFromConsuming() transition, they also check this field's value. If it exists, they can initialize a PeerServerSegmentFetcher to:
    1. First, discover the URI u of a server hosting the segment.
    2. Construct the complete URI using the configured scheme (http or https) and use the appropriate segment fetcher to download the segment.
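The fallback path on a non-committing server might look like the sketch below. All callables are hypothetical stand-ins: fetch_deep_store and fetch_http are assumed to raise OSError on failure, and server_states is an assumed mapping from "host:adminPort" to segment state, similar in spirit to a Helix external view entry.

```python
from typing import Callable, Mapping, Optional

# Hypothetical stand-ins, not Pinot APIs:
#   fetch_deep_store(uri) -> bytes, raising OSError on failure
#   fetch_http(url) -> bytes, raising OSError on failure
#   server_states maps "host:adminPort" -> state of this segment on that server
FetchFn = Callable[[str], bytes]


def download_committed_segment(
    table_name: str,
    segment_name: str,
    deep_store_uri: str,
    peer_scheme: Optional[str],
    server_states: Mapping[str, str],
    fetch_deep_store: FetchFn,
    fetch_http: FetchFn,
) -> bytes:
    # Normal path: the segment store (skipped when the URI is empty).
    if deep_store_uri:
        try:
            return fetch_deep_store(deep_store_uri)
        except OSError:
            pass  # fall through to peers if enabled
    if peer_scheme is None:
        raise RuntimeError(f"cannot download {segment_name}: peer download is disabled")
    # 1. Discover servers u that host the segment in the ONLINE state.
    peers = sorted(u for u, state in server_states.items() if state == "ONLINE")
    last_error: Optional[OSError] = None
    for u in peers:
        # 2. Complete the URI with the configured scheme and fetch it.
        url = f"{peer_scheme}://{u}/tables/{table_name}/segments/{segment_name}"
        try:
            return fetch_http(url)
        except OSError as err:
            last_error = err
    raise RuntimeError(f"no peer could serve {segment_name}") from last_error
```

Only ONLINE replicas are tried because a replica that is still consuming or has failed does not hold the committed copy.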

Note that this is a table-level config. We will test the new download behavior on realtime tables incrementally. Once fully proven, this config can be upgraded to a server-level config.

Option 2:

  • Add a new optional boolean field enablePeerSegmentDownload to the SegmentsValidationAndRetentionConfig in the TableConfig.
  • Add a new file scheme called SERVER. When SegmentFetcherFactory sees this scheme in the server instance file system config, it initializes a new kind of segment fetcher, PeerServerSegmentFetcher, which, given a segment name, handles both server discovery and segment download. The PeerServerSegmentFetcher can be further configured to use the http or https fetcher to fetch segments from peers.

During the segment completion phase:

  1. SplitSegmentCommitter checks this value and behaves exactly as in Option 1.
  2. When Pinot servers in LLRealtimeSegmentDataManager fail to download segments from the segment store during the goOnlineFromConsuming() transition, they can use the PeerServerSegmentFetcher (if the SERVER scheme is configured) to discover and download the segment from peers.

Notes and Caveats:

  1. The new table config also allows servers to download offline segments from peers. However, peer download cannot be applied to offline segments as freely as to realtime segments, because an offline segment can be refreshed or changed. This creates a potentially dangerous race condition. Suppose a segment has been updated to a newer version while servers A and B still hold the old version. Both are notified of the newer version, try to fetch it and fail, eventually fetch from each other, and end up believing they have the newest version of the segment. (Example given by Subbu Subramaniam)

    While there are some ways to fix this (e.g., sending the CRC of the new version in the segment update message), we need to vet them carefully before adopting them.
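For Option 2, the key idea is that a scheme lookup routes SERVER URIs to a fetcher that does its own peer discovery. The sketch below is a toy analogue, not Pinot's SegmentFetcherFactory: the registry class, the discover callback, and the server://<tableName>/<segmentName> URI form are all hypothetical.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import urlsplit

Fetcher = Callable[[str], bytes]


class SchemeFetcherRegistry:
    """Toy scheme -> fetcher lookup, loosely analogous to a fetcher factory."""

    def __init__(self) -> None:
        self._fetchers: Dict[str, Fetcher] = {}

    def register(self, scheme: str, fetcher: Fetcher) -> None:
        self._fetchers[scheme.lower()] = fetcher

    def fetch(self, uri: str) -> bytes:
        scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
        fetcher = self._fetchers.get(scheme)
        if fetcher is None:
            raise ValueError(f"no fetcher registered for scheme {scheme!r}")
        return fetcher(uri)


def make_peer_server_fetcher(
    discover: Callable[[str, str], Iterable[str]],
    peer_scheme: str,
    fetch_http: Fetcher,
) -> Fetcher:
    """Fetcher for hypothetical URIs of the form server://<tableName>/<segmentName>."""

    def fetch(uri: str) -> bytes:
        parts = urlsplit(uri)
        table_name, segment_name = parts.netloc, parts.path.lstrip("/")
        # Server discovery and segment download both live inside this fetcher.
        for server in discover(table_name, segment_name):
            url = f"{peer_scheme}://{server}/tables/{table_name}/segments/{segment_name}"
            try:
                return fetch_http(url)
            except OSError:
                continue  # try the next peer
        raise OSError(f"no peer could serve {segment_name}")

    return fetch
```

Keeping discovery inside the fetcher means callers only see a URI, which is what lets the SERVER scheme plug into the same lookup as other file systems.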

Failure cases and handling

...