...
- Download the segment from the deep store URI found (if any) in the property store of the ZK server.
- Note: the download uses retries with exponential backoff to allow for the time the commit server needs to upload the segment.
- If (1) fails, get the external view of the table (from the Controller) to find the ONLINE servers hosting the segment.
- Download the segment from a randomly chosen ONLINE server. The download format is a gzipped file of the segment on the source server.
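The download order above can be sketched as follows. This is a minimal illustration, not Pinot's implementation: `download_segment`, its parameters, and the `fetch` callable are hypothetical names, and the caller is assumed to have already looked up the deep store URI in the property store and the ONLINE servers in the external view.

```python
import random
import time


def download_segment(deep_store_uri, online_servers, fetch,
                     attempts=3, base_delay=1.0, sleep=time.sleep):
    """Fetch a segment from the deep store, else from a random ONLINE peer.

    All names here are hypothetical. `fetch(source)` is assumed to return
    the gzipped segment bytes or raise IOError.
    """
    if deep_store_uri:
        for attempt in range(attempts):
            try:
                return fetch(deep_store_uri)
            except IOError:
                # The commit server may still be uploading: wait
                # base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
                if attempt < attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    if not online_servers:
        raise IOError("no deep store copy and no ONLINE server hosts the segment")
    # Fall back to a randomly chosen ONLINE server from the external view.
    return fetch(random.choice(online_servers))
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; the attempt count and base delay are placeholders, not values taken from Pinot.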
Failure cases and handling
- What happens when the segment upload fails but the preceding metadata commit succeeds?
- In this case, if a server needs to download the segment, it can fail over to downloading from the commit server, which has a copy of the data.
- If users want to minimize the chances of downloading from peer servers, the segment completion mode can be set to DEFAULT instead of DOWNLOAD.
- In the background, RealtimeValidationManager will fix the upload failure by periodically asking the server to upload missing segments.
- The segment upload fails and the commit server crashes, but the preceding metadata commit succeeds
- The non-committer server cannot download from the committer server.
- In DEFAULT segment completion mode, the non-committer server can still try to finish the segment.
- In DOWNLOAD segment completion mode, the non-committer server will get into ERROR state for the segment.
- Wait for the RealtimeValidationManager to fix the segment.
- The segment upload succeeds but the commit server crashes
- The non-commit servers can download from the segment store.
- The segment upload succeeds but the controller crashes
- This can be handled similarly to the current failure handling mechanism.
- If another server is asked to commit and upload the same segment again, let PinotFS handle the segment overwrite.
- What happens if another server gets a "download" but the committer has not gotten to ONLINE state yet?
- To account for the fact that the metadata commit happens before the segment upload, the other server should retry (with exponential backoff) when downloading.
- Retrying with a wait between attempts can greatly reduce the issues caused by the above race condition.
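The upload and commit-server failure cases listed above differ only in whether the upload succeeded, whether the commit server is still alive, and which segment completion mode is set. The sketch below summarizes what a non-committer server does when it needs the segment. The function name and the returned strings are hypothetical labels for illustration, not Pinot identifiers.

```python
DEFAULT = "DEFAULT"
DOWNLOAD = "DOWNLOAD"


def non_committer_action(upload_succeeded, committer_alive, completion_mode):
    """Summarize the failure-case handling described above (hypothetical helper)."""
    if upload_succeeded:
        # The segment store has a copy, even if the commit server crashed.
        return "download from segment store"
    if committer_alive:
        # Upload failed, but the commit server still holds the data.
        return "download from commit server"
    if completion_mode == DEFAULT:
        # No copy to download; the server can still try to finish the segment.
        return "try to finish the segment locally"
    # DOWNLOAD mode with no source available: the segment goes to ERROR
    # until RealtimeValidationManager fixes it.
    return "ERROR, wait for RealtimeValidationManager"
```

For example, `non_committer_action(False, False, DOWNLOAD)` corresponds to the case where the upload fails and the commit server crashes in DOWNLOAD mode.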
...