Problem statement

When a realtime segment completes, the Pinot servers and the lead controller execute the Segment Completion Protocol to ensure that exactly one consistent copy of the segment is uploaded to the controller. See Consuming and Indexing rows in Realtime for details. Note that there are two ways to complete the segment: a single-step commit and a split commit.

The controller currently unzips the uploaded segment in order to update the Segment ZK Metadata with additional information (start/end time, CRC, etc.).

With deep storage support, this step requires the controller to download the segment from deep storage onto its local storage just to extract the segment metadata. This adds network traffic from deep storage that is not necessary.

Proposal

The proposal is for the servers to include the segment metadata in the final phase of the split commit protocol. See the split commit ladder diagram in Consuming and Indexing rows in Realtime.

In the final COMMIT step, the server includes the segment metadata and the segment creation metadata files so that the controller need not extract these from the segment in order to complete the segment and start the next one.

Ideally, the segment upload should go to the segment store directly. That modification will be worked on independently of this one.


Design

We will add a new REST endpoint for the segmentCommit step (the third step of segment completion). This will be a POST endpoint that accepts a multi-part form. The multi-part POST has the following two parts:

  1. Segment Metadata
  2. Segment Creation Metadata
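As a rough illustration of the two-part form, the sketch below assembles a multipart/form-data body and reads it back the way a controller endpoint might. The part names `metadata.properties` and `creation.meta` and both helper functions are assumptions made for this example; they are not taken from the proposal.

```python
import uuid
from email import message_from_bytes


def build_multipart(parts):
    """Server side: encode {part name: bytes} as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, payload in parts.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks.append(header.encode("utf-8"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


def parse_multipart(content_type, body):
    """Controller side: recover {part name: bytes} from the posted form."""
    raw = b"Content-Type: " + content_type.encode("utf-8") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    return {
        part.get_param("name", header="content-disposition"): part.get_payload(decode=True)
        for part in message.get_payload()
    }


# Hypothetical file contents, for illustration only.
files = {
    "metadata.properties": b"segment.crc = 12345\nsegment.total.docs = 10",
    "creation.meta": b"creation-meta-bytes",
}
content_type, body = build_multipart(files)
received = parse_multipart(content_type, body)
```

In a real deployment the HTTP client and server libraries would handle the encoding; the point here is only that the COMMIT request carries two small files rather than the whole segment.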

The controller endpoint will extract these two files and use them for segment completion. The controller code will be refactored so that the single-step commit extracts the same two files from the posted segment before calling the SegmentCompletionManager to perform the final step. As a result, the code path for the final step will be common to both methods of segment completion.
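The refactoring described above could look roughly like the following sketch, in which both commit styles converge on one final step. All names here are hypothetical: `finish_commit` merely stands in for the call into the SegmentCompletionManager, and the segment is assumed to be a gzipped tar containing files named `metadata.properties` and `creation.meta`.

```python
import io
import tarfile

METADATA_FILE = "metadata.properties"   # assumed file name
CREATION_META_FILE = "creation.meta"    # assumed file name


def extract_metadata_from_segment(segment_tar_gz):
    """Single-step path: pull the two metadata files out of the uploaded segment."""
    found = {}
    with tarfile.open(fileobj=io.BytesIO(segment_tar_gz), mode="r:gz") as tar:
        for member in tar.getmembers():
            base_name = member.name.rsplit("/", 1)[-1]
            if member.isfile() and base_name in (METADATA_FILE, CREATION_META_FILE):
                found[base_name] = tar.extractfile(member).read()
    missing = {METADATA_FILE, CREATION_META_FILE} - set(found)
    if missing:
        raise ValueError(f"segment is missing {sorted(missing)}")
    return found


def finish_commit(segment_name, metadata_files):
    """Common final step (placeholder for the SegmentCompletionManager call)."""
    return {"segment": segment_name, "metadata": dict(metadata_files)}


def single_step_commit(segment_name, segment_tar_gz):
    # The controller extracts the metadata itself, then takes the common path.
    return finish_commit(segment_name, extract_metadata_from_segment(segment_tar_gz))


def split_commit_with_metadata(segment_name, posted_files):
    # The server already posted the metadata; no segment download is needed.
    return finish_commit(segment_name, posted_files)
```

With this shape, only the single-step path ever opens the segment archive, which is the saving the proposal is after.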

Figure: Segment Commit with metadata

Backward compatibility

The controller will be modified to expose both of the endpoints so that older servers will be able to work with the newer controllers.

The existing controller endpoint for a single-step commit will be modified to use the posted segment metadata and segment creation metadata if there are three parts in the upload.
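One way to picture the three-part check is the small sketch below. It assumes that an older server posts only the segment, while a newer server posts the segment plus the two metadata files; the part names and the function are hypothetical.

```python
METADATA_PART = "metadata.properties"   # assumed part name
CREATION_META_PART = "creation.meta"    # assumed part name


def metadata_source(part_names):
    """Decide where the single-step endpoint takes the metadata from."""
    parts = set(part_names)
    if len(parts) == 3 and {METADATA_PART, CREATION_META_PART} <= parts:
        return "posted"                # newer server: use the posted files
    return "extract_from_segment"      # older server: open the uploaded segment
```

For example, `metadata_source(["segment"])` falls back to extraction, so an older server keeps working against a newer controller.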

The servers will support a configuration option so that they use the new mechanism in this proposal only if configured to do so. This option ensures backward compatibility if servers are upgraded before controllers.
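A minimal sketch of such an opt-in switch follows. The configuration key and both endpoint paths are placeholders invented for this example, not names defined by the proposal; the only property carried over from the text is that the new mechanism stays off unless explicitly enabled.

```python
# Hypothetical key and endpoint paths, for illustration only.
INCLUDE_METADATA_KEY = "server.segment.commit.include.metadata"
LEGACY_COMMIT_ENDPOINT = "/commitEnd"
METADATA_COMMIT_ENDPOINT = "/commitEndWithMetadata"


def choose_commit_endpoint(server_config):
    """Pick the endpoint for the final COMMIT step; default to the legacy one."""
    raw_value = str(server_config.get(INCLUDE_METADATA_KEY, "false"))
    enabled = raw_value.strip().lower() == "true"
    return METADATA_COMMIT_ENDPOINT if enabled else LEGACY_COMMIT_ENDPOINT
```

Defaulting to the legacy endpoint means a freshly upgraded server never calls an endpoint that an older controller does not expose.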

