...

2) Persist the Segment: In the current design, the commit leader sends the newly created segment to the Pinot Controller, which in turn uploads it to deep storage using the PinotFS API.

The segment-persisting phase, in its current form, requires a dependency on, and synchronous calls to, an external storage system such as NFS/HDFS/S3. We argue that allowing customizable policies for storing the persisted segment will make Pinot LLC easier to operate in different installation environments. Some possible persisting policies, other than the current one, are:

  • Use the segment copy on the committer server as the sole backup copy.
  • Use the segment copy on the committer server, plus one more copy asynchronously saved to an external storage system.

The change to commit leader selection is optional for the purpose of deep-storage decoupling. In the next section we discuss in detail how the above persisting options fit into the overall completion protocol.
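
To illustrate how such customizable persisting policies might be plugged in, here is a minimal Java sketch. The names (SegmentPersistPolicy, DeepStoreUploader, and the endpoint layout) are hypothetical and only meant to show the shape of the two options listed above; they are not existing Pinot APIs.

import java.io.File;
import java.net.URI;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical pluggable policy for persisting a completed segment.
public interface SegmentPersistPolicy {
  // Persist the freshly built segment and return the locations it can later be downloaded from.
  List<URI> persist(String segmentName, File segmentTarFile) throws Exception;
}

// Hypothetical wrapper around an external storage upload (e.g. a PinotFS-based copy).
interface DeepStoreUploader {
  void upload(String segmentName, File segmentTarFile);
}

// Option (a): the copy on the committer server is the sole backup; just advertise its endpoint.
class ServerOnlyPersistPolicy implements SegmentPersistPolicy {
  private final URI _serverEndpoint;

  ServerOnlyPersistPolicy(URI serverEndpoint) {
    _serverEndpoint = serverEndpoint;
  }

  @Override
  public List<URI> persist(String segmentName, File segmentTarFile) {
    return List.of(_serverEndpoint.resolve("/segments/" + segmentName));
  }
}

// Option (b): same as (a), plus one more copy asynchronously saved to an external storage system.
class ServerPlusAsyncDeepStorePolicy implements SegmentPersistPolicy {
  private final SegmentPersistPolicy _serverPolicy;
  private final DeepStoreUploader _uploader;

  ServerPlusAsyncDeepStorePolicy(SegmentPersistPolicy serverPolicy, DeepStoreUploader uploader) {
    _serverPolicy = serverPolicy;
    _uploader = uploader;
  }

  @Override
  public List<URI> persist(String segmentName, File segmentTarFile) throws Exception {
    List<URI> locations = _serverPolicy.persist(segmentName, segmentTarFile);
    // Fire-and-forget: the segment is usable before the deep-store copy completes.
    CompletableFuture.runAsync(() -> _uploader.upload(segmentName, segmentTarFile));
    return locations;
  }
}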

Modified segment completion protocol:

1) Persist a segment

Currently the ‘commit leader’ makes a synchronous ‘commit’ call to the Pinot Controller to upload the segment being committed to a deep storage system. If this upload succeeds, the segment is deemed committed and can be brought ONLINE.

Our new approach relies on the fact that "segmentLocation" is now a list of HTTP endpoints. Any one location should be sufficient to download this segment (needless to say, if one location is unavailable, we can retry with the next one). Exactly how this segment is persisted, and when it can transition to the ONLINE state, can be handled in two different ways:
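
As a minimal sketch of the download path this enables (assuming each location is a plain HTTP GET that returns the segment tarball), a consumer of "segmentLocation" could fall back across the list as shown below; SegmentDownloader and the URL layout are illustrative, not Pinot's actual downloader.

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Try each advertised segment location in order until one download succeeds.
public final class SegmentDownloader {
  private static final HttpClient CLIENT = HttpClient.newHttpClient();

  public static Path download(List<URI> segmentLocations, Path destination) throws IOException {
    IOException lastFailure = null;
    for (URI location : segmentLocations) {
      try {
        HttpRequest request = HttpRequest.newBuilder(location).GET().build();
        HttpResponse<InputStream> response =
            CLIENT.send(request, HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() == 200) {
          try (InputStream body = response.body()) {
            Files.copy(body, destination, StandardCopyOption.REPLACE_EXISTING);
          }
          return destination;
        }
        lastFailure = new IOException("HTTP " + response.statusCode() + " from " + location);
      } catch (IOException e) {
        lastFailure = e;  // remember the failure and fall through to the next location
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while downloading from " + location, e);
      }
    }
    throw new IOException("All segment locations failed", lastFailure);
  }
}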

a) No Quorum:

  • The commit leader becomes the owner of the segment being committed. It makes this segment available via an HTTP endpoint.
  • The commit leader informs the controller using a similar 'COMMIT' API call, which updates the segment location to this HTTP endpoint.
  • Similarly, the other replicas keep adding their own endpoints to this 'segment location' via the Controller (see the sketch after this list).
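
For illustration, the controller-side bookkeeping under the no-quorum option might look like the sketch below: the leader's COMMIT call registers the first location, and each replica that catches up later appends its own endpoint. SegmentLocationRegistry and its methods are hypothetical, not part of the Pinot Controller.

import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical controller-side bookkeeping for the "No Quorum" option.
public class SegmentLocationRegistry {
  // segmentName -> HTTP endpoints that can currently serve the completed segment
  private final Map<String, List<URI>> _segmentLocations = new ConcurrentHashMap<>();

  // Commit leader's COMMIT call: the segment goes ONLINE with the leader's endpoint
  // as its only download location.
  public void onCommit(String segmentName, URI leaderEndpoint) {
    _segmentLocations.put(segmentName, new CopyOnWriteArrayList<>(List.of(leaderEndpoint)));
    // ...update segment metadata and transition the segment to ONLINE here...
  }

  // Each remaining replica reports its own endpoint once it has caught up.
  public void onReplicaCaughtUp(String segmentName, URI replicaEndpoint) {
    _segmentLocations.computeIfAbsent(segmentName, k -> new CopyOnWriteArrayList<>())
        .add(replicaEndpoint);
  }

  public List<URI> getLocations(String segmentName) {
    return _segmentLocations.getOrDefault(segmentName, List.of());
  }
}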

Pros:

  • Simple to implement
  • No coordination needed.

Cons:

  • If the only server holding a copy of this segment is destroyed and, at the same time, the other replicas are unable to catch up, we may lose this segment completely. In that case, we need to perform a segment recovery (either from Kafka or from an offline source).

b) Quorum:

We can further improve the availability of this design by requiring the controller to wait for a quorum of servers to finish ingesting the current segment before updating the segment metadata and transitioning it to the ONLINE state. In this case, the segment metadata (segment location) will point to a list of HTTP endpoints (one for each replica that successfully ingested this segment).
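
A sketch of the per-segment coordination the controller would need for this option is below; the class name, the callback, and the simple-majority quorum rule are assumptions chosen for illustration.

import java.net.URI;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical per-segment tracker for the "Quorum" option: the segment is marked ONLINE
// only once a majority of its replicas report a usable copy.
public class QuorumCommitTracker {
  private final int _replicationFactor;
  private final List<URI> _reportedLocations = new CopyOnWriteArrayList<>();
  private boolean _online = false;

  public QuorumCommitTracker(int replicationFactor) {
    _replicationFactor = replicationFactor;
  }

  // Called as each replica finishes ingesting the segment and exposes it at `endpoint`.
  public synchronized void reportIngested(URI endpoint) {
    _reportedLocations.add(endpoint);
    if (!_online && _reportedLocations.size() >= quorum()) {
      _online = true;
      // ...write the full endpoint list into the segment metadata and transition to ONLINE...
    }
  }

  private int quorum() {
    return _replicationFactor / 2 + 1;  // simple majority, e.g. 2 out of 3 replicas
  }

  public synchronized boolean isOnline() {
    return _online;
  }
}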

Pros:

  • Robust in the presence of server failures

Cons:

  • Needs coordination work to be done by the Pinot Controller.

2) Choose a commit leader

We can simplify the coordination required for selecting a segment commit leader by making a small change:

...

Note: This modification is optional. We can keep the current approach of choosing a commit leader and persist the corresponding segment as described above.
