The following table shows the content of the remote log metadata when leadership switches from broker A to broker B. In the table, CO stands for cleaner offset and LE stands for leader epoch (broker A has LE-0 while broker B has LE-1). A cleaner-offset-map content of {LE-0 -> CO-100, LE-1 -> CO-155} means that the cleaner offset for leader epoch 0 ends at offset 100 and the cleaner offset for leader epoch 1 ends at offset 155.


| Event | Broker A (Old Leader) | Broker B (New Leader) | RL Metadata | Note |
| --- | --- | --- | --- | --- |
| A finishes one compaction | Initial cleaner offset map: {LE-0:}; publish Seg-0, CO at 100 | | Seg-0: {LE-0->CO-100} | |
| A becomes unresponsive and B becomes leader | | Resolve cleaner-offset map as: {LE-0 -> CO-100, LE-1 ->} | | |
| B finishes one compaction | | Upload Seg-2, with CO at 155 at that moment | Seg-2: {LE-0->CO-100, LE-1->CO-155} | |
| A becomes live and uploads its pending compaction | Upload Seg-1, with CO at 123 | | Seg-1: {LE-0->CO-123} | Seg-1 and Seg-2 can arrive in any order. In either case, Seg-2’s cleaner offset map (LE-0 -> 100, LE-1 -> 155) establishes that LE-0 cleaned up to 100 and LE-1 cleaned up to 155. LE-0’s valid range therefore ended at offset 100, which invalidates Seg-1’s claim that LE-0 reached offset 123, so the cache validation rejects Seg-1. |
| Future reads from RemoteStorage | | | Will not read; Seg-2: {LE-0->CO-100, LE-1->CO-155} | |
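
The order-independence argument in the Note column can be illustrated with a small Python sketch. It is not the Kafka implementation: the `SegmentMetadata` class, the `authoritative_map` and `is_valid` helpers, and the exact validation rule (a segment is rejected if it claims a cleaner offset beyond what the newest leader's map records for that epoch) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class SegmentMetadata:
    """Hypothetical stand-in for one remote log segment metadata entry."""
    name: str
    cleaner_offsets: dict  # leader epoch -> cleaner offset (CO)


def authoritative_map(segments):
    """Assumed rule: trust the map published under the highest leader epoch
    (ties broken by the furthest cleaner offset within that epoch)."""
    def rank(segment):
        latest_epoch = max(segment.cleaner_offsets)
        return (latest_epoch, segment.cleaner_offsets[latest_epoch])
    return dict(max(segments, key=rank).cleaner_offsets)


def is_valid(segment, auth):
    """A segment is valid only if every (epoch, CO) it claims stays within
    the range the authoritative map records for that epoch."""
    return all(
        epoch in auth and offset <= auth[epoch]
        for epoch, offset in segment.cleaner_offsets.items()
    )


seg0 = SegmentMetadata("Seg-0", {0: 100})
seg1 = SegmentMetadata("Seg-1", {0: 123})          # late upload from old leader A
seg2 = SegmentMetadata("Seg-2", {0: 100, 1: 155})  # upload from new leader B

# Evaluate every arrival order; the set of accepted segments never changes.
outcomes = set()
for order in permutations([seg0, seg1, seg2]):
    auth = authoritative_map(list(order))
    outcomes.add(tuple(sorted(s.name for s in order if is_valid(s, auth))))
```

Because validation is computed from the full set of metadata entries rather than from the order in which they were appended, every permutation yields the same result: Seg-1 fails the check since it claims LE-0 reached offset 123 while Seg-2 fixes the end of LE-0 at 100.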



Similar fencing protection also needs to apply to remote log segment uploads from the old broker, as well as to the deletion of old remote log segments. For remote log segment deletion, we need to perform a tombstone operation instead of outright file deletion, since the deletion request can come from a zombie broker A while the active broker B still needs to read the data from that log segment. The metadata manager therefore marks the remote log segment as to be deleted (RemoteLogState.DELETE_SEGMENT_STARTED), then validates it using isRemoteSegmentWithinLeaderEpochs() before the actual deletion. This prevents zombie brokers from deleting segments that are still valid under the current leader.
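
The tombstone-then-validate flow can be sketched as follows. This is a simplified Python model, not Kafka code: `MetadataManager`, `request_delete`, and the in-memory `storage` dict are hypothetical, and the validation step is reduced to comparing the requester's leader epoch with the current one as a stand-in for isRemoteSegmentWithinLeaderEpochs(), whose actual rule is not spelled out here. Rolling the state back after a failed validation is likewise an assumption.

```python
from enum import Enum


class State(Enum):
    COPY_SEGMENT_FINISHED = "COPY_SEGMENT_FINISHED"
    DELETE_SEGMENT_STARTED = "DELETE_SEGMENT_STARTED"
    DELETE_SEGMENT_FINISHED = "DELETE_SEGMENT_FINISHED"


class MetadataManager:
    """Hypothetical in-memory model of the metadata manager."""

    def __init__(self, current_leader_epoch):
        self.current_leader_epoch = current_leader_epoch
        self.states = {}   # segment name -> State
        self.storage = {}  # segment name -> bytes, stand-in for remote storage

    def add_segment(self, name, data):
        self.storage[name] = data
        self.states[name] = State.COPY_SEGMENT_FINISHED

    def request_delete(self, name, requester_epoch):
        # Phase 1: record a tombstone only; the stored data is left untouched.
        previous = self.states[name]
        self.states[name] = State.DELETE_SEGMENT_STARTED

        # Phase 2: validate before physically deleting anything.
        # Assumed stand-in for isRemoteSegmentWithinLeaderEpochs().
        if requester_epoch != self.current_leader_epoch:
            self.states[name] = previous  # assumed rollback for a fenced request
            return False

        del self.storage[name]
        self.states[name] = State.DELETE_SEGMENT_FINISHED
        return True


manager = MetadataManager(current_leader_epoch=1)  # broker B leads with LE-1
manager.add_segment("Seg-2", b"compacted records")

zombie_result = manager.request_delete("Seg-2", requester_epoch=0)  # broker A
data_after_zombie = manager.storage.get("Seg-2")
leader_result = manager.request_delete("Seg-2", requester_epoch=1)  # broker B
```

Separating the tombstone from the physical deletion means a stale request never removes bytes on its own: in this model broker A's request is fenced and the data stays readable for broker B, while the same request from the current leader completes.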