...
The following table shows the content of the remote log metadata when leadership switches from broker A to broker B. In the table, CO stands for cleaner offset and LE stands for leader epoch (broker A has LE-0, broker B has LE-1). A cleaner-offset map of {LE-0 -> CO-100, LE-1 -> CO-155} means that the cleaner offset for leader epoch 0 ends at offset 100 and the cleaner offset for leader epoch 1 ends at offset 155.
Event | Broker A (Old Leader) | Broker B (New Leader) | RL Metadata | Note |
A finishes one compaction | Initial cleaner-offset map: {LE-0:}. Publishes Seg-0, CO at 100 | | Seg-0: {LE-0->CO-100} | |
A becomes unresponsive and B becomes leader | | Resolves cleaner-offset map as: {LE-0 -> CO-100, LE-1 ->} | | |
B finishes one compaction | | Uploads Seg-2, CO at 155 at the moment | Seg-2: {LE-0->CO-100, LE-1->CO-155} | |
A comes back online and uploads its pending compaction | Uploads Seg-1, CO at 123 | | Seg-1: {LE-0->CO-123} | Seg-1 and Seg-2 can arrive in any order. Seg-2’s cleaner-offset map {LE-0->CO-100, LE-1->CO-155} establishes that LE-0 cleaned up to offset 100 and LE-1 cleaned up to offset 155. LE-0’s valid range therefore ended at offset 100, which invalidates Seg-1’s claim that LE-0 reached offset 123. The cache validation rejects Seg-1 regardless of arrival order. |
Future reads from RemoteStorage | Will not read Seg-2: {LE-0->CO-100, LE-1->CO-155} |
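The validation in the table can be illustrated with a minimal sketch. It is not the actual cache implementation: the class `CleanerOffsetFencing` and its methods are hypothetical, and it assumes that the cleaner-offset map reaching the highest leader epoch is treated as authoritative and that validity is evaluated at read time, which is what makes the outcome independent of arrival order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the cleaner-offset validation described in the table.
final class CleanerOffsetFencing {
    // Segment name -> cleaner-offset map (leader epoch -> cleaner offset), in arrival order.
    private final Map<String, NavigableMap<Integer, Long>> segments = new LinkedHashMap<>();

    void add(String name, Map<Integer, Long> cleanerOffsets) {
        segments.put(name, new TreeMap<>(cleanerOffsets));
    }

    // Assumption: the map that reaches the highest leader epoch is authoritative.
    private NavigableMap<Integer, Long> authoritative() {
        NavigableMap<Integer, Long> best = new TreeMap<>();
        for (NavigableMap<Integer, Long> m : segments.values()) {
            if (!m.isEmpty() && (best.isEmpty() || m.lastKey() > best.lastKey())) {
                best = m;
            }
        }
        return best;
    }

    // A segment is rejected if it claims that an already closed leader epoch
    // cleaned beyond the end offset recorded for that epoch.
    private static boolean isValid(NavigableMap<Integer, Long> claimed,
                                   NavigableMap<Integer, Long> auth) {
        for (Map.Entry<Integer, Long> e : claimed.entrySet()) {
            if (e.getKey() < auth.lastKey()) { // epoch was closed by a later leader
                Long end = auth.get(e.getKey());
                if (end == null || e.getValue() > end) {
                    return false;
                }
            }
        }
        return true;
    }

    // Validity is computed at read time, so arrival order does not matter.
    List<String> readableSegments() {
        NavigableMap<Integer, Long> auth = authoritative();
        List<String> readable = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<Integer, Long>> e : segments.entrySet()) {
            if (isValid(e.getValue(), auth)) {
                readable.add(e.getKey());
            }
        }
        return readable;
    }
}
```

With the maps from the table, adding Seg-0 {0 -> 100}, Seg-1 {0 -> 123} and Seg-2 {0 -> 100, 1 -> 155} in either order leaves only Seg-0 and Seg-2 readable, because Seg-1 claims offset 123 for an epoch that Seg-2 records as ending at offset 100.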
Similar fencing protection is also needed for remote log segment uploads from the old broker and for the deletion of old remote log segments. Remote log segment deletion must be a tombstone operation rather than an outright file deletion, because the deletion request can come from zombie broker A while active broker B still needs to read data from that log segment. The metadata manager therefore marks the remote log segment as to be deleted (RemoteLogSegmentState.DELETE_SEGMENT_STARTED) and validates it using isRemoteSegmentWithinLeaderEpochs() before the actual deletion. This prevents zombie brokers from deleting segments that are still valid under the current leader.
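The two-phase deletion can be sketched roughly as follows. This is only an illustration under stated assumptions: `TombstoneDeletion`, its `State` enum, and the segment names are hypothetical, and the `Predicate` passed to `purge` stands in for whatever check is built on isRemoteSegmentWithinLeaderEpochs(); the exact validation rule is not specified here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of tombstone-then-validate deletion.
final class TombstoneDeletion {
    // Simplified stand-in for the remote log segment lifecycle states.
    enum State { COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED }

    private final Map<String, State> states = new LinkedHashMap<>();

    void register(String segment) {
        states.put(segment, State.COPY_SEGMENT_FINISHED);
    }

    // Phase 1: a delete request (possibly from a zombie broker) only tombstones
    // the segment. No file is removed at this point.
    void requestDelete(String segment) {
        states.computeIfPresent(segment, (name, state) ->
                state == State.COPY_SEGMENT_FINISHED ? State.DELETE_SEGMENT_STARTED : state);
    }

    // Phase 2: remove only tombstoned segments that the validation does not
    // report as still valid under the current leader.
    List<String> purge(Predicate<String> stillValidUnderCurrentLeader) {
        List<String> deleted = new ArrayList<>();
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (e.getValue() == State.DELETE_SEGMENT_STARTED
                    && !stillValidUnderCurrentLeader.test(e.getKey())) {
                e.setValue(State.DELETE_SEGMENT_FINISHED); // physical deletion would happen here
                deleted.add(e.getKey());
            }
        }
        return deleted;
    }

    State stateOf(String segment) {
        return states.get(segment);
    }
}
```

Separating the tombstone from the physical deletion means a stale request can be recorded without destroying data: a segment that the current leader still treats as valid stays in DELETE_SEGMENT_STARTED and remains on remote storage.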
...