...
We add a new quorum state, Prospective.
Note: This adds a new invariant that only Prospective state can transition to Candidate state.
New ProspectiveState
A follower will now transition to Prospective instead of Candidate when its fetch timeout expires. Servers will only be able to transition to Candidate state from the Prospective state.
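The invariant can be pictured as an allowed-transition table. The following is a minimal Python sketch, not the Kafka implementation: the `State` enum, the `ALLOWED` table, and the `transition` and `sources_of` helpers are hypothetical names, and the table lists only the transitions named in this section (the real state machine has more states and transitions).

```python
from enum import Enum, auto


class State(Enum):
    UNATTACHED = auto()
    FOLLOWER = auto()
    PROSPECTIVE = auto()
    CANDIDATE = auto()
    RESIGNED = auto()


# Only the transitions mentioned in this section; deliberately incomplete.
ALLOWED = {
    State.FOLLOWER: {State.PROSPECTIVE},      # fetch timeout expires
    State.PROSPECTIVE: {State.CANDIDATE, State.FOLLOWER, State.UNATTACHED},
    State.CANDIDATE: {State.PROSPECTIVE},     # election lost
    State.RESIGNED: {State.UNATTACHED},       # election timeout, epoch + 1
    State.UNATTACHED: {State.FOLLOWER, State.PROSPECTIVE},
}


def transition(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def sources_of(target: State) -> set:
    """All states that may transition into target."""
    return {src for src, dsts in ALLOWED.items() if target in dsts}
```

In this sketch, `sources_of(State.CANDIDATE)` contains only `State.PROSPECTIVE`, and `transition(State.FOLLOWER, State.CANDIDATE)` raises `ValueError`, which is the invariant stated above.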
...
- nothing changes and the replica is unable to receive enough vote responses from the quorum before randomElectionTimeoutMs, the replica won't increase its epoch.
- PreVote is rejected, the replica won't increase its epoch and will transition to Unattached or Follower in an attempt to reach the leader.
- PreVote is granted (which indicates the replica is able to communicate with at least a majority of the quorum), the replica transitions to Candidate with a disruptive epoch bump. We cannot assume the new election will be granted, but we have a good indication that the replica is able to communicate with a majority of the quorum, and that the majority would grant the vote.
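The three outcomes listed above can be summarized as a small decision function. This is a hedged sketch rather than the actual implementation: `Outcome`, `after_prevote`, and the `leader_known` flag are hypothetical names, and choosing Follower when a leader is known (Unattached otherwise) is an assumption about how the replica tries to reach the leader. For the timeout case the text only states that the epoch is unchanged, so the sketch returns no state.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class Outcome(Enum):
    TIMED_OUT = auto()  # not enough responses before the election timeout
    REJECTED = auto()   # PreVote rejected
    GRANTED = auto()    # PreVote granted by a majority


def after_prevote(epoch: int, outcome: Outcome,
                  leader_known: bool) -> Tuple[Optional[str], int]:
    """Return (next_state, next_epoch) for a Prospective replica."""
    if outcome is Outcome.GRANTED:
        # Only a granted PreVote leads to the disruptive epoch bump.
        return "Candidate", epoch + 1
    if outcome is Outcome.REJECTED:
        # Epoch unchanged; assumed: Follower if a leader is known.
        return ("Follower" if leader_known else "Unattached"), epoch
    # Timed out: epoch unchanged; resulting state is not specified here.
    return None, epoch
```

The point of the sketch is that the epoch changes in exactly one branch, the granted PreVote.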
For the scenario of receiving a majority of rejected votes, it also makes sense for Candidate state to have a backoff or to wait the remainder of the random election timeout (as suggested by the Raft paper). However, we arguably do not need an exponentially increasing backoff. Candidate will transition to Prospective on loss of the election, which provides a buffer against another disruptive epoch increase. Keeping the exponential backoff behavior adds bloat to Prospective state and unneeded complexity (e.g. tracking the number of times a replica has transitioned back and forth between Candidate and Prospective state, and an exponential calculation that is hard to read). Nevertheless, we treat changing the backoff behavior in this scenario as out of scope, because it is not immediately obvious what a better alternative would be (e.g. a smaller uniformly random election backoff, which means deprecating max election timeout ms, or waiting out the rest of the random election timeout, which means potentially longer unavailability of the quorum).
FollowerState changes
Followers now track votedKey. This change is not strictly required by the KIP, but we should not drop persisted state during quorum state transitions in the same epoch. (In the past, we would lose this information on transitions from Unattached with votedKey to Follower in the same epoch.) Now, a transition from Prospective with votedKey to Follower in the same epoch is also possible.
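A minimal sketch of the rule "keep votedKey across a same-epoch transition to Follower" might look as follows. `QuorumSnapshot` and `to_follower` are hypothetical names, `voted_key` is a plain string standing in for the persisted vote, and clearing the vote when the epoch increases is an assumption based on votes being scoped to a single epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuorumSnapshot:
    state: str
    epoch: int
    leader_id: Optional[int] = None
    voted_key: Optional[str] = None  # stand-in for the persisted vote


def to_follower(prev: QuorumSnapshot, epoch: int, leader_id: int) -> QuorumSnapshot:
    """Become Follower, keeping voted_key only if the epoch is unchanged."""
    if epoch < prev.epoch:
        raise ValueError("epoch must not go backwards")
    kept = prev.voted_key if epoch == prev.epoch else None
    return QuorumSnapshot("Follower", epoch, leader_id, kept)
```

Starting from `QuorumSnapshot("Prospective", 5, voted_key="replica-2")`, calling `to_follower(..., 5, leader_id=1)` keeps the vote, while moving to epoch 6 drops it.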
ResignedState changes
Resigned voters used to transition directly to Candidate after waiting an election timeout (observers would transition to UnattachedState with epoch + 1). If we simply replace transitionToCandidate with transitionToProspective, a cordoned leader in epoch 5 could resign in epoch 5, transition to Prospective in epoch 5 (with leaderId=localId), fail the election, and then attempt to become a follower of itself in epoch 5. To address this, Resigned must increase its epoch when it transitions.
We can simplify the transition further to have Resigned always transition to Unattached with epoch + 1 after the election timeout (no matter if it is a voter or observer), and have transitionToUnattached initialize the new electionTimeoutMs to the resignedState's remainingElectionTimeoutMs if it is a voter. This effectively causes Resigned voters to transition immediately to Prospective after an election timeout.
(For more discussion about alternatives and why this option was chosen, see https://github.com/apache/kafka/pull/18240#discussion_r1899341945)
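The simplified Resigned transition described in this section can be sketched as below. This is illustrative only: the `Resigned` and `Unattached` dataclasses and `resigned_to_unattached` are hypothetical names, and giving observers a fresh random election timeout is an assumption, since the text only specifies the voter case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resigned:
    epoch: int
    is_voter: bool
    remaining_election_timeout_ms: int


@dataclass(frozen=True)
class Unattached:
    epoch: int
    election_timeout_ms: int


def resigned_to_unattached(prev: Resigned,
                           random_election_timeout_ms: int) -> Unattached:
    """Leave Resigned by bumping the epoch, for voters and observers alike."""
    if prev.is_voter:
        # Voters carry over whatever is left of the Resigned timeout.
        timeout = prev.remaining_election_timeout_ms
    else:
        # Assumed: observers start with a fresh random timeout.
        timeout = random_election_timeout_ms
    return Unattached(epoch=prev.epoch + 1, election_timeout_ms=timeout)
```

Because a voter leaves Resigned once its election timeout has expired, the carried-over timeout is 0, so the Unattached voter times out at once and moves on to Prospective in the new epoch. This avoids the "follower of itself in epoch 5" case.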
Observers
Just as Observers cannot transition to Candidate, they cannot transition to Prospective.
...