Status

Current state: Under Discussion

Discussion thread: here 

JIRA: KAFKA-8638 

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, Kafka preferred leader election picks the leader from the broker IDs in a topic/partition's replica assignment in priority order, skipping brokers that are not in the ISR. The preferred leader is the broker ID in the first position of the replica assignment. There are use cases where, even though the first broker in the replica assignment is in the ISR, it needs to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election.

Let’s use a topic/partition with replicas (1,2,3) as an example. Broker 1 is the preferred leader. When preferred leader election is run, it picks 1 as the leader if 1 is online and in the ISR; otherwise it picks 2 if 2 is in the ISR; otherwise it picks 3. There are use cases where, even though 1 is in the ISR, we would like it to be moved to the end of the ordering (lowest priority) when deciding leadership during preferred leader election. Below is a list of use cases:


Public Interfaces

Introduce a preferred_leader_blacklist dynamic config, which is empty by default. It accepts a comma-separated list of broker IDs. For example, the command below puts broker IDs 1, 10, and 65 into the blacklist:

/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config preferred_leader_blacklist=1,10,65


Because Kafka dynamic configs are already set through --bootstrap-server, this approach does not need to manipulate ZooKeeper directly. The downside is that adding or removing a single broker requires updating the dynamic config with the complete new list, whereas an alternative design with one ZK node per broker (a /preferred_leader_blacklist/<broker_id> znode) would touch only that broker's node. For example, to remove broker 10 from the blacklist, update the config to preferred_leader_blacklist=1,65.
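
Because the config always carries the full list, a small helper can compute the new value before it is passed to kafka-configs.sh. The following is a minimal sketch in Python; the function names (parse_blacklist, remove_broker, add_broker) are hypothetical and are not part of Kafka or of this KIP.

```python
from typing import List


def parse_blacklist(value: str) -> List[int]:
    """Parse a comma-separated broker ID list such as "1,10,65"."""
    return [int(tok) for tok in value.split(",") if tok.strip()]


def remove_broker(value: str, broker_id: int) -> str:
    """Return the complete new list with broker_id removed."""
    ids = [b for b in parse_blacklist(value) if b != broker_id]
    return ",".join(str(b) for b in ids)


def add_broker(value: str, broker_id: int) -> str:
    """Return the complete new list with broker_id appended if absent."""
    ids = parse_blacklist(value)
    if broker_id not in ids:
        ids.append(broker_id)
    return ",".join(str(b) for b in ids)


if __name__ == "__main__":
    # Value to pass as preferred_leader_blacklist=<value> via --add-config
    print(remove_broker("1,10,65", 10))
```

The returned string is what would follow preferred_leader_blacklist= in the --add-config argument shown above.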

In the current design, changing the dynamic config should not by itself trigger any leadership changes.

Proposed Changes


The following are the requirements this KIP is trying to accomplish:


The preferred leader blacklist should only be used for leadership determination when any of the following is triggered:

  1. Preferred leader election is run.  
  2. When a broker fails, the controller transfers all of that broker’s leaderships to the next replica in priority order. When determining leader priority, the controller should look up the preferred leader blacklist and move any blacklisted broker to the lowest priority. 
  3. When auto.leader.rebalance.enable is enabled, the broker(s) in the preferred leader "blacklist" should be excluded from being elected leaders. 
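
The priority rule described above can be sketched as follows. This is an illustration in Python, not the controller's actual code, and the function name elect_leader is hypothetical. It assumes the "lowest priority" reading from the Motivation section: a blacklisted broker is moved to the end of the ordering but remains eligible if no other replica is in the ISR. Under the stricter "excluded" reading, the second list in the concatenation would simply be dropped.

```python
from typing import Iterable, List, Optional, Set


def elect_leader(replicas: List[int],
                 isr: Iterable[int],
                 blacklist: Iterable[int]) -> Optional[int]:
    """Pick a leader from the replica assignment.

    Non-blacklisted replicas keep their assignment order; blacklisted
    replicas are moved to the end (assumption: still eligible as a
    last resort). The first candidate that is in the ISR wins.
    """
    isr_set: Set[int] = set(isr)
    bl_set: Set[int] = set(blacklist)
    ordered = ([b for b in replicas if b not in bl_set]
               + [b for b in replicas if b in bl_set])
    for broker in ordered:
        if broker in isr_set:
            return broker
    return None  # no replica is in the ISR


if __name__ == "__main__":
    # Replicas (1,2,3), all in ISR, broker 1 blacklisted
    print(elect_leader([1, 2, 3], {1, 2, 3}, {1}))
```

With replicas (1,2,3) all in the ISR and an empty blacklist, the sketch returns 1, matching the existing behavior; with broker 1 blacklisted it returns 2.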


The current design also does not automatically put a broker in the preferred leader blacklist. For example, a broker does not add itself to the blacklist when it starts up as the controller or becomes the controller after a failover. This may be an enhancement later.

Compatibility, Deprecation, and Migration Plan

Rejected Alternatives