Current state: Accepted
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
A broker may be alive but unable to establish new connections, emit metrics, or emit logs due to failed DNS resolution. Because such a broker is silent, it is not obvious that it has stopped emitting metrics or logs, so it would be helpful to have a metric that counts the brokers registered with the cluster. It would also be helpful to count the unfenced brokers, so that operators know how many brokers are present in the metadata response and how many are not.
We propose adding two new controller metrics. Note that these metrics are 0 on all nodes except the active controller.
|Attribute Name||Value when using ZK||Value when using KRaft|
|kafka.controller:type=KafkaController:ActiveBrokerCount||The number of brokers known to the KafkaController||The number of registered and unfenced brokers|
|kafka.controller:type=KafkaController:FencedBrokerCount||Always 0||The number of registered but fenced brokers|
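Because both metrics are 0 on every node except the active controller, a monitoring system can recover the cluster-wide values by summing the per-node readings. The following is a minimal sketch of that aggregation, not part of the proposal itself. The `ControllerSample` type, the function names, and the `expected_brokers` parameter are hypothetical, and how the readings are collected (JMX, a metrics exporter, and so on) is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ControllerSample:
    """One reading of the two proposed metrics from a single node (hypothetical type)."""
    node_id: int
    active_broker_count: int
    fenced_broker_count: int


def cluster_broker_counts(samples: Iterable[ControllerSample]) -> Tuple[int, int]:
    """Return (active, fenced) for the whole cluster.

    Nodes that are not the active controller report 0 for both metrics,
    so the sum over all nodes equals the active controller's values.
    """
    samples = list(samples)
    active = sum(s.active_broker_count for s in samples)
    fenced = sum(s.fenced_broker_count for s in samples)
    return active, fenced


def brokers_not_active(samples: Iterable[ControllerSample], expected_brokers: int) -> int:
    """Return how many of the expected brokers are not counted as active.

    `expected_brokers` is an assumption: the number of brokers the operator
    believes should be serving traffic.
    """
    active, _ = cluster_broker_counts(samples)
    return expected_brokers - active


# Example: node 2 is the active controller, the other nodes report 0.
samples = [
    ControllerSample(node_id=1, active_broker_count=0, fenced_broker_count=0),
    ControllerSample(node_id=2, active_broker_count=5, fenced_broker_count=1),
    ControllerSample(node_id=3, active_broker_count=0, fenced_broker_count=0),
]
active, fenced = cluster_broker_counts(samples)
shortfall = brokers_not_active(samples, expected_brokers=6)
```

With these readings, `active` is 5, `fenced` is 1, and `shortfall` is 1, which would flag one broker as worth investigating even if that broker itself emits no metrics or logs. The sum assumes that all nodes are sampled at roughly the same time; readings taken across a controller failover may need extra care.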
Compatibility, Deprecation, and Migration Plan
No migration plan is needed because these metrics are new.
We discussed the possibility of having a single metric that would count all registered brokers, fenced and unfenced. However, in ZK mode, where no broker is ever reported as fenced, this would be equivalent to counting all active brokers, which is not the case in KRaft mode. People might then start relying on the metric as an active-broker count, which could complicate their transition from ZK to KRaft. Therefore, it is better to have a metric for active brokers specifically, rather than a metric for all registered brokers.
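Anyone who still wants the total number of registered brokers can derive it from the two proposed metrics. The sketch below is illustrative only; the function name and the sample values are hypothetical.

```python
def registered_broker_count(active_broker_count: int, fenced_broker_count: int) -> int:
    """Total registered brokers, derived from the two proposed metrics."""
    return active_broker_count + fenced_broker_count


# ZK mode: FencedBrokerCount is always 0, so the total equals the active count.
zk_total = registered_broker_count(active_broker_count=6, fenced_broker_count=0)

# KRaft mode: fenced brokers are registered but not active, so the two can differ.
kraft_total = registered_broker_count(active_broker_count=5, fenced_broker_count=1)
```

Both totals are 6 here, yet only in the ZK case does the total also equal the number of active brokers, which is the ambiguity that a single combined metric would have hidden.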