Status
Current state: Discussion
Discussion thread:
JIRA:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
We currently have a metric for under-replicated partitions, but no metric for unavailable partitions. Unavailable partitions can be most easily defined as "the number of partitions that this broker leads for which the ISR is insufficient to meet the minimum ISR required." For example, if the replication factor (RF) is 3 and min ISR is 2, a partition with 2 replicas in the ISR is included in the under-replicated partitions count. With only 1 replica in the ISR, the partition is also included in the unavailable partitions count.
Public Interfaces
Add the following two yammer metrics (and the resulting JMX metrics):
- kafka.server:name=UnavailablePartitionCount,type=ReplicaManager
There is one such gauge per broker.
- kafka.cluster:name=Unavailable,type=Partition,topic={topic},partition={partition}
There is one such gauge per partition.
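As a rough illustration of how a monitoring client might address these gauges, the sketch below builds the two JMX object names with the standard `javax.management.ObjectName` class. The class name `UnavailableMetricNames`, its helper methods, and the topic `my-topic` are hypothetical and used only for this example; the name strings themselves are taken from the list above.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka: builds the JMX names proposed above.
final class UnavailableMetricNames {

    // Wraps the checked exception so callers get a plain unchecked error.
    private static ObjectName name(String raw) {
        try {
            return new ObjectName(raw);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid JMX name: " + raw, e);
        }
    }

    // Per-broker gauge: one instance on each broker.
    static ObjectName brokerGauge() {
        return name("kafka.server:name=UnavailablePartitionCount,type=ReplicaManager");
    }

    // Per-partition gauge: {topic} and {partition} are filled in by the caller.
    // Assumes the topic contains no characters that need JMX quoting.
    static ObjectName partitionGauge(String topic, int partition) {
        return name("kafka.cluster:name=Unavailable,type=Partition,topic="
                + topic + ",partition=" + partition);
    }

    public static void main(String[] args) {
        System.out.println(brokerGauge());
        System.out.println(partitionGauge("my-topic", 0)); // "my-topic" is a made-up topic
    }
}
```

The resulting `ObjectName` values could then be passed to an `MBeanServerConnection` to read the gauges from a running broker.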
Proposed Changes
- Add the yammer gauge Unavailable to the kafka.cluster.Partition class, similar to the existing UnderReplicated metric.
The value of this metric is 1 if the broker is the leader of this partition AND the number of in-sync replicas of this partition is less than the min ISR of this partition. Otherwise it is 0.
- Add the yammer gauge UnavailablePartitionCount to the kafka.server.ReplicaManager class, similar to the existing UnderReplicatedPartitions metric.
The value of this metric is the total number of partitions led by this broker that are Unavailable (as defined above).
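The two gauge definitions above can be sketched as follows. This is a minimal, standalone illustration of the rule only, not the actual Kafka code: the class `UnavailablePartitionSketch`, the nested `PartitionState` type, and its field names are all hypothetical, and the real gauges would be registered inside kafka.cluster.Partition and kafka.server.ReplicaManager.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch of the proposed gauge values.
final class UnavailablePartitionSketch {

    // Hypothetical snapshot of one partition as seen by one broker.
    static final class PartitionState {
        final boolean isLeader;      // does this broker lead the partition?
        final int inSyncReplicas;    // current ISR size
        final int minInSyncReplicas; // configured min ISR

        PartitionState(boolean isLeader, int inSyncReplicas, int minInSyncReplicas) {
            this.isLeader = isLeader;
            this.inSyncReplicas = inSyncReplicas;
            this.minInSyncReplicas = minInSyncReplicas;
        }
    }

    // Per-partition gauge: 1 if this broker leads the partition AND ISR size < min ISR, else 0.
    static int unavailable(PartitionState p) {
        return (p.isLeader && p.inSyncReplicas < p.minInSyncReplicas) ? 1 : 0;
    }

    // Per-broker gauge: sum of the per-partition gauge over the broker's partitions.
    static int unavailablePartitionCount(List<PartitionState> partitions) {
        int count = 0;
        for (PartitionState p : partitions) {
            count += unavailable(p);
        }
        return count;
    }

    public static void main(String[] args) {
        // RF = 3, min ISR = 2, as in the Motivation example.
        List<PartitionState> partitions = Arrays.asList(
                new PartitionState(true, 3, 2),   // fully replicated
                new PartitionState(true, 2, 2),   // under-replicated, still available
                new PartitionState(true, 1, 2),   // under-replicated and unavailable
                new PartitionState(false, 1, 2)); // not led by this broker, so not counted
        System.out.println(unavailablePartitionCount(partitions)); // prints 1
    }
}
```

Note that a partition followed (not led) by the broker never contributes to the count, so each unavailable partition is reported by exactly one broker, its leader.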
Compatibility, Deprecation, and Migration Plan
The change is fully backwards compatible: it only adds new metrics and does not modify existing ones.
Rejected Alternatives
None