Current state: Accepted
Released: 1.1.0 (WIP)
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Ensuring that the Kafka Controller is healthy is an important part of monitoring the health of a Kafka cluster. This KIP is a follow-up to KIP-143 and adds more Kafka Controller metrics that are useful for monitoring controller health.
All of the following metrics will be added via the Yammer Metrics library, like most of the broker metrics.
value: size of the ControllerEventManager's queue.
value: the time that any event (except the Idle event) waits in the ControllerEventManager's queue before being processed.
(3) kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=someId
value: the rate (requests per second) at which the ControllerChannelManager takes requests from the queue of the given broker, and the time a request stays in this queue before it is taken from the queue.
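Because Yammer metrics are exposed as JMX MBeans, a monitoring tool would typically address metric (3) by its JMX object name. The following is a minimal sketch using only the standard javax.management API. The class and method names are hypothetical, and the key property brokerId is copied from this KIP (written without whitespace between key properties); the exact name exposed by a given broker version should be verified against a running broker.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka: builds JMX names for metric (3).
final class ControllerMetricNames {
    private static final String BASE =
        "kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs";

    // JMX name of the per-broker metric for one broker id (key name assumed from this KIP).
    static ObjectName requestRateAndQueueTime(int brokerId) {
        return parse(BASE + ",brokerId=" + brokerId);
    }

    // Pattern that matches the metric for every broker id.
    static ObjectName allBrokersPattern() {
        return parse(BASE + ",brokerId=*");
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid JMX object name: " + name, e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = requestRateAndQueueTime(1);
        System.out.println("domain   = " + name.getDomain());
        System.out.println("brokerId = " + name.getKeyProperty("brokerId"));
        System.out.println("matches  = " + allBrokersPattern().apply(name));
    }
}
```

The pattern form can be passed to MBeanServerConnection.queryNames to discover the per-broker metrics without knowing the broker ids in advance.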
We will add the relevant metrics as specified in the Public Interfaces section.
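As a rough illustration of what the queue-size and queue-time metrics measure, the sketch below stamps each event with its enqueue time and records how long it waited when it is dequeued. It is not the controller's implementation: TimedEventQueue and QueuedEvent are hypothetical names, it uses plain JDK classes rather than Yammer, and it tracks only a mean where a Yammer timer would also track rates and percentiles.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Kafka code: a queue that tracks its size and event wait time.
final class TimedEventQueue {
    // Wraps an event name together with the time it was enqueued.
    private static final class QueuedEvent {
        final String name;
        final long enqueueNanos;

        QueuedEvent(String name, long enqueueNanos) {
            this.name = name;
            this.enqueueNanos = enqueueNanos;
        }
    }

    private final LinkedBlockingQueue<QueuedEvent> queue = new LinkedBlockingQueue<>();
    private final AtomicLong totalWaitNanos = new AtomicLong();
    private final AtomicLong processed = new AtomicLong();

    // Enqueues an event and remembers when it entered the queue.
    void put(String name) {
        queue.add(new QueuedEvent(name, System.nanoTime()));
    }

    // What a queue-size gauge would report.
    int queueSize() {
        return queue.size();
    }

    // Dequeues the next event and records its wait time; returns null if the queue is empty.
    String pollAndRecord() {
        QueuedEvent event = queue.poll();
        if (event == null) {
            return null;
        }
        totalWaitNanos.addAndGet(System.nanoTime() - event.enqueueNanos);
        processed.incrementAndGet();
        return event.name;
    }

    long processedCount() {
        return processed.get();
    }

    // Mean time in the queue, in milliseconds, over all dequeued events.
    double meanWaitMs() {
        long count = processed.get();
        return count == 0 ? 0.0 : totalWaitNanos.get() / 1_000_000.0 / count;
    }

    public static void main(String[] args) {
        TimedEventQueue q = new TimedEventQueue();
        q.put("BrokerChange");
        q.put("TopicChange");
        System.out.println("queue size before processing: " + q.queueSize());
        while (q.pollAndRecord() != null) {
            // A real controller would process the event here.
        }
        System.out.println("events processed: " + q.processedCount());
        System.out.println("mean wait (ms): " + q.meanWaitMs());
    }
}
```

The same enqueue-timestamp approach applies to the per-broker request queues behind metric (3), with one queue and one timer per broker.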
Compatibility, Deprecation, and Migration Plan
We are introducing new metrics so there is no compatibility impact.
- Use Kafka Metrics instead of Yammer Metrics: most of the broker metrics use Yammer Metrics, so it makes sense to stick with it until there is a plan for migrating them all to Kafka Metrics.