Status
Current state: Accepted
Discussion thread:
JIRA: KAFKA-3473
Released: 1.1.0 (WIP)
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Ensuring that the Kafka Controller is healthy is an important part of monitoring the health of a Kafka cluster. This KIP is a follow-up to KIP-143 and adds more Kafka Controller metrics that are useful for monitoring controller health.
Public Interfaces
All of the following metrics will be added via the Yammer Metrics library, like most of the existing broker metrics.
(1) kafka.controller:type=ControllerEventManager,name=EventQueueSize
type: gauge
value: the number of events currently in the ControllerEventManager's queue.
(2) kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs
type: histogram
value: the time each event (except the Idle event) waits in the ControllerEventManager's queue before being processed
(3) kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=someId
type: timer
value: the rate (requests per second) at which the ControllerChannelManager takes requests from the given broker's queue, and the time each request stays in that queue before it is taken.
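Metric (1) samples a queue's current size, while (2) and (3) record how long each item waited in a queue before being taken, with (3) also tracking the rate at which items are taken. The Java sketch below illustrates only those semantics, using a plain `LinkedBlockingQueue` and an injectable clock. The class `InstrumentedQueue` and its methods are hypothetical, the list of samples stands in for a Yammer histogram or timer, and the actual controller code may record these values differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch of the metric semantics only; this is not Kafka's
// ControllerEventManager or ControllerChannelManager code.
class InstrumentedQueue<T> {
    // Each queued item remembers the time at which it was enqueued.
    private static final class Entry<E> {
        final E item;
        final long enqueuedAtMs;

        Entry(E item, long enqueuedAtMs) {
            this.item = item;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    private final LinkedBlockingQueue<Entry<T>> queue = new LinkedBlockingQueue<>();
    private final LongSupplier clockMs;          // injectable clock, for testability
    private final Predicate<T> recordQueueTime;  // e.g. skip the Idle event
    private final long createdAtMs;
    private final List<Long> samplesMs = new ArrayList<>(); // stands in for a histogram
    private long dequeuedCount = 0;

    InstrumentedQueue(LongSupplier clockMs, Predicate<T> recordQueueTime) {
        this.clockMs = clockMs;
        this.recordQueueTime = recordQueueTime;
        this.createdAtMs = clockMs.getAsLong();
    }

    void put(T item) {
        queue.add(new Entry<>(item, clockMs.getAsLong()));
    }

    // Gauge semantics (cf. EventQueueSize): the current size, read on demand.
    int size() {
        return queue.size();
    }

    // Takes the head (or returns null if empty) and records its time in queue.
    T poll() {
        Entry<T> e = queue.poll();
        if (e == null) {
            return null;
        }
        dequeuedCount++;
        if (recordQueueTime.test(e.item)) {
            samplesMs.add(clockMs.getAsLong() - e.enqueuedAtMs);
        }
        return e.item;
    }

    List<Long> queueTimeSamplesMs() {
        return new ArrayList<>(samplesMs);
    }

    // Mean dequeue rate in items per second since creation. A Yammer timer
    // also reports exponentially weighted 1-, 5- and 15-minute rates.
    double meanRatePerSec() {
        long elapsedMs = clockMs.getAsLong() - createdAtMs;
        return elapsedMs <= 0 ? 0.0 : dequeuedCount * 1000.0 / elapsedMs;
    }
}
```

For metric (2), the predicate would exclude the Idle event; for metric (3), one such queue would exist per destination broker, each reported under its own `brokerId` tag.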
Proposed Changes
We will add the relevant metrics as specified in the Public Interfaces section.
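Once the metrics are registered, a monitoring tool would typically read them over JMX. The Java sketch below uses only `javax.management` to build the ObjectNames listed in Public Interfaces, plus a wildcard pattern intended to match the per-broker timer for every `brokerId`. The helper class `ControllerMetricNames` and its method names are hypothetical, and the sketch assumes the Yammer metrics are exported to JMX under exactly these names.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that builds the JMX ObjectNames listed in this KIP,
// assuming the Yammer metrics are exported to JMX under exactly these names.
class ControllerMetricNames {
    static final String DOMAIN = "kafka.controller";

    private static ObjectName of(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad ObjectName: " + name, e);
        }
    }

    static ObjectName eventQueueSize() {
        return of(DOMAIN + ":type=ControllerEventManager,name=EventQueueSize");
    }

    static ObjectName eventQueueTimeMs() {
        return of(DOMAIN + ":type=ControllerEventManager,name=EventQueueTimeMs");
    }

    // One timer per destination broker. No whitespace after the comma:
    // JMX keeps it as part of the key.
    static ObjectName requestRateAndQueueTimeMs(int brokerId) {
        return of(DOMAIN + ":type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=" + brokerId);
    }

    // Property-list pattern matching the timer for any brokerId.
    static ObjectName allRequestRateAndQueueTimeMs() {
        return of(DOMAIN + ":type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,*");
    }

    // Names matching the pattern in this JVM's platform MBean server.
    static Set<ObjectName> registered(ObjectName pattern) {
        return ManagementFactory.getPlatformMBeanServer().queryNames(pattern, null);
    }
}
```

`registered` only sees MBeans in the current JVM, so in a process without Kafka it returns an empty set; a tool monitoring a separate broker would connect through a remote JMX connector instead.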
Compatibility, Deprecation, and Migration Plan
Only new metrics are introduced, so there is no compatibility impact.
Rejected Alternatives
- Use Kafka metrics instead of Yammer metrics: most of the broker metrics use Yammer Metrics, so it makes sense to stick with Yammer until there is a plan for migrating them all to Kafka Metrics.