Status
Current state: "Adopted"
Discussion thread: here
JIRA: KAFKA-4982
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently, per-network-processor-thread metrics such as socket-server-metrics.connection-count and socket-server-metrics.connection-close-rate are tagged with networkProcessor:<id>, where the id of a network processor is just an integer.
If you have more than one listener (e.g. PLAINTEXT, SASL_SSL), the id just keeps incrementing across listeners, so when looking at the metrics it is hard to match the metric tag to a listener.
You need to know the number of network threads and the order in which the listeners are declared in the brokers' server.properties to properly aggregate per protocol.
On the broker side only, where multiple security protocols can coexist, we should add a tag showing the listener label.
That would also make it much easier to group the metrics in a tool like Grafana.
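To illustrate the bookkeeping that is needed without such a tag, here is a minimal sketch of recovering the listener from a processor id. It assumes that ids are assigned contiguously in listener declaration order and that every listener gets the same number of network threads; the class and method names are hypothetical and not part of Kafka.

```java
import java.util.Arrays;
import java.util.List;

public class ProcessorToListener {

    // Hypothetical helper, not a Kafka API.
    // Assumption: ids keep incrementing across listeners, in declaration order,
    // with numNetworkThreads processors per listener.
    static String listenerFor(int processorId, List<String> listenersInOrder, int numNetworkThreads) {
        if (processorId < 0 || numNetworkThreads <= 0) {
            throw new IllegalArgumentException("invalid processor id or thread count");
        }
        int index = processorId / numNetworkThreads;
        if (index >= listenersInOrder.size()) {
            throw new IllegalArgumentException("processor id out of range: " + processorId);
        }
        return listenersInOrder.get(index);
    }

    public static void main(String[] args) {
        // Listener order as it would be declared in server.properties (example values).
        List<String> listeners = Arrays.asList("PLAINTEXT", "SASL_SSL");
        int numNetworkThreads = 3;
        for (int id = 0; id < listeners.size() * numNetworkThreads; id++) {
            System.out.println("networkProcessor " + id + " -> " + listenerFor(id, listeners, numNetworkThreads));
        }
    }
}
```

Every dashboard or aggregation rule would have to embed this kind of logic, and it breaks as soon as the thread count or the listener order changes. A listener tag on the metric itself removes the need for it.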
This KIP is co-authored with Mickael Maison
Public Interfaces
Monitoring
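As initially proposed, the metrics in the socket-server-metrics group would gain two new entries in their tags map: the security protocol name and the listener name. Only the listener name tag was eventually implemented (see Rejected Alternatives).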
Proposed Changes
Example representation of the metrics in JConsole :
Current tag set : network processor id only
Initially proposed tag set : security protocol, listener name, network processor id
Eventually implemented tag set: listener name, network processor id
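As a rough illustration of how a monitoring tool might read the implemented tag set, the sketch below parses two JMX-style names with the standard javax.management.ObjectName class. The exact MBean name strings, including the kafka.server domain, are assumptions made for the example, as is the helper class itself.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ListenerTagExample {

    // Returns the value of the "listener" key, or null when the name has no such key.
    static String listenerOf(String mbeanName) {
        try {
            return new ObjectName(mbeanName).getKeyProperty("listener");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("not a valid JMX object name: " + mbeanName, e);
        }
    }

    public static void main(String[] args) {
        // Assumed name shapes, for illustration only.
        String current = "kafka.server:type=socket-server-metrics,networkProcessor=3";
        String implemented = "kafka.server:type=socket-server-metrics,listener=SASL_SSL,networkProcessor=3";

        System.out.println("current tag set:     listener=" + listenerOf(current));
        System.out.println("implemented tag set: listener=" + listenerOf(implemented));
    }
}
```

With the listener available as a key, grouping per listener becomes a plain lookup instead of a computation on the processor id.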
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
Users currently using reporters that do not support metric tags, and that flatten the tags map into the metric name, will find that the name has changed.
- The Yammer metric "IdlePercent" should remain untagged with listener, for compatibility. It is an alias of the Kafka metric "io-wait-ratio", which does get tagged.
- The socket-server-metrics.* metrics are not exposed as Yammer metrics, which are still the most commonly consumed server-side metrics, so the impact of this change is mitigated (thanks Ismael Juma).
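To make the impact on tag-flattening reporters concrete, here is a small sketch of one possible flattening scheme. The scheme, the separator characters and the class name are all hypothetical; real reporters may flatten differently, but any scheme that embeds the tags in the name is affected in the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenedNameExample {

    // Hypothetical flattening: group, then each tag as key-value, then the metric name.
    static String flatten(String group, String name, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder(group);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            sb.append('.').append(tag.getKey()).append('-').append(tag.getValue());
        }
        return sb.append('.').append(name).toString();
    }

    public static void main(String[] args) {
        Map<String, String> before = new LinkedHashMap<>();
        before.put("networkProcessor", "0");

        Map<String, String> after = new LinkedHashMap<>();
        after.put("listener", "PLAINTEXT");
        after.put("networkProcessor", "0");

        // The flattened name gains a listener segment once the new tag is present.
        System.out.println(flatten("socket-server-metrics", "connection-count", before));
        System.out.println(flatten("socket-server-metrics", "connection-count", after));
    }
}
```

Dashboards or alerts keyed on the old flattened name would need to be updated to the new one.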
Rejected Alternatives
1) Having both listener and securityProtocol tags: rejected in favor of having just the listener tag, as this reduces clutter.
In most simple cases, the two tags would end up having the same value.
Proposed tag set : security protocol, listener name, network processor id
2) Omitting the listener tag when it is the same as the protocol one:
rejected, as listener names and security protocols are different concepts (even if the default value is the same for compatibility reasons),
and it is a bit odd to expect generic tools that process Kafka metrics to fall back to the security protocol when the listener name is missing.