Current state: Accepted
Discussion thread: Discussion
JIRA: KAFKA-5890 - records.lag should use tags for topic and partition rather than using metric name.
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
As part of KIP-92 a per-partition lag metric was added.
These metrics are really useful; however, they were implemented by prefixing the metric name with the topic and partition: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L1321-L1344
Usually this kind of metric uses tags (as provided by org.apache.kafka.common.MetricName), and the name is constant across all topics and partitions.
This is especially important for users who use custom reporters which aggregate topics/partitions together to avoid an explosion of the number of KPIs.
Change the metrics in the metrics group “consumer-fetch-manager-metrics” from “<topic>-<partition>.records-lag”, which carries only inherited tags, to “records-lag” with two new tags: topic and partition.
Once this change is made, these consumer metrics will have 3 tags: client-id, topic and partition, which makes them more consistent with the rest of the metrics.
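The difference between the two naming schemes can be illustrated with a minimal, self-contained sketch. This is not Kafka's actual code: `MetricId` is a hypothetical stand-in for org.apache.kafka.common.MetricName, and `LagMetricNaming` and its methods are names assumed purely for illustration. The last method shows how a custom reporter might sum lag per topic using only the tags, without parsing metric names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical stand-in for org.apache.kafka.common.MetricName (illustration only). */
final class MetricId {
    final String name;
    final String group;
    final Map<String, String> tags;

    MetricId(String name, String group, Map<String, String> tags) {
        this.name = name;
        this.group = group;
        this.tags = Collections.unmodifiableMap(new TreeMap<>(tags));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MetricId)) return false;
        MetricId other = (MetricId) o;
        return name.equals(other.name) && group.equals(other.group) && tags.equals(other.tags);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, group, tags);
    }
}

final class LagMetricNaming {
    static final String GROUP = "consumer-fetch-manager-metrics";

    /** Old scheme: topic and partition are baked into the metric name. */
    static MetricId oldStyle(String clientId, String topic, int partition) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("client-id", clientId);
        return new MetricId(topic + "-" + partition + ".records-lag", GROUP, tags);
    }

    /** New scheme: constant name, with topic and partition carried as tags. */
    static MetricId newStyle(String clientId, String topic, int partition) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("client-id", clientId);
        tags.put("topic", topic);
        tags.put("partition", String.valueOf(partition));
        return new MetricId("records-lag", GROUP, tags);
    }

    /** What a custom reporter might do: sum lag per topic by reading the topic tag. */
    static Map<String, Double> lagByTopic(Map<MetricId, Double> values) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<MetricId, Double> e : values.entrySet()) {
            MetricId id = e.getKey();
            if (id.name.equals("records-lag") && id.tags.containsKey("topic")) {
                totals.merge(id.tags.get("topic"), e.getValue(), Double::sum);
            }
        }
        return totals;
    }
}
```

With the old scheme, the same aggregation would require splitting each metric name on its last dash to recover the topic, which is fragile because topic names may themselves contain dashes.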
In org.apache.kafka.clients.consumer.internals.Fetcher, the sensor will have a set of tags attached and will use a constant name for its metric instead of a generated one.
Sensors will still be removed/added on assignment change to avoid reporting metrics for partitions we are not handling anymore.
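The add/remove behaviour on assignment change can be sketched as follows. This is a simplified model under stated assumptions, not the Fetcher implementation: `PartitionLagRegistry`, its methods, and the "topic:partition" key format are all hypothetical, and plain maps stand in for Kafka's sensors. The point is only that the metric name stays constant while per-partition entries, identified by their tags, are created on first use and dropped when the partition is no longer assigned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical registry modelling per-partition lag sensors (illustration only). */
final class PartitionLagRegistry {
    static final String METRIC_NAME = "records-lag"; // constant for every partition

    private final String clientId;
    // Keyed by "topic:partition"; the value is the tag set attached to that partition's sensor.
    private final Map<String, Map<String, String>> tagsByPartition = new HashMap<>();
    private final Map<String, Long> lagByPartition = new HashMap<>();

    PartitionLagRegistry(String clientId) {
        this.clientId = clientId;
    }

    static String key(String topic, int partition) {
        return topic + ":" + partition;
    }

    /** Records lag, creating the tagged entry the first time a partition is seen. */
    void recordLag(String topic, int partition, long lag) {
        String key = key(topic, partition);
        if (!tagsByPartition.containsKey(key)) {
            Map<String, String> tags = new TreeMap<>();
            tags.put("client-id", clientId);
            tags.put("topic", topic);
            tags.put("partition", String.valueOf(partition));
            tagsByPartition.put(key, tags);
        }
        lagByPartition.put(key, lag);
    }

    /** Drops entries for partitions that are no longer assigned to this consumer. */
    void onAssignmentChange(Set<String> assignedKeys) {
        tagsByPartition.keySet().retainAll(assignedKeys);
        lagByPartition.keySet().retainAll(assignedKeys);
    }

    Set<String> tracked() {
        return new TreeSet<>(tagsByPartition.keySet());
    }

    Map<String, String> tagsFor(String topic, int partition) {
        return tagsByPartition.get(key(topic, partition));
    }
}
```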
This will enable people to do:
The new option is more consistent with the rest of the existing metrics.
Compatibility, Deprecation, and Migration Plan
- We will add these new metrics and deprecate the previous ones. The old metrics will be removed once we move to 2.0 (tracked by KAFKA-6445).