Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/7oprocltq9f94x0c7761bhzs85t8b0jv

JIRA: KAFKA-20180

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, the broker exposes only one metric that gives visibility into its ability to fetch and apply metadata from the cluster metadata partition. This metric is last-applied-record-lag-ms, which reports the difference between the local system time and the timestamp of the last record from the cluster metadata partition that was applied by the broker.

The main issue with this metric is that its value is computed at collection time: it is the difference between the broker's local system time when the metric is collected and the leader's append timestamp of the broker's last applied record. The reported value therefore depends on when and how often the metric is collected, not only on how quickly the broker applies metadata, which can make the metric difficult to monitor and alert on.

A latency metric would be more intuitive for operators to monitor and alert on. Its value would be the difference between the time the broker applied its most recent metadata image locally and the leader's append timestamp for that metadata image. Because both timestamps are fixed once the image has been applied, the value does not depend on when the metric is collected.
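To make the distinction concrete, the following is a minimal sketch of the two calculations described above. The class and method names are hypothetical and are not part of Kafka; the sketch only illustrates the arithmetic, assuming all timestamps are in epoch milliseconds.

```java
// Hypothetical illustration only; these names do not exist in the Kafka code base.
final class MetadataTimingSketch {

    // Existing metric: evaluated when the metric is collected, so the result
    // keeps growing until the broker applies a newer record.
    static long lastAppliedRecordLagMs(long collectionTimeMs, long leaderAppendTimestampMs) {
        return collectionTimeMs - leaderAppendTimestampMs;
    }

    // Proposed metric: evaluated once, when the broker applies the image,
    // so the result does not depend on when it is collected.
    static long lastAppliedImageLatencyMs(long appliedTimeMs, long leaderAppendTimestampMs) {
        return appliedTimeMs - leaderAppendTimestampMs;
    }

    public static void main(String[] args) {
        long leaderAppendTimestampMs = 1_000L; // assumed append time on the leader
        long appliedTimeMs = 1_040L;           // assumed apply time on the broker

        // The same applied record yields a different lag at each collection time.
        for (long collectionTimeMs : new long[] {1_050L, 6_000L, 31_000L}) {
            System.out.println("collected at " + collectionTimeMs + " ms: lag = "
                + lastAppliedRecordLagMs(collectionTimeMs, leaderAppendTimestampMs) + " ms");
        }

        // The latency is a single value recorded at apply time.
        System.out.println("latency = "
            + lastAppliedImageLatencyMs(appliedTimeMs, leaderAppendTimestampMs) + " ms");
    }
}
```

In this sketch the lag changes with every collection time, while the latency stays the same for a given applied image.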

Public Interfaces

Monitoring

We will introduce a histogram metric that captures the p50, p99, p999, and maximum of a broker's last applied image latency.

Name | Type
--- | ---
kafka.server:type=broker-metadata-metrics,name=last-applied-image-latency-ms-50percentile | Histogram
kafka.server:type=broker-metadata-metrics,name=last-applied-image-latency-ms-99percentile | Histogram
kafka.server:type=broker-metadata-metrics,name=last-applied-image-latency-ms-999percentile | Histogram
kafka.server:type=broker-metadata-metrics,name=last-applied-image-latency-ms-max | Histogram
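As a rough illustration of what the four values summarize, the sketch below computes p50, p99, p999, and the maximum from a set of recorded latencies using the simple nearest-rank method over the full sample. This is an assumption made for illustration: the class name is hypothetical, and the histogram implementation Kafka actually uses may sample and interpolate differently.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the histogram implementation used by Kafka.
final class LatencyPercentileSketch {

    // Nearest-rank percentile: the smallest sample such that at least
    // quantile * n of the samples are less than or equal to it.
    static long percentile(long[] samplesMs, double quantile) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        if (quantile <= 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException("quantile must be in (0, 1]");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(quantile * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Largest recorded latency.
    static long max(long[] samplesMs) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        return Arrays.stream(samplesMs).max().getAsLong();
    }

    public static void main(String[] args) {
        // Assumed image-apply latencies in milliseconds, one per applied image.
        long[] latenciesMs = {12, 15, 11, 40, 13, 14, 250, 16, 12, 18};

        System.out.println("p50  = " + percentile(latenciesMs, 0.50));
        System.out.println("p99  = " + percentile(latenciesMs, 0.99));
        System.out.println("p999 = " + percentile(latenciesMs, 0.999));
        System.out.println("max  = " + max(latenciesMs));
    }
}
```

With a sample like this one, a single slow image apply dominates the high percentiles and the maximum while leaving the median largely unaffected, which is why the tail values are typically the ones to alert on.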

Compatibility, Deprecation, and Migration Plan

This KIP only introduces new metrics.

Test Plan

Unit and integration tests for the newly added metrics.

