Status

Current state: "Accepted"

Discussion thread: here

Vote thread: here

JIRA: KAFKA-17876, KAFKA-19150

Motivation

Kafka metrics have traditionally followed a consistent naming convention using the kafka.<COMPONENT> format. However, some inconsistencies were unintentionally introduced during recent refactorings:

  • The metrics under org.apache.kafka.server:type=AssignmentsManager were originally named using the kafka.<COMPONENT> format. This changed unexpectedly when classes were moved from the kafka.server package to org.apache.kafka.server as part of this commit.
  • Similarly, the org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool metric was initially intended to follow the same convention. However, an incorrect metric name was introduced during initialization in a later patch.

These changes went unnoticed, and subsequent users began using the new org.apache.kafka.<COMPONENT> pattern, diverging from the established convention.

This KIP proposes to standardize Kafka metric naming by reverting the following metrics to the correct kafka.<COMPONENT> format to ensure consistency across the system:

  • org.apache.kafka.server:type=AssignmentsManager → kafka.server:type=AssignmentsManager
  • org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool → kafka.log.remote:type=RemoteStorageThreadPool

This change will align these metrics with the broader Kafka metrics ecosystem and avoid confusion for users and tooling that rely on consistent naming.

Public Interface

  • org.apache.kafka.storage.internals.log.RemoteStorageThreadPool 
  • org.apache.kafka.server.log.remote.storage.RemoteStorageMetrics 
  • org.apache.kafka.server.AssignmentsManager

Proposed Changes

Introduce new metrics that follow the kafka.<COMPONENT> naming convention used by other Kafka metrics, and deprecate the misnamed ones.

This change will impact external monitoring systems that rely on the old metric names; they will need to be updated to the new names.

Deprecate the following metrics:

  • org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool.RemoteLogReaderTaskQueueSize
  • org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool.RemoteLogReaderAvgIdlePercent
  • org.apache.kafka.server:type=AssignmentsManager.QueuedReplicaToDirAssignments

Introduce the following new metrics:

  • kafka.log.remote:type=RemoteStorageThreadPool.RemoteLogReaderAvgIdlePercent
  • kafka.log.remote:type=RemoteStorageThreadPool.RemoteLogReaderTaskQueueSize
  • kafka.server:type=AssignmentsManager.QueuedReplicaToDirAssignments
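
For monitoring configurations that reference the deprecated names, the rename can be expressed as a small translation step. The following is a minimal sketch, not part of the KIP: the class `MetricNameMigration` and its method are hypothetical, and it assumes the full JMX object names use the `type=<Type>,name=<Metric>` key layout common to Kafka's broker metrics rather than the dotted shorthand used in the lists above.

```java
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: rewrites a deprecated MBean name to its replacement.
final class MetricNameMigration {
    // "<old domain>:type=<Type>" -> new domain, taken from the lists above.
    // Keyed on domain AND type so other MBeans in the same domain are untouched.
    private static final Map<String, String> RENAMES = Map.of(
        "org.apache.kafka.server:type=AssignmentsManager", "kafka.server",
        "org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool", "kafka.log.remote");

    static String toNewName(String oldName) {
        ObjectName parsed;
        try {
            parsed = ObjectName.getInstance(oldName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a valid JMX object name: " + oldName, e);
        }
        String key = parsed.getDomain() + ":type=" + parsed.getKeyProperty("type");
        String newDomain = RENAMES.get(key);
        if (newDomain == null) {
            return oldName; // not one of the renamed metrics
        }
        // Keep the key properties (type, name, ...) in their original order.
        return newDomain + ":" + parsed.getKeyPropertyListString();
    }
}
```

Names outside the two affected `type` values are returned unchanged, so the helper can be applied to a whole list of configured MBean names.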


Due to 

Compatibility, Deprecation, and Migration Plan

  1. Add the `@Deprecated` annotation to the following constants:
    1. `RemoteStorageMetrics#REMOTE_LOG_READER_TASK_QUEUE_SIZE_METRIC`
    2. `RemoteStorageMetrics#REMOTE_LOG_READER_AVG_IDLE_PERCENT_METRIC`
    3. `AssignmentsManager#QUEUED_REPLICA_TO_DIR_ASSIGNMENTS_METRIC`
  2. Update the Kafka documentation to note that the following MBeans are deprecated and will be replaced:
    1. `org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool.RemoteLogReaderTaskQueueSize`
    2. `org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool.RemoteLogReaderAvgIdlePercent`
    3. `org.apache.kafka.server:type=AssignmentsManager.QueuedReplicaToDirAssignments`
  3. Register the following new metrics:
    1. `kafka.log.remote:type=RemoteStorageThreadPool.RemoteLogReaderTaskQueueSize`
    2. `kafka.log.remote:type=RemoteStorageThreadPool.RemoteLogReaderAvgIdlePercent`
    3. `kafka.server:type=AssignmentsManager.QueuedReplicaToDirAssignments`
  4. Remove all usages of the following constants from the code base in Kafka 5.0:
    1. `RemoteStorageMetrics#REMOTE_LOG_READER_TASK_QUEUE_SIZE_METRIC`
    2. `RemoteStorageMetrics#REMOTE_LOG_READER_AVG_IDLE_PERCENT_METRIC`
    3. `AssignmentsManager#QUEUED_REPLICA_TO_DIR_ASSIGNMENTS_METRIC`
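
The four steps above amount to reporting each value under both names during the deprecation window and dropping the old name at removal time. The sketch below illustrates that pattern with a toy in-memory registry; it is an assumption-laden illustration, not Kafka's implementation. The class `DualMetricRegistration`, its string constants, and the `type=...,name=...` object-name layout are all hypothetical stand-ins for the real constants and metrics registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy registry standing in for the broker's metrics registry (hypothetical).
final class DualMetricRegistration {
    // Stand-in for a deprecated constant from step 1 (real constant types may differ).
    @Deprecated
    static final String OLD_TASK_QUEUE_SIZE =
        "org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderTaskQueueSize";
    // Stand-in for the replacement registered in step 3.
    static final String NEW_TASK_QUEUE_SIZE =
        "kafka.log.remote:type=RemoteStorageThreadPool,name=RemoteLogReaderTaskQueueSize";

    private final Map<String, Supplier<Integer>> gauges = new LinkedHashMap<>();

    private void registerGauge(String name, Supplier<Integer> gauge) {
        if (gauges.putIfAbsent(name, gauge) != null) {
            throw new IllegalStateException("Metric already registered: " + name);
        }
    }

    // Deprecation window: the same gauge backs both the old and the new name.
    void registerTaskQueueSize(Supplier<Integer> queueSize) {
        registerGauge(NEW_TASK_QUEUE_SIZE, queueSize);
        registerGauge(OLD_TASK_QUEUE_SIZE, queueSize);
    }

    // Removal (step 4): only the new name remains.
    void removeDeprecated() {
        gauges.remove(OLD_TASK_QUEUE_SIZE);
    }

    boolean isRegistered(String name) {
        return gauges.containsKey(name);
    }

    int read(String name) {
        return gauges.get(name).get();
    }
}
```

Backing both names with one supplier keeps the two readings identical, so dashboards can switch names at any point before the removal without a gap in the data.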

Test Plan

  • Add new tests to verify that the new metrics are correctly registered and report the expected values.
  • Verify that the deprecated metrics continue to be registered and report the same values until they are removed.
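
One way such a test could check registration is to ask an `MBeanServer` which of the expected object names are absent. This is a minimal sketch under assumptions: the helper `MBeanPresenceCheck` is hypothetical, and the caller supplies full JMX object names (for example in the `type=...,name=...` layout), which this KIP lists only in shorthand.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical test helper: reports which expected MBeans are not registered.
final class MBeanPresenceCheck {
    static List<String> missingMBeans(MBeanServer server, List<String> expectedNames) {
        List<String> missing = new ArrayList<>();
        for (String name : expectedNames) {
            ObjectName objectName;
            try {
                objectName = ObjectName.getInstance(name);
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("Not a valid JMX object name: " + name, e);
            }
            if (!server.isRegistered(objectName)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

During the deprecation window a test would pass both the old and the new names and expect an empty result; after the removal it would expect only the old names to be reported as missing.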

Rejected Alternatives

n/a