Status
Current state: Under Discussion
Discussion thread: here
JIRA: KAFKA-17220
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
MirrorMaker 2 (MM2) already exposes metrics for record-bytes, record-age, record-count, replication-latency, and checkpoint-latency. However, it lacks high-level observability metrics such as topic-count and consumer-group-count.
Proposed Changes
We propose adding the metrics listed below to MM2.
Public Interfaces
New Metrics
Metric Name | Data Type | Description | MBean |
---|---|---|---|
topic-count | int | The current number of topics being synchronized. | kafka.mirror.connect:type=MirrorSourceConnector,source={source} |
partition-count | int | The current number of partitions being synchronized for the topic. | kafka.mirror.connect:type=MirrorSourceConnector,source={source},topic={topic} |
consumer-group-count | int | The current number of consumer groups being synchronized. | kafka.mirror.connect:type=MirrorCheckpointConnector,source={source} |
consumer-group-offset | long | The latest synchronized offset of the consumer group. | kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},group={group} |
consumer-group-offset-translation-lag | long | The difference in upstream offset between the offset sync and the consumer group. | kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},group={group} |
consumer-group-offset-translation-error-count | int | The number of offset translation errors. | kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},group={group} |
checkpoint-task-startup-time-avg | long | The average time, in milliseconds, that the checkpoint task takes from start to stop. | kafka.mirror.connect:type=MirrorCheckpointConnector,source={source} |
checkpoint-task-startup-time-max | long | The maximum time, in milliseconds, that the checkpoint task takes from start to stop. | kafka.mirror.connect:type=MirrorCheckpointConnector,source={source} |
background-job-success-count | int | The number of successful background job executions. | kafka.mirror.connect:type=MirrorSourceConnector,source={source},job={job} kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},job={job} |
background-job-failure-count | int | The number of failed background job executions. | kafka.mirror.connect:type=MirrorSourceConnector,source={source},job={job} kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},job={job} |
background-task-execution-time-avg | long | The average duration of background job executions. | kafka.mirror.connect:type=MirrorSourceConnector,source={source},job={job} kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},job={job} |
background-task-execution-time-max | long | The maximum duration of background job executions. | kafka.mirror.connect:type=MirrorSourceConnector,source={source},job={job} kafka.mirror.connect:type=MirrorCheckpointConnector,source={source},job={job} |
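To illustrate how the MBean patterns in the table could be consumed, the following is a minimal sketch that builds an MBean name from a connector type and its tags and then reads one metric over JMX. It is not part of the proposal: the class `Mm2MetricNames`, its helper methods, and the sample values `primary` and `my-group` are hypothetical. It also assumes that each metric is exposed as a JMX attribute named after the metric, and the quoting of special characters shown here is an assumption that may differ from what MM2 actually does.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class Mm2MetricNames {

    // JMX domain used by the MBean names in the table.
    static final String DOMAIN = "kafka.mirror.connect";

    // Characters that have a special meaning inside an ObjectName value.
    private static final Pattern SPECIAL = Pattern.compile("[,=:\"*?\\n]");

    /** Builds e.g. kafka.mirror.connect:type=MirrorCheckpointConnector,source=primary,group=my-group */
    static ObjectName mbeanName(String type, Map<String, String> tags) {
        StringBuilder name = new StringBuilder(DOMAIN).append(":type=").append(type);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            name.append(',').append(tag.getKey()).append('=').append(quoteIfNeeded(tag.getValue()));
        }
        try {
            return new ObjectName(name.toString());
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid MBean name: " + name, e);
        }
    }

    /** Assumption: values are quoted only when they contain special characters. */
    static String quoteIfNeeded(String value) {
        return SPECIAL.matcher(value).find() ? ObjectName.quote(value) : value;
    }

    /** Reads one metric attribute, or returns null if the MBean is not registered. */
    static Object readMetric(MBeanServerConnection server, ObjectName name, String metric) {
        try {
            if (!server.isRegistered(name)) {
                return null;
            }
            return server.getAttribute(name, metric);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read " + metric + " from " + name, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical source cluster alias and consumer group.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("source", "primary");
        tags.put("group", "my-group");

        ObjectName name = mbeanName("MirrorCheckpointConnector", tags);
        System.out.println("MBean: " + name);

        // Inside an MM2 worker JVM this would return the metric value;
        // in any other JVM the MBean is absent and null is printed.
        Object offset = readMetric(ManagementFactory.getPlatformMBeanServer(), name, "consumer-group-offset");
        System.out.println("consumer-group-offset = " + offset);
    }
}
```

A monitoring agent running outside the worker would pass a remote `MBeanServerConnection` instead of the platform MBean server; the name construction stays the same.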
Compatibility, Deprecation, and Migration Plan
Since only new metrics are being proposed, there should be no compatibility issues.
Test Plan
Tests will be written for each of the new metrics managers to ensure that measurements are taken correctly.
Rejected Alternatives
None.