Status
Current state: Accepted
JIRA:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
- type=stream-state-updater-metrics
- client-id=[clientId]
- thread-id=[threadId]
Task-level metric tags are:
- type=stream-task-metrics
- client-id=[clientId]
- thread-id=[threadId]
- task-id=[taskId]
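As a rough illustration of how these tags might show up when the metrics are read over JMX, the sketch below builds an MBean name from the tag lists above. It assumes the `kafka.streams` JMX domain and that the metric group is exposed as the `type` key. The class name, helper names, and the sample client, thread, and task IDs are hypothetical and not part of this proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class StreamsMetricNames {

    // Assumption: Streams metrics are registered under the "kafka.streams" JMX domain.
    static final String DOMAIN = "kafka.streams";

    /** Tags of a thread-level state updater metric, in the order listed above. */
    static Map<String, String> stateUpdaterTags(String clientId, String threadId) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("type", "stream-state-updater-metrics");
        tags.put("client-id", clientId);
        tags.put("thread-id", threadId);
        return tags;
    }

    /** Tags of a task-level metric: the thread-level tags plus task-id. */
    static Map<String, String> taskTags(String clientId, String threadId, String taskId) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("type", "stream-task-metrics");
        tags.put("client-id", clientId);
        tags.put("thread-id", threadId);
        tags.put("task-id", taskId);
        return tags;
    }

    /**
     * Joins the tags into "domain:key=value,key=value".
     * Values are not quoted, so they must not contain ',', '=', ':', '"', '*' or '?'.
     */
    static ObjectName toObjectName(Map<String, String> tags) {
        StringBuilder name = new StringBuilder(DOMAIN).append(':');
        boolean first = true;
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            if (!first) {
                name.append(',');
            }
            name.append(tag.getKey()).append('=').append(tag.getValue());
            first = false;
        }
        try {
            return new ObjectName(name.toString());
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid tag value in " + name, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical IDs, for illustration only.
        System.out.println(toObjectName(stateUpdaterTags("my-app", "my-app-StreamThread-1")));
        System.out.println(toObjectName(taskTags("my-app", "my-app-StreamThread-1", "0_1")));
    }
}
```

A monitoring tool could pass a name built this way to an `MBeanServerConnection` to read individual attributes such as `restore-rate`, provided the JMX reporter is enabled.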
...
The POC implementation of the proposed metrics can be found here: https://github.com/apache/kafka/pull/12391
Metric Name | Level | Type | Description | Notes
---|---|---|---|---
active-restoring-tasks | thread / INFO | count | The number of active tasks currently undergoing restoration |
standby-updating-tasks | thread / INFO | count | The number of standby tasks currently undergoing updating |
active-paused-tasks | thread / INFO | count | The number of active tasks whose restoration is paused |
standby-paused-tasks | thread / INFO | count | The number of standby tasks whose updating is paused |
idle-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent idle | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1
active-restore-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent restoring active tasks | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1; only one of the restore/update-ratio should be non-zero
standby-update-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent updating standby tasks | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1; only one of the restore/update-ratio should be non-zero
checkpoint-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent checkpointing restored progress | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1
restore-records-rate | thread / INFO | rate | The average per-second number of records restored/updated for all tasks |
restore-call-rate | thread / INFO | rate | The average per-second number of restore calls triggered |
restore-total | task / DEBUG | count | The total number of records processed during restoration for the active task | The metric persists after the task completes restoration and is removed only when the task is removed from the thread.
restore-rate | task / DEBUG | rate | The average per-second number of records restored for the active task | The metric drops to zero once the task completes restoration and is removed only when the task is removed from the thread.
update-total | task / DEBUG | count | The total number of records updated for the standby task | Same as above
update-rate | task / DEBUG | rate | The average per-second number of records updated for the standby task | Same as above
restore-remaining-records-total | task / INFO | count | The number of records remaining to be restored for the active task |
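The two invariants in the Notes column can be turned into a simple sanity check on sampled values. The sketch below is only an illustration. It assumes that "restore/update-ratio" refers to active-restore-ratio and standby-update-ratio, and that the gauges report fractions between 0 and 1. The class name, method name, and tolerance are hypothetical.

```java
public class StateUpdaterRatioCheck {

    // Hypothetical tolerance for floating-point noise in the sampled gauges.
    static final double EPSILON = 1e-6;

    /**
     * Returns true when the sampled thread-level ratios satisfy both notes:
     * (1) idle + restore/update + checkpoint sums to 1, and
     * (2) at most one of active-restore-ratio and standby-update-ratio is non-zero.
     */
    static boolean ratiosConsistent(double idleRatio,
                                    double activeRestoreRatio,
                                    double standbyUpdateRatio,
                                    double checkpointRatio) {
        double sum = idleRatio + activeRestoreRatio + standbyUpdateRatio + checkpointRatio;
        boolean sumsToOne = Math.abs(sum - 1.0) <= EPSILON;
        boolean atMostOneBusy = activeRestoreRatio == 0.0 || standbyUpdateRatio == 0.0;
        return sumsToOne && atMostOneBusy;
    }

    public static void main(String[] args) {
        // Thread restoring active tasks only: both invariants hold.
        System.out.println(ratiosConsistent(0.70, 0.25, 0.0, 0.05));
        // Both restore and update ratios non-zero: the second invariant is violated.
        System.out.println(ratiosConsistent(0.50, 0.20, 0.20, 0.10));
    }
}
```

A check like this could run in a dashboard or an integration test to flag samples where the reported ratios do not account for all of the thread's time.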
Along with these new metrics, we would also deprecate the metrics below:
...