Status
Current state: Accepted
JIRA:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
The POC implementation of the proposed metrics can be found here: https://github.com/apache/kafka/pull/12391
Metric Name | Level | Type | Description | Notes
---|---|---|---|---
active-restoring-tasks | thread / INFO | count | The number of active tasks currently undergoing restoration |
standby-updating-tasks | thread / INFO | count | The number of standby tasks currently undergoing updating |
active-paused-tasks | thread / INFO | count | The number of active tasks whose restoration is paused |
standby-paused-tasks | thread / INFO | count | The number of standby tasks whose updating is paused |
idle-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent idle | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1
active-restore-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent restoring active tasks | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1; only one of the restore/update-ratio should be non-zero
standby-update-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent updating standby tasks | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1; only one of the restore/update-ratio should be non-zero
checkpoint-ratio | thread / INFO | gauge (percentage) | The fraction of time the thread spent checkpointing restored progress | idle-ratio + restore/update-ratio + checkpoint-ratio should be 1
restore-records-rate | thread / INFO | rate | The average per-second number of records restored/updated for all tasks |
restore-call-rate | thread / INFO | rate | The average per-second number of restore calls triggered |
restore-total | task / DEBUG | count | The total number of records processed during restoration for the active task | The metric would persist even after the task has completed restoration, and would be removed only when the task is removed from the thread.
restore-rate | task / DEBUG | rate | The average per-second number of records restored for the active task | The metric would drop to zero once the task has completed restoration, and would be removed only when the task is removed from the thread.
update-total | task / DEBUG | count | The total number of records updated for the standby task | Same as above
update-rate | task / DEBUG | rate | The average per-second number of records updated for the standby task | Same as above
restore-remaining-records-total | task / INFO | count | The number of records remaining to be restored for active tasks |
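
The notes on the thread-level ratio metrics describe two invariants: the idle, restore/update and checkpoint ratios should add up to 1, and at most one of the restore/update ratios should be non-zero. A minimal Java sketch of how a monitoring check might verify this is shown below. The class name `RestorationRatioCheck`, the tolerance `EPSILON`, and the plain `Map<String, Double>` input are assumptions made for illustration and are not part of this proposal; in a real application the values would presumably be read from the metrics registry (for example via `KafkaStreams#metrics()`) rather than hard-coded.

```java
import java.util.Map;

public class RestorationRatioCheck {
    // Tolerance for floating-point rounding; the value is an assumption of this sketch.
    private static final double EPSILON = 1e-6;

    /**
     * Returns true if the four thread-level ratios satisfy the invariants
     * described in the Notes column. A missing entry is treated as 0.
     */
    public static boolean isConsistent(Map<String, Double> ratios) {
        double idle = ratios.getOrDefault("idle-ratio", 0.0);
        double restore = ratios.getOrDefault("active-restore-ratio", 0.0);
        double update = ratios.getOrDefault("standby-update-ratio", 0.0);
        double checkpoint = ratios.getOrDefault("checkpoint-ratio", 0.0);

        // At most one of the restore/update ratios may be non-zero.
        boolean exclusive = Math.min(restore, update) == 0.0;
        // Together the ratios should account for the whole measured interval.
        boolean sumsToOne = Math.abs(idle + restore + update + checkpoint - 1.0) < EPSILON;
        return exclusive && sumsToOne;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not measurements.
        Map<String, Double> sample = Map.of(
            "idle-ratio", 0.25,
            "active-restore-ratio", 0.70,
            "standby-update-ratio", 0.0,
            "checkpoint-ratio", 0.05);
        System.out.println(isConsistent(sample));
    }
}
```

A check like this could run periodically against each thread's metrics; a persistent violation would suggest that some of the thread's time is not being attributed to any of the four buckets.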
Along with these new metrics, we would also deprecate the metrics below:
...