...
Metric tags are:
- type=stream-state-updater-metrics
- client-id=[clientId]
- thread-id=[threadId]
Recording level is: INFO
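INFO is the default `metrics.recording.level` in Kafka Streams, so these metrics are recorded without extra configuration. The sketch below shows how they might be read over JMX.

It rests on several assumptions:

- The default JMX reporter exports the metrics under the `kafka.streams` domain.
- The tags above appear as key properties, with `type` holding the metric group.
- The client id contains no characters that would need JMX quoting.

The class name `StateUpdaterJmxLookup`, the client id `my-app-client`, and the attribute name `active-restoring-tasks` are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; assumes the default JMX reporter naming described above.
final class StateUpdaterJmxLookup {
    static final String DOMAIN = "kafka.streams";              // assumed JMX domain for Kafka Streams
    static final String TYPE = "stream-state-updater-metrics";  // the "type" tag listed above

    // Property-list pattern matching every state-updater MBean of one client.
    // The trailing ",*" admits any further key properties, such as thread-id.
    static ObjectName patternFor(String clientId) throws MalformedObjectNameException {
        return new ObjectName(DOMAIN + ":type=" + TYPE + ",client-id=" + clientId + ",*");
    }

    static Set<ObjectName> findUpdaterMBeans(MBeanServer server, String clientId)
            throws MalformedObjectNameException {
        return server.queryNames(patternFor(clientId), null);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // In a JVM running the Streams client this should yield one MBean per thread;
        // anywhere else the set is empty and nothing is printed.
        for (ObjectName name : findUpdaterMBeans(server, "my-app-client")) {
            // Assumed attribute name: one of the count metrics proposed in this KIP.
            Object restoring = server.getAttribute(name, "active-restoring-tasks");
            System.out.println(name.getKeyProperty("thread-id") + ": active-restoring-tasks=" + restoring);
        }
    }
}
```

The property-list pattern is used so that a single query covers all stream threads of the client, regardless of the order in which the reporter emits the tags.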
...
The POC implementation of the proposed metrics can be found here: https://github.com/apache/kafka/pull/12391
| Metric Name | Type | Description | Notes |
|---|---|---|---|
| active-restoring-tasks | count | The number of active tasks currently undergoing restoration | |
| standby-updating-tasks | count | The number of standby tasks currently being updated | |
| active-paused-tasks | count | The number of active tasks whose restoration is paused | |
| standby-paused-tasks | count | The number of standby tasks whose updating is paused | |
| idle-ratio | gauge (ratio) | The fraction of time the thread spent idle | idle-ratio + restore-ratio + checkpoint-ratio should sum to 1 |
| restore-ratio | gauge (ratio) | The fraction of time the thread spent restoring active tasks or updating standby tasks | idle-ratio + restore-ratio + checkpoint-ratio should sum to 1 |
| checkpoint-ratio | gauge (ratio) | The fraction of time the thread spent checkpointing restoration progress | idle-ratio + restore-ratio + checkpoint-ratio should sum to 1 |
| active-records-restored-total | count | The total number of records restored for active tasks | Cumulative over the lifetime of the Streams application, so it only ever increases |
| standby-records-updated-total | count | The total number of records updated for standby tasks | Cumulative over the lifetime of the Streams application, so it only ever increases |
| active-records-remaining | count | The number of records remaining to be restored for active tasks | Usually decreasing; during a rebalance it may jump up or down |
| standby-records-remaining | count | The number of records remaining to be updated for standby tasks | May either increase or decrease; during a rebalance it may also jump up or down |
| records-restored-rate | rate | The average number of records per second restored for active tasks or updated for standby tasks | Counts both active and standby tasks |
| restore-call-rate | rate | The average number of restore calls triggered per second | |
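The ratio and rate metrics above can be sketched as a simple accumulator. This is a hypothetical illustration, not the Kafka Streams implementation.

The sketch makes two assumptions:

- Every instant of the state-updater thread's time falls into exactly one of three states: idle, restoring/updating, or checkpointing. The measurement window is therefore the sum of the three, which is why the three ratios add up to 1.
- Each call to `recordRestore` counts as one restore call.

The class and method names are illustrative.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical accumulator for one measurement window (not the Kafka Streams code).
final class UpdaterMetricsSketch {
    private long idleNanos;
    private long restoreNanos;
    private long checkpointNanos;
    private long recordsRestored; // records restored (active) or updated (standby)
    private long restoreCalls;    // assumption: one call per recordRestore invocation

    void recordIdle(long nanos) { idleNanos += nanos; }

    void recordRestore(long nanos, long records) {
        restoreNanos += nanos;
        recordsRestored += records;
        restoreCalls++;
    }

    void recordCheckpoint(long nanos) { checkpointNanos += nanos; }

    // The window is the time spent in the three mutually exclusive states.
    private long windowNanos() { return idleNanos + restoreNanos + checkpointNanos; }

    private double ratio(long partNanos) {
        long window = windowNanos();
        return window == 0 ? 0.0 : (double) partNanos / window;
    }

    double idleRatio() { return ratio(idleNanos); }
    double restoreRatio() { return ratio(restoreNanos); }
    double checkpointRatio() { return ratio(checkpointNanos); }

    // Average per-second rate over the window.
    private double perSecond(long count) {
        long window = windowNanos();
        return window == 0 ? 0.0 : count / (window / 1e9);
    }

    double recordsRestoredRate() { return perSecond(recordsRestored); }
    double restoreCallRate() { return perSecond(restoreCalls); }

    public static void main(String[] args) {
        UpdaterMetricsSketch m = new UpdaterMetricsSketch();
        // One second of thread time: 200 ms idle, 700 ms restoring 3,500 records
        // in a single call, and 100 ms checkpointing.
        m.recordIdle(TimeUnit.MILLISECONDS.toNanos(200));
        m.recordRestore(TimeUnit.MILLISECONDS.toNanos(700), 3_500);
        m.recordCheckpoint(TimeUnit.MILLISECONDS.toNanos(100));
        System.out.printf("idle=%.2f restore=%.2f checkpoint=%.2f%n",
                m.idleRatio(), m.restoreRatio(), m.checkpointRatio());
        System.out.printf("records-restored-rate=%.1f/s restore-call-rate=%.1f/s%n",
                m.recordsRestoredRate(), m.restoreCallRate());
    }
}
```

Because the window is derived from the same three buckets, the ratios sum to 1 up to floating-point rounding.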
Along with these new metrics, we would also deprecate the metrics below:
...
```java
package org.apache.kafka.streams.processor;

import org.apache.kafka.common.TopicPartition;

public interface StateRestoreListener {

    void onRestoreStart(final TopicPartition topicPartition,
                        final String storeName,
                        final long startingOffset,
                        final long endingOffset);

    void onRestoreEnd(final TopicPartition topicPartition,
                      final String storeName,
                      final long totalRestored);

    // ...

    /**
     * NEW METHOD. Called when restoring the {@link StateStore} is suspended because the task was removed from this host.
     * If the task is later resumed and restoration continues, {@link #onRestoreStart} will be called again.
     */
    default void onRestoreSuspended(final TopicPartition topicPartition,
                                    final String storeName,
                                    final long totalRestored) {
        // do nothing
    }
}
```
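The sketch below illustrates how an application-side listener might react to the new callback. It keeps a set of store partitions that are currently restoring. An entry is dropped both when restoration ends and when it is suspended, and a later `onRestoreStart` adds it back.

To stay self-contained, the sketch makes some substitutions:

- It uses a hypothetical nested `TopicPartition` record in place of `org.apache.kafka.common.TopicPartition`.
- It only mirrors the callback signatures rather than implementing the real interface.
- `RestoringPartitionsTracker` and its members are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker mirroring the StateRestoreListener callbacks (not the real interface).
final class RestoringPartitionsTracker {

    // Hypothetical stand-in for org.apache.kafka.common.TopicPartition, so the sketch compiles without Kafka.
    record TopicPartition(String topic, int partition) { }

    // Store partitions whose restoration has started and has neither ended nor been suspended.
    private final Set<String> restoring = new HashSet<>();
    private int suspensions;

    private static String key(TopicPartition tp, String storeName) {
        return storeName + "@" + tp.topic() + "-" + tp.partition();
    }

    void onRestoreStart(TopicPartition tp, String storeName, long startingOffset, long endingOffset) {
        restoring.add(key(tp, storeName));
    }

    void onRestoreEnd(TopicPartition tp, String storeName, long totalRestored) {
        restoring.remove(key(tp, storeName));
    }

    // The task left this host mid-restoration: stop reporting it as restoring.
    // If the task comes back and restoration continues, onRestoreStart is called again.
    void onRestoreSuspended(TopicPartition tp, String storeName, long totalRestored) {
        if (restoring.remove(key(tp, storeName))) {
            suspensions++;
        }
    }

    int restoringCount() { return restoring.size(); }

    int suspensions() { return suspensions; }

    public static void main(String[] args) {
        RestoringPartitionsTracker tracker = new RestoringPartitionsTracker();
        TopicPartition tp = new TopicPartition("my-app-store-changelog", 0);
        tracker.onRestoreStart(tp, "store", 0L, 1_000L);
        tracker.onRestoreSuspended(tp, "store", 400L);      // task removed from this host
        tracker.onRestoreStart(tp, "store", 400L, 1_000L);  // task resumed; restoration continues
        tracker.onRestoreEnd(tp, "store", 600L);
        System.out.println("restoring=" + tracker.restoringCount() + " suspensions=" + tracker.suspensions());
        // prints "restoring=0 suspensions=1"
    }
}
```

Treating suspension as a distinct exit path lets a monitoring listener tell "restoration finished" apart from "restoration moved elsewhere". The default no-op method, by contrast, leaves existing listeners unaffected.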
...