Status

Current state: Accepted

Discussion thread: Thread

JIRA: KAFKA-17411

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Name	RecordingLevel	Metric Type	Description
`commit-rate`	`DEBUG`	Rate	The average number of calls to `StateStore#commit(Map)` per second
`commit-latency-avg`	`DEBUG`	Avg	The average latency of calls to `StateStore#commit(Map)`
`commit-latency-max`	`DEBUG`	Max	The maximum latency of calls to `StateStore#commit(Map)`

stream-state-metrics
- flush-rate
- flush-latency-avg
- flush-latency-max

...

On start-up, we will construct and initialize the state stores for each Task store directory we find on disk, and call committedOffset to determine their stored offset(s). We will then cache these offsets in memory and close() the stores. This cache will be shared among all StreamThreads, and will therefore be thread-safe.
On rebalance, we will consult this cache to compute our Task offset lags.
After state restoration completes for an assigned Task, we will set its entry in the offset cache to Task.LATEST_OFFSET, to indicate that the Task is now at the latest offset.
When closing a StateStore, we will update the offset cache with the current changelog offset for the store.
- This will ensure that when a Task is reassigned to another instance, the Task lag for the local state matches what is on disk, instead of using the sentinel value Task.LATEST_OFFSET, which is only valid for Tasks currently assigned to a thread on the local instance.
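
The cache lifecycle described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual Kafka Streams implementation: the class name TaskOffsetCache, its method names, the use of plain strings for Task IDs and changelog partitions, and the sentinel value chosen for LATEST_OFFSET are all assumptions made for the example.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical thread-safe cache of per-Task changelog offsets (names are illustrative). */
final class TaskOffsetCache {
    /** Stand-in for Task.LATEST_OFFSET; the concrete value is an assumption of this sketch. */
    static final long LATEST_OFFSET = -2L;

    // Task ID -> immutable snapshot of (changelog partition -> offset).
    // ConcurrentHashMap lets all StreamThreads share the cache safely.
    private final Map<String, Map<String, Long>> offsetsByTask = new ConcurrentHashMap<>();

    /** Called at start-up (offsets read from disk) and whenever a store is closed. */
    void recordOffsets(String taskId, Map<String, Long> changelogOffsets) {
        offsetsByTask.put(taskId, Map.copyOf(changelogOffsets));
    }

    /** Called once restoration completes for a Task assigned to this instance. */
    void markRestored(String taskId, Collection<String> changelogPartitions) {
        Map<String, Long> latest = new HashMap<>();
        for (String partition : changelogPartitions) {
            latest.put(partition, LATEST_OFFSET);
        }
        offsetsByTask.put(taskId, Map.copyOf(latest));
    }

    /** Called on rebalance: total lag for the Task, or the sentinel if it is fully caught up. */
    long lag(String taskId, Map<String, Long> endOffsets) {
        Map<String, Long> cached = offsetsByTask.getOrDefault(taskId, Map.of());
        long total = 0L;
        for (Map.Entry<String, Long> end : endOffsets.entrySet()) {
            // Unknown partitions are treated as having no local state (offset 0).
            long offset = cached.getOrDefault(end.getKey(), 0L);
            if (offset == LATEST_OFFSET) {
                return LATEST_OFFSET;
            }
            total += Math.max(0L, end.getValue() - offset);
        }
        return total;
    }
}
```

Replacing each Task's entry with an immutable snapshot keeps readers from ever observing a partially updated set of offsets. Calling recordOffsets again when a store closes overwrites the sentinel with the real changelog offset, which is what keeps the lag accurate after the Task moves to another instance.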

...

Compatibility, Deprecation, and Migration Plan

...