
KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress

Status

Current state: "Accepted"

Discussion thread: https://www.mail-archive.com/dev@kafka.apache.org/msg77721.html

JIRA: KAFKA-5565

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Consumer group rebalancing can hurt client performance, and a rebalance sometimes takes longer than expected. Metrics that report how many rebalances are in progress would give operators visibility into this activity.

Public Interface Additions and Changes

The group state name "AwaitingSync" is confusing: the state is part of a rebalance, but its name does not mention rebalancing. We propose renaming it to "CompletingRebalance" to reflect that it is the final phase of the rebalance.
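A minimal sketch of the group states after the rename, showing that both phases of a rebalance now carry "Rebalance" in their names. The enum, the transition table, and the helpers `can_transition` and `is_rebalancing` are hypothetical illustrations; the allowed transitions are an assumption modeled on how the broker's group coordinator is usually described, not a copy of its code.

```python
from enum import Enum


class GroupState(Enum):
    """Consumer group states after the proposed rename (illustrative sketch)."""
    EMPTY = "Empty"
    PREPARING_REBALANCE = "PreparingRebalance"
    COMPLETING_REBALANCE = "CompletingRebalance"  # formerly "AwaitingSync"
    STABLE = "Stable"
    DEAD = "Dead"


# Assumed transition table: for each target state, the set of states
# a group is allowed to move from.
VALID_PREVIOUS_STATES = {
    GroupState.EMPTY: {GroupState.PREPARING_REBALANCE},
    GroupState.PREPARING_REBALANCE: {
        GroupState.STABLE,
        GroupState.COMPLETING_REBALANCE,
        GroupState.EMPTY,
    },
    GroupState.COMPLETING_REBALANCE: {GroupState.PREPARING_REBALANCE},
    GroupState.STABLE: {GroupState.COMPLETING_REBALANCE},
    GroupState.DEAD: set(GroupState),  # assumed: any state may become Dead
}


def can_transition(current: GroupState, target: GroupState) -> bool:
    """Return True if the sketch allows a move from `current` to `target`."""
    return current in VALID_PREVIOUS_STATES[target]


def is_rebalancing(state: GroupState) -> bool:
    """A rebalance spans the PreparingRebalance and CompletingRebalance phases."""
    return state in (GroupState.PREPARING_REBALANCE, GroupState.COMPLETING_REBALANCE)
```

In this sketch a rebalance always runs PreparingRebalance, then CompletingRebalance, then Stable, so the new name marks the last step before the group settles.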

We will then add metrics reporting how many consumer groups are in each state:

  • NumGroupsPreparingRebalance: the number of consumer groups currently in the PreparingRebalance state.
  • NumGroupsCompletingRebalance: the number of consumer groups currently in the CompletingRebalance state.
  • NumGroupsStable: the number of consumer groups currently in the Stable state.
  • NumGroupsDead: the number of consumer groups currently in the Dead state.
  • NumGroupsEmpty: the number of consumer groups currently in the Empty state.
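The metrics listed above amount to a count of groups per state. A minimal sketch, assuming the broker can enumerate the current state of every group it coordinates; the function name `group_state_metrics` and the dictionary output are hypothetical, and a broker would more likely expose each value as a separate gauge than build a dictionary.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class GroupState(Enum):
    EMPTY = "Empty"
    PREPARING_REBALANCE = "PreparingRebalance"
    COMPLETING_REBALANCE = "CompletingRebalance"
    STABLE = "Stable"
    DEAD = "Dead"


def group_state_metrics(states: Iterable[GroupState]) -> Dict[str, int]:
    """Return NumGroups plus one NumGroups<State> value per state.

    Every state gets an entry, so a state with no groups reports 0
    instead of being missing from the output.
    """
    counts = Counter(states)
    metrics = {"NumGroups": sum(counts.values())}
    for state in GroupState:
        # e.g. GroupState.STABLE -> "NumGroupsStable"
        metrics["NumGroups" + state.value] = counts[state]
    return metrics
```

For example, two Stable groups, one PreparingRebalance group, and one Empty group yield NumGroups = 4, NumGroupsStable = 2, NumGroupsPreparingRebalance = 1, NumGroupsEmpty = 1, and 0 for CompletingRebalance and Dead.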

Combined with the existing NumGroups metric, these metrics show what fraction of groups is in each state at a given time.
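The arithmetic is a simple ratio, and because a rebalance spans both PreparingRebalance and CompletingRebalance, the number of rebalances in progress is the sum of those two metrics. A minimal sketch over a hypothetical snapshot of metric values; the function names and the snapshot numbers are illustrative only.

```python
from typing import Dict

PREFIX = "NumGroups"


def state_fractions(metrics: Dict[str, int]) -> Dict[str, float]:
    """Fraction of groups in each state, from NumGroups and NumGroups<State>."""
    total = metrics[PREFIX]
    if total == 0:
        return {}  # no groups, so the fractions are undefined
    return {
        name[len(PREFIX):]: count / total
        for name, count in metrics.items()
        if name.startswith(PREFIX) and name != PREFIX
    }


def rebalances_in_progress(metrics: Dict[str, int]) -> int:
    """Groups in either phase of a rebalance."""
    return (metrics["NumGroupsPreparingRebalance"]
            + metrics["NumGroupsCompletingRebalance"])


# Hypothetical snapshot of the metric values at one instant.
snapshot = {
    "NumGroups": 8,
    "NumGroupsPreparingRebalance": 1,
    "NumGroupsCompletingRebalance": 1,
    "NumGroupsStable": 5,
    "NumGroupsEmpty": 1,
    "NumGroupsDead": 0,
}
```

With this snapshot, 2 rebalances are in progress (25% of groups) and 62.5% of groups are Stable. If each metric is read separately, a real snapshot may not sum exactly to NumGroups, so ratios like these are best treated as approximate.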

Compatibility, Deprecation, and Migration Plan

None

Rejected Alternatives

Instead of adding a metric, we could look through the broker logs to see when consumer group rebalances begin and end. However, monitoring systems would then have to parse the broker logs, which is much harder than reading a metric.

Another option would be to expose more information about groups through the AdminClient. While useful, this would not serve the same purpose as a metric, which gives an at-a-glance summary of what the groups on a broker are doing.
