

Status

Current state: Under Discussion

Discussion thread: TBD

JIRA: KAFKA-7641, KAFKA-7610

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Today the group coordinator accepts an unbounded number of join group requests into the membership metadata. As described in KAFKA-7610, there is a risk that too many illegal joining members will exhaust broker memory before the session timeout garbage-collects them. To ensure broker stability, we propose to enforce a hard limit on the size of a consumer group in order to prevent unbounded growth of server-side cache/memory.

Public Interfaces

We propose to add a new configuration to KafkaConfig.scala, whose behavior will affect the following coordinator API:

GroupCoordinator.scala
def handleJoinGroup(...)

where we shall enforce the group size capping rule on incoming requests.
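The capping rule itself is straightforward; the following is a minimal Java sketch of the intended behavior (the class and method names here are illustrative, not the actual GroupCoordinator internals):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative model of the proposed capping rule in handleJoinGroup:
// a brand-new member is rejected once the group is at capacity, while
// an existing member may always rejoin.
class GroupMetadata {
    private final Set<String> memberIds = new HashSet<>();
    private final int groupMaxSize; // backed by the proposed group.max.size config

    GroupMetadata(int groupMaxSize) {
        this.groupMaxSize = groupMaxSize;
    }

    // Returns an error name for a rejected join, or null on success.
    String tryAddMember(String memberId) {
        if (!memberIds.contains(memberId) && memberIds.size() >= groupMaxSize) {
            return "GROUP_MAX_SIZE_REACHED"; // new member rejected: group is full
        }
        memberIds.add(memberId);
        return null;
    }
}
```

Note that the check only applies to unknown member ids, so a known member rejoining after a rebalance is not affected by the cap.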

Proposed Changes


We shall add a config called group.max.size on the coordinator side.

KafkaConfig
val GroupMaxSizeProp = "group.max.size"
...
val GroupMaxSize = 1000000
...
val GroupMaxSizeDoc = "The maximum number of members allowed in a single consumer group."
...
.define(GroupMaxSizeProp, INT, Defaults.GroupMaxSize, MEDIUM, GroupMaxSizeDoc)

The proposed default value of 1_000_000 is based on a rough size estimation of member metadata (120 B), so the maximum memory usage per group is 120 B * 1_000_000 = 120 MB, which should leave sufficient headroom of 5x~10x over the largest use cases I know of. Further discussion on the default value is welcome!
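The back-of-envelope bound above can be checked directly (the 120 B per-member figure is the rough estimate from this proposal, not a measured constant):

```java
// Back-of-envelope memory bound for the proposed default group.max.size.
long memberMetadataBytes = 120L;   // rough per-member metadata estimate
long groupMaxSize = 1_000_000L;    // proposed default for group.max.size
long boundBytes = memberMetadataBytes * groupMaxSize;
System.out.println(boundBytes / 1_000_000L + " MB per group at the cap"); // 120 MB
```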

Implementation-wise, we shall block registration of new members once a group reaches its capacity, and define a new error type:

Errors.java
GROUP_MAX_SIZE_REACHED(77, "Consumer group is already at its full capacity.",
 GroupMaxSizeReachedException::new);

Since the cap should never be reached in normal operation, hitting it is a red flag indicating a client-side logic bug. To reduce load on the broker and ensure server stability, the consumer will fail itself upon receiving this error.
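The intended client-side handling could look like the following sketch (the exception class name and error code follow the proposed Errors.java entry; the handler method itself is hypothetical, not actual consumer code):

```java
// Illustrative: the proposed error is treated as fatal, so the consumer
// surfaces it instead of rejoining and re-loading the broker.
class GroupMaxSizeReachedException extends RuntimeException {
    GroupMaxSizeReachedException(String message) {
        super(message);
    }
}

class JoinGroupErrorHandler {
    static final short GROUP_MAX_SIZE_REACHED = 77; // code from the proposed Errors.java entry

    static void onJoinGroupError(short errorCode) {
        if (errorCode == GROUP_MAX_SIZE_REACHED) {
            // Fail fast: the cap signals a client-side bug, not a transient condition,
            // so retrying the join would only add load on the coordinator.
            throw new GroupMaxSizeReachedException(
                "Consumer group is already at its full capacity.");
        }
        // ... handling for other, possibly retriable, error codes
    }
}
```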

Compatibility, Deprecation, and Migration Plan

  • This is a backward-compatible change.

Rejected Alternatives

The discussion thread proposed other approaches, such as enforcing a memory limit or changing the initial rebalance delay. We believe those approaches are "either not strict or not intuitive" (quoting Stanislav), whereas a group size cap is easy for end users to understand and to configure in a customized manner.
