Current state: Adopted
Discussion thread: TBD
JIRA: KAFKA-7824
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
In the current consumer protocol, the `member.id` field is assigned by the broker to track group member status. A new consumer joins the group with `member.id` set to UNKNOWN_MEMBER_ID (empty string), since it must first receive its identity from the broker. For a request with an unknown member id, the broker blindly accepts the join group request, stores the member metadata, and returns a UUID to the consumer. The edge case arises when the initial join group request keeps failing due to connection timeouts, when the consumer keeps restarting, or when max.poll.interval.ms on the client is set to infinite (so no rebalance timeout kicks in to clean up the member metadata map). In each of these cases, MemberMetadata entries accumulate in the group metadata cache and can eventually exhaust broker memory. Detecting and fencing such invalid join group requests is therefore crucial for broker stability.
This KIP proceeds in parallel with KIP-389, which tries to enforce a hard cap on the group metadata size, and is an important complement to KIP-345, which introduces static membership.
We will introduce a new join group error type called MEMBER_ID_REQUIRED, which is returned when the broker receives a join group request with an unknown member id:
MEMBER_ID_REQUIRED(79, "Consumer needs to have a valid member id before actually entering group", MemberIdRequiredException::new),
We will also bump the join group protocol version so that the broker knows whether the consumer can safely handle this error. For example, if we bump the protocol version from m to m+1, every request with version >= m+1 and an unknown member id will receive a MEMBER_ID_REQUIRED error, while requests with version <= m will still be blindly accepted for backward compatibility.
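The version gate described above can be sketched roughly as follows. This is a simplified, hypothetical model rather than the actual coordinator code: the class `JoinGroupGate`, its `handleJoin` method, and the `firstVersionRequiringMemberId` parameter (standing in for m+1) are names invented for illustration, and member metadata is reduced to a set of ids.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Simplified, hypothetical model of the broker-side check; not the real coordinator code.
final class JoinGroupGate {
    enum JoinError { NONE, MEMBER_ID_REQUIRED, UNKNOWN_MEMBER_ID }

    static final String UNKNOWN_MEMBER_ID = "";

    static final class JoinResult {
        final JoinError error;
        final String memberId;

        JoinResult(JoinError error, String memberId) {
            this.error = error;
            this.memberId = memberId;
        }
    }

    // Assumed stand-in for "m + 1": the first version that understands MEMBER_ID_REQUIRED.
    private final int firstVersionRequiringMemberId;
    // Ids handed out but not yet confirmed by a second join; no member metadata is stored for them.
    private final Set<String> pendingMemberIds = new HashSet<>();
    // Ids of members that have actually entered the group.
    private final Set<String> members = new HashSet<>();

    JoinGroupGate(int firstVersionRequiringMemberId) {
        this.firstVersionRequiringMemberId = firstVersionRequiringMemberId;
    }

    JoinResult handleJoin(int requestVersion, String memberId) {
        if (UNKNOWN_MEMBER_ID.equals(memberId)) {
            String newId = UUID.randomUUID().toString();
            if (requestVersion >= firstVersionRequiringMemberId) {
                // Newer clients: remember only the id and ask the client to rejoin with it.
                pendingMemberIds.add(newId);
                return new JoinResult(JoinError.MEMBER_ID_REQUIRED, newId);
            }
            // Older clients: accept immediately for backward compatibility.
            members.add(newId);
            return new JoinResult(JoinError.NONE, newId);
        }
        if (pendingMemberIds.remove(memberId)) {
            // Second join with a pre-allocated id: promote it to a real member.
            members.add(memberId);
            return new JoinResult(JoinError.NONE, memberId);
        }
        if (members.contains(memberId)) {
            return new JoinResult(JoinError.NONE, memberId);
        }
        return new JoinResult(JoinError.UNKNOWN_MEMBER_ID, memberId);
    }

    int memberCount() { return members.size(); }

    int pendingCount() { return pendingMemberIds.size(); }
}
```

In this sketch, a client on the new version that never sends its second join leaves only a pending id behind, while a client on an older version still creates a member entry on its first request.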
On receiving a MEMBER_ID_REQUIRED error, the client retries the join using the member id given in the join group response; the broker is expected to accept this retry if the id matches. If the second join attempt fails with UNKNOWN_MEMBER_ID, the client handling is the same as today: it resets the generation and requests a new member id from the broker by sending an anonymous join group request. Registered member ids are also evicted on session timeout so that the pre-allocation map does not grow indefinitely, although the map should stay small since it only stores randomly generated ids.
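A minimal sketch of the client-side handling described above, assuming a hypothetical `Coordinator` interface in place of the real network layer. The names `JoinRetryClient`, `joinGroup`, and `maxAttempts` are illustrative only and do not correspond to the actual consumer implementation.

```java
// Hypothetical client-side retry loop; the real consumer coordinator is more involved.
final class JoinRetryClient {
    enum JoinError { NONE, MEMBER_ID_REQUIRED, UNKNOWN_MEMBER_ID }

    static final String UNKNOWN_MEMBER_ID = "";

    static final class JoinResponse {
        final JoinError error;
        final String memberId;

        JoinResponse(JoinError error, String memberId) {
            this.error = error;
            this.memberId = memberId;
        }
    }

    // Assumed abstraction over sending a join group request and reading the response.
    interface Coordinator {
        JoinResponse join(String memberId);
    }

    // Returns the member id under which the client was admitted to the group.
    static String joinGroup(Coordinator coordinator, int maxAttempts) {
        String memberId = UNKNOWN_MEMBER_ID; // first join is anonymous
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            JoinResponse response = coordinator.join(memberId);
            switch (response.error) {
                case NONE:
                    return response.memberId;
                case MEMBER_ID_REQUIRED:
                    // Retry with the id the broker pre-allocated for us.
                    memberId = response.memberId;
                    break;
                case UNKNOWN_MEMBER_ID:
                    // The id was rejected (for example, evicted): start over anonymously.
                    memberId = UNKNOWN_MEMBER_ID;
                    break;
            }
        }
        throw new IllegalStateException("Could not join group after " + maxAttempts + " attempts");
    }
}
```

The bound on attempts is a choice made for this sketch so that the loop terminates; the text above does not specify a retry limit.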
In effect, the broker previously accepted an anonymous joiner as a new member immediately; it now requires one more round trip to confirm the new member's identity.
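The session-timeout eviction of pre-allocated member ids mentioned above might look roughly like this. It is a sketch under assumptions: the `PendingMemberRegistry` class and its methods are hypothetical, time is passed in explicitly as milliseconds, and an id is treated as expired once the current time reaches its deadline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-allocation map: member id -> deadline (ms) by which the client must rejoin.
final class PendingMemberRegistry {
    private final long sessionTimeoutMs;
    private final Map<String, Long> deadlines = new HashMap<>();

    PendingMemberRegistry(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Called when the broker hands out an id together with MEMBER_ID_REQUIRED.
    void register(String memberId, long nowMs) {
        deadlines.put(memberId, nowMs + sessionTimeoutMs);
    }

    // Called on the second join; true only if the id was registered and has not expired.
    boolean claim(String memberId, long nowMs) {
        Long deadline = deadlines.remove(memberId);
        return deadline != null && nowMs < deadline;
    }

    // Drops every id whose deadline has passed and returns how many were removed.
    int evictExpired(long nowMs) {
        int before = deadlines.size();
        deadlines.values().removeIf(deadline -> deadline <= nowMs);
        return before - deadlines.size();
    }

    int size() {
        return deadlines.size();
    }
}
```

Because each entry is only an id and a timestamp, the map stays small even when many clients abandon their first join, which matches the expectation stated above.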
Compatibility, Deprecation, and Migration Plan
- This is a pure broker upgrade that should be transparent to client users. The impact should be minimal.
- No compatibility issues identified.