Status
Current state: Adopted
Discussion thread: TBD
JIRA: https://issues.apache.org/jira/browse/KAFKA-7610
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
In the current consumer protocol, the `member.id` field is assigned by the broker to track group member status. A new consumer joins the group with `member.id` set to UNKNOWN_MEMBER_ID (empty string), since it must first receive its identity from the broker. For a request with an unknown member id, the broker blindly accepts the join group request, stores the member metadata, and returns a UUID to the consumer. The edge case is that if the initial join group request keeps failing due to connection timeout, or the consumer keeps restarting, or the max.poll.interval.ms configured on the client is set to infinite (so no rebalance timeout kicks in to clean up the member metadata map), MemberMetadata entries accumulate in the group metadata cache and will eventually exhaust broker memory. Detecting and fencing invalid join group requests is therefore crucial for broker stability.
This KIP runs in parallel with KIP-389, which tries to enforce a hard cap on the group metadata size, and is an important complement to KIP-345, which introduces static membership.
Public Interfaces
We will introduce a new join group error type called MEMBER_ID_REQUIRED, which is returned when the broker receives a join group request with an unknown member id:

    MEMBER_ID_REQUIRED(79, "Consumer needs to have a valid member id before actually entering group", MemberIdRequiredException::new),
We shall also bump the join group protocol version so that the broker knows whether the consumer can safely handle this type of error. For example, if we bump the protocol version from m to m+1, every request with version >= m+1 and an unknown member id will be answered with the MEMBER_ID_REQUIRED error, while requests with version <= m will still be blindly accepted for backward compatibility.
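The version gate can be sketched as follows. This is a minimal illustration, not broker code: the class `VersionGatedJoinSketch`, its method names, and the concrete version number passed to the constructor are all hypothetical stand-ins (the text only speaks of "m" and "m+1"), and the bookkeeping of handed-out ids is left out to keep the gate itself visible.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch of version-gated handling of anonymous join group requests. */
final class VersionGatedJoinSketch {
    enum Error { NONE, MEMBER_ID_REQUIRED }

    /** Pairs the error code with the member id returned to the client. */
    static final class JoinResult {
        final Error error;
        final String memberId;

        JoinResult(Error error, String memberId) {
            this.error = error;
            this.memberId = memberId;
        }
    }

    // Assumption: the first protocol version that understands MEMBER_ID_REQUIRED ("m+1").
    private final int firstGatedVersion;
    // Stand-in for the member metadata held in the group metadata cache.
    private final Set<String> storedMembers = new HashSet<>();

    VersionGatedJoinSketch(int firstGatedVersion) {
        this.firstGatedVersion = firstGatedVersion;
    }

    /** Handles a join group request whose member id is unknown (empty string). */
    JoinResult handleAnonymousJoin(int requestVersion) {
        String memberId = UUID.randomUUID().toString();
        if (requestVersion >= firstGatedVersion) {
            // Newer client: return the id with the new error and store no member metadata yet.
            return new JoinResult(Error.MEMBER_ID_REQUIRED, memberId);
        }
        // Older client: it cannot handle the new error, so accept it as before.
        storedMembers.add(memberId);
        return new JoinResult(Error.NONE, memberId);
    }

    int storedMemberCount() {
        return storedMembers.size();
    }
}
```

With a gate constructed as `new VersionGatedJoinSketch(4)`, a version 4 anonymous join is answered with MEMBER_ID_REQUIRED and leaves the stored member set empty, while a version 3 request is accepted immediately and adds one entry.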
Proposed Changes
When encountering the MEMBER_ID_REQUIRED error, the client retries the join using the member id given in the join group response; the broker is expected to accept it if the id matches. If the second join attempt fails with UNKNOWN_MEMBER_ID, the client handling logic is unchanged: it resets the generation and requests a new member id from the broker by sending an anonymous join group request. We also evict registered member ids through the session timeout so that the pre-allocation map does not grow indefinitely, although the map should stay small since we only store a randomly generated id.
Effectively, we previously accepted an anonymous join as a new member, and we now require one more round trip to establish the new member's identity.
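The two-step join and the timeout-based eviction can be sketched end to end. This is an illustrative model under stated assumptions, not the actual client or coordinator code: `MemberIdHandshakeSketch`, `FakeCoordinator`, `joinGroup`, and the `maxAttempts` parameter are hypothetical names, time is passed in explicitly as `nowMs`, and the "pre-allocation map" is modelled as a map from reserved id to expiry deadline.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model of the member id handshake; names are illustrative only. */
final class MemberIdHandshakeSketch {
    enum Error { NONE, MEMBER_ID_REQUIRED, UNKNOWN_MEMBER_ID }

    static final String UNKNOWN_MEMBER_ID = "";

    static final class JoinResponse {
        final Error error;
        final String memberId;

        JoinResponse(Error error, String memberId) {
            this.error = error;
            this.memberId = memberId;
        }
    }

    /** In-memory stand-in for the group coordinator. */
    static final class FakeCoordinator {
        private final long sessionTimeoutMs;
        // Reserved ids awaiting a second join, mapped to their expiry deadline.
        private final Map<String, Long> pendingDeadlines = new HashMap<>();
        private final Set<String> members = new HashSet<>();

        FakeCoordinator(long sessionTimeoutMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
        }

        JoinResponse join(String memberId, long nowMs) {
            // Evict reserved ids whose session timeout elapsed without a second join.
            pendingDeadlines.values().removeIf(deadline -> deadline <= nowMs);
            if (memberId.equals(UNKNOWN_MEMBER_ID)) {
                String id = UUID.randomUUID().toString();
                pendingDeadlines.put(id, nowMs + sessionTimeoutMs);
                return new JoinResponse(Error.MEMBER_ID_REQUIRED, id);
            }
            if (pendingDeadlines.remove(memberId) != null || members.contains(memberId)) {
                members.add(memberId);
                return new JoinResponse(Error.NONE, memberId);
            }
            return new JoinResponse(Error.UNKNOWN_MEMBER_ID, UNKNOWN_MEMBER_ID);
        }

        int pendingCount() {
            return pendingDeadlines.size();
        }
    }

    /** Client side: retry with the returned id; on UNKNOWN_MEMBER_ID start over anonymously. */
    static String joinGroup(FakeCoordinator coordinator, long nowMs, int maxAttempts) {
        String memberId = UNKNOWN_MEMBER_ID;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            JoinResponse response = coordinator.join(memberId, nowMs);
            switch (response.error) {
                case NONE:
                    return response.memberId;
                case MEMBER_ID_REQUIRED:
                    memberId = response.memberId; // rejoin with the assigned id
                    break;
                case UNKNOWN_MEMBER_ID:
                    memberId = UNKNOWN_MEMBER_ID; // id no longer known: ask for a new one
                    break;
            }
        }
        throw new IllegalStateException("join did not complete in " + maxAttempts + " attempts");
    }
}
```

In this model a well-behaved client completes the join in two requests and leaves no reserved id behind, while a client that disappears after its first request only leaves an entry in the reserved map until the session timeout passes.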
Compatibility, Deprecation, and Migration Plan
- This is a pure broker upgrade which should be transparent to client users. Impact should be minimal.
- No compatibility issue identified.
Rejected Alternatives
Jason proposed another approach: monitoring the TCP connection. As he described, "During the initial JoinGroup, we can detect failed members when the TCP connection fails. This is difficult at the moment because we do not have a mechanism to propagate disconnects from the network layer. A potential option is to treat the disconnect as just another type of request and pass it to the handlers through the request queue." It is still under discussion, and we believe that KIP-394 is a more intuitive solution.