Current state: Under Discussion
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
The KafkaConsumer API centers on poll(), which is intended to be called in a loop. On every iteration of the loop, poll() returns a batch of records from the partitions this consumer can retrieve from at that time. The number of records returned is capped by max.poll.records, as described in KIP-41: KafkaConsumer Max Records. The current implementation returns available records starting from the partition that the previous poll call last retrieved records from. This leads to unfair patterns of record consumption across multiple partitions.
This proposal discusses a mechanism to mitigate that issue.
No public interface changes are proposed.
The issue stems from the greedy consumption of a partition when serving a poll call, as described in the "Ensuring Fair Consumption" section of KIP-41. The partition that was being drained when a poll call returned is used again at the start of the next poll call, so the greedy behavior continues against that same partition.
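The greedy pattern described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the actual KafkaConsumer or Fetcher code: the class `GreedyPollSketch`, the helpers `greedyPoll`, `sampleBuffers` and `filled`, the partition names, and the buffer sizes are all hypothetical. It models each partition as a queue of already-fetched records and lets each poll drain the current partition until either the queue is empty or the max.poll.records cap is reached.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of greedy draining; NOT the real KafkaConsumer internals.
class GreedyPollSketch {

    // Hypothetical helper: serves one poll and returns, for each record
    // handed out, the name of the partition it came from.
    static List<String> greedyPoll(Map<String, Deque<Integer>> buffers, int maxPollRecords) {
        List<String> returned = new ArrayList<>();
        for (Map.Entry<String, Deque<Integer>> entry : buffers.entrySet()) {
            Deque<Integer> records = entry.getValue();
            // Drain this partition until it is empty or the cap is hit.
            while (!records.isEmpty() && returned.size() < maxPollRecords) {
                records.poll();
                returned.add(entry.getKey());
            }
            if (returned.size() == maxPollRecords) {
                break; // a partially drained partition is served first again next time
            }
        }
        return returned;
    }

    // Hypothetical helper: a queue holding n buffered record offsets.
    static Deque<Integer> filled(int n) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            queue.add(i);
        }
        return queue;
    }

    // Assumed sample data: p0 has a larger backlog than p1 and p2.
    static Map<String, Deque<Integer>> sampleBuffers() {
        Map<String, Deque<Integer>> buffers = new LinkedHashMap<>();
        buffers.put("p0", filled(6));
        buffers.put("p1", filled(2));
        buffers.put("p2", filled(2));
        return buffers;
    }

    public static void main(String[] args) {
        Map<String, Deque<Integer>> buffers = sampleBuffers();
        int maxPollRecords = 3; // stands in for max.poll.records
        System.out.println(greedyPoll(buffers, maxPollRecords)); // [p0, p0, p0]
        System.out.println(greedyPoll(buffers, maxPollRecords)); // [p0, p0, p0]
        System.out.println(greedyPoll(buffers, maxPollRecords)); // [p1, p1, p2]
        System.out.println(greedyPoll(buffers, maxPollRecords)); // [p2]
    }
}
```

In this model, p1 and p2 receive nothing until p0 is fully drained. If p0 were refilled by new fetches between polls, the other partitions could be delayed for longer, which is the kind of unfairness this proposal aims to mitigate.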