...
Current state: Under Discussion
Discussion thread: here
JIRA: here
Motivation
The Kafka consumer pipelines the fetching of data in order to maximise performance. Whenever poll(Duration)/poll(long) is called, another fetch is issued before any results are returned. Although this benefits performance, in some circumstances, when combined with the pause/resume API, this optimisation can result in a considerable amount of duplicate data being transferred over the wire. This happens because, whenever poll is called, any prefetched data for a paused topic-partition is thrown away and has to be fetched again once the partition is resumed. To illustrate the effect with a simple example, imagine that a single KafkaConsumer instance is assigned two topic partitions, TP1 and TP2. Since the client interested in TP1 cannot handle records as fast as the one interested in TP2, we resort to pausing TP1 whenever we are not interested in receiving records for it. This results in the following behavior:
...