Current state: Under Discussion

Discussion thread: here

JIRA: here

Motivation

The Kafka consumer pipelines fetches in order to maximise performance: whenever poll(Duration)/poll(long) is called, another fetch is issued before any results are returned. Although this benefits performance, in some circumstances, when combined with the pause/resume API, the optimisation can result in a considerable amount of duplicate data being transferred over the wire. This happens because, whenever poll is called, any prefetched data for a topic partition is thrown away if that partition is paused, so the same records must be fetched again once the partition is resumed. To illustrate the effect with a simple example, imagine that a single KafkaConsumer instance is assigned two topic partitions, TP1 and TP2. Since the client interested in TP1 cannot handle records as fast as the one interested in TP2, we resort to pausing TP1 whenever we are not interested in receiving records for it. This results in the following behavior:

...