...

The general idea behind these two rules is to keep the most flexible assignment choices available for as long as possible by starting with the most constrained partitions and consumers.

Example

The following example demonstrates how this new algorithm could be beneficial.

Consider the following inventory of five topics, for a total of eight partitions:

Topic   Partitions
T1      2
T2      1
T3      2
T4      1
T5      2

The consumer group consists of four members (C1, C2, C3, C4) with the following topic subscriptions:

Consumer   Topics
C1         T1, T2, T3, T4, T5
C2         T1, T3, T5
C3         T1, T3, T5
C4         T1, T2, T3, T4, T5

This situation could plausibly arise when all consumers initially subscribe to (T1, T3, T5) and a configuration change is rolled out gradually: it lasts from the time the configuration for C1 and C4 has been refreshed to consume the two additional topics until C2 and C3 are updated some time later.

The resulting assignments for each of the assignment strategies are as follows.

Range

Consumer   Assigned Partitions
C1         T1-0, T2-0, T3-0, T4-0, T5-0
C2         T1-1, T3-1, T5-1
C3         (none)
C4         (none)
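
The Range result can be reproduced with a short simulation. The following is a minimal Python sketch, not Kafka's implementation. It assumes that range assignment works topic by topic, sorts the subscribed consumers by name, and hands each one a contiguous block of partitions, with the first few consumers receiving one extra partition when the count does not divide evenly. The function and variable names are hypothetical.

```python
# Example inventory and subscriptions, taken from the tables in this section.
TOPICS = {"T1": 2, "T2": 1, "T3": 2, "T4": 1, "T5": 2}
SUBSCRIPTIONS = {
    "C1": {"T1", "T2", "T3", "T4", "T5"},
    "C2": {"T1", "T3", "T5"},
    "C3": {"T1", "T3", "T5"},
    "C4": {"T1", "T2", "T3", "T4", "T5"},
}


def range_assign(partitions_per_topic, subscriptions):
    """Sketch of per-topic range assignment (assumed behavior, see lead-in)."""
    assignment = {consumer: [] for consumer in sorted(subscriptions)}
    for topic in sorted(partitions_per_topic):
        # Only consumers subscribed to this topic take part, in name order.
        consumers = sorted(c for c in subscriptions if topic in subscriptions[c])
        if not consumers:
            continue
        per_consumer, extra = divmod(partitions_per_topic[topic], len(consumers))
        for i, consumer in enumerate(consumers):
            # The first `extra` consumers each receive one additional partition.
            start = per_consumer * i + min(i, extra)
            count = per_consumer + (1 if i < extra else 0)
            assignment[consumer] += [f"{topic}-{p}" for p in range(start, start + count)]
    return assignment


range_result = range_assign(TOPICS, SUBSCRIPTIONS)
for consumer, partitions in range_result.items():
    print(consumer, ", ".join(partitions) or "(none)")
```

Because every topic is divided independently and C1 always sorts first, C1 receives a partition from every topic while C3 and C4 are left idle.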

Round Robin

Consumer   Assigned Partitions
C1         T1-0, T3-0, T5-0
C2         T1-1, T3-1, T5-1
C3         (none)
C4         T2-0, T4-0
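
The Round Robin result can be reproduced in the same way. This is a minimal Python sketch under stated assumptions, not Kafka's implementation: it assumes that all partitions are visited in sorted order and that a single circular iterator over the sorted consumers is advanced until it reaches a consumer subscribed to the partition's topic. The function and variable names are hypothetical.

```python
from itertools import cycle

# Example inventory and subscriptions, taken from the tables in this section.
TOPICS = {"T1": 2, "T2": 1, "T3": 2, "T4": 1, "T5": 2}
SUBSCRIPTIONS = {
    "C1": {"T1", "T2", "T3", "T4", "T5"},
    "C2": {"T1", "T3", "T5"},
    "C3": {"T1", "T3", "T5"},
    "C4": {"T1", "T2", "T3", "T4", "T5"},
}


def round_robin_assign(partitions_per_topic, subscriptions):
    """Sketch of round-robin assignment (assumed behavior, see lead-in)."""
    assignment = {consumer: [] for consumer in sorted(subscriptions)}
    consumers = cycle(sorted(subscriptions))
    for topic in sorted(partitions_per_topic):
        # Skip topics nobody subscribes to, otherwise the loop below never ends.
        if not any(topic in topics for topics in subscriptions.values()):
            continue
        for partition in range(partitions_per_topic[topic]):
            # Advance the shared iterator until a subscribed consumer is found.
            consumer = next(consumers)
            while topic not in subscriptions[consumer]:
                consumer = next(consumers)
            assignment[consumer].append(f"{topic}-{partition}")
    return assignment


round_robin_result = round_robin_assign(TOPICS, SUBSCRIPTIONS)
for consumer, partitions in round_robin_result.items():
    print(consumer, ", ".join(partitions) or "(none)")
```

In this trace the iterator happens to land on C3 only when the next partition belongs to T2 or T4, which C3 does not consume, so C3 is skipped every time and ends up with nothing.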

Fair

Consumer   Assigned Partitions
C1         T2-0, T3-0
C2         T1-0, T3-1
C3         T1-1, T5-0
C4         T4-0, T5-1
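
The "most constrained first" idea can be illustrated with a small greedy simulation. This is a minimal Python sketch, not the proposed implementation: it assumes that topics are processed in order of fewest subscribed consumers, that each partition goes to the subscribed consumer holding the fewest partitions so far, and that ties are broken by topic or consumer name. These tie-breaks are assumptions of the sketch. The function and variable names are hypothetical.

```python
from collections import defaultdict

# Example inventory and subscriptions, taken from the tables in this section.
TOPICS = {"T1": 2, "T2": 1, "T3": 2, "T4": 1, "T5": 2}
SUBSCRIPTIONS = {
    "C1": {"T1", "T2", "T3", "T4", "T5"},
    "C2": {"T1", "T3", "T5"},
    "C3": {"T1", "T3", "T5"},
    "C4": {"T1", "T2", "T3", "T4", "T5"},
}


def fair_assign(partitions_per_topic, subscriptions):
    """Greedy sketch: most constrained topics first, least loaded consumer wins."""
    # Map each topic to the consumers subscribed to it, in name order.
    subscribers = defaultdict(list)
    for consumer in sorted(subscriptions):
        for topic in subscriptions[consumer]:
            subscribers[topic].append(consumer)

    assignment = {consumer: [] for consumer in sorted(subscriptions)}

    # Topics with the fewest subscribers have the fewest choices, so go first.
    ordered_topics = sorted(
        (t for t in partitions_per_topic if subscribers[t]),
        key=lambda t: (len(subscribers[t]), t),
    )
    for topic in ordered_topics:
        for partition in range(partitions_per_topic[topic]):
            # Fewest partitions so far wins; consumer name breaks ties
            # (an assumption of this sketch).
            target = min(subscribers[topic], key=lambda c: (len(assignment[c]), c))
            assignment[target].append(f"{topic}-{partition}")
    return assignment


fair_result = fair_assign(TOPICS, SUBSCRIPTIONS)
fair_counts = {consumer: len(parts) for consumer, parts in fair_result.items()}
for consumer, partitions in fair_result.items():
    print(consumer, ", ".join(partitions) or "(none)")
```

With these tie-breaks the sketch yields the same assignment as the Fair table: T2 and T4, which only C1 and C4 can consume, are placed first, and the remaining partitions then fill in around them so that every consumer ends up with two partitions.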

Compatibility, Deprecation, and Migration Plan

...

For the new consumer, a custom implementation class could be configured instead of adding a third built-in option to Kafka. Because the logic is somewhat complicated to get right, it likely makes sense for Kafka to provide this as a convenience.

It has been noted that with heavily skewed subscriptions, fairness is largely moot: people would generally scale subscription counts up or down with the express purpose of reducing or increasing load on those instances. While that is a valid point, it does not hold in the case of rolling configuration updates as described above.