Status

Current state:  [Voting] Accepted

Discussion thread: https://www.mail-archive.com/dev@kafka.apache.org/msg96520.html

JIRA: KAFKA-7190, KAFKA-6399

PR: https://github.com/apache/kafka/pull/6509

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

When troubleshooting KAFKA-7190, one observation was that Streams' overridden topic configs of `segment.ms` and `segment.index.bytes` are too aggressive, and hence are causing various issues with applications that do not have high traffic via these repartition topics. Although the root cause should be tackled in KIP-360, I think it is still worth removing these two aggressive overrides and keeping only the `segment.bytes` override of 50MB, which should be sufficient for bounding the repartition topic's footprint.

Proposed Changes

...

Streams previously used an "infinite" default max.poll.interval.ms Consumer config. The reasoning was that we didn't call poll() during restore, which can take arbitrarily long, so our maximum expected interval between poll calls was infinite. Since 1.0, we do call poll during restore, so we no longer need the infinite default, and setting a reasonable limit here can help to resolve situations in which a particular thread gets stuck for a while and Streams stops making progress.

Proposed Changes

We want to remove the override and instead fall back to the ConsumerConfig-defined default of five minutes.

Compatibility, Deprecation, and Migration Plan

This should have little impact on users apart from a slightly increased footprint on the repartition topic partitions, which is still bounded by `segment.bytes` (50MB unless the user overrides it).
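To illustrate how the 50MB bound interacts with a user override, here is a minimal sketch of the precedence described above. It is not the actual Streams implementation: the class and method names are hypothetical, and it assumes that user-supplied topic settings carry the `topic.` prefix that Streams uses for internal topic configs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Kafka Streams.
public class RepartitionTopicConfigSketch {
    static final String SEGMENT_BYTES = "segment.bytes";
    // Assumption: user topic overrides are passed with the "topic." prefix.
    static final String TOPIC_PREFIX = "topic.";

    static Map<String, String> effectiveTopicConfig(Map<String, String> userProps) {
        Map<String, String> topicConfig = new HashMap<>();
        // The only remaining Streams override after this change: 50MB segments.
        topicConfig.put(SEGMENT_BYTES, String.valueOf(50 * 1024 * 1024));
        // User-supplied topic configs replace the Streams override.
        for (Map.Entry<String, String> entry : userProps.entrySet()) {
            if (entry.getKey().startsWith(TOPIC_PREFIX)) {
                String topicKey = entry.getKey().substring(TOPIC_PREFIX.length());
                topicConfig.put(topicKey, entry.getValue());
            }
        }
        return topicConfig;
    }

    public static void main(String[] args) {
        Map<String, String> userProps = new HashMap<>();
        userProps.put("topic.segment.bytes", "104857600"); // hypothetical 100MB override
        System.out.println(effectiveTopicConfig(new HashMap<>()).get(SEGMENT_BYTES));
        System.out.println(effectiveTopicConfig(userProps).get(SEGMENT_BYTES));
    }
}
```

Note that `segment.ms` and `segment.index.bytes` do not appear in the sketch's defaults; under this proposal they would fall back to the broker-side topic defaults.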

The only problem I foresee is that existing applications may currently take longer than five minutes between calls to poll in the steady state. Think: low-volume, but high-latency computations. These applications are leaning on the current Streams-defined default of "max int" millis. Upon updating Streams, they would start to see timeouts leading to rebalances if they don't override the max.poll.interval.ms config. The fix for them would be to set the config to something reasonable for their application, which would be a runtime fix.
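As a rough sketch of that runtime fix, an affected application could set the interval explicitly in the properties it passes to Streams. The application id, bootstrap address, and the 15-minute value below are hypothetical placeholders; plain string keys are used so the snippet compiles without the Kafka client libraries.

```java
import java.util.Properties;

// Hypothetical example; values are placeholders, not recommendations.
public class MaxPollIntervalOverride {
    // Assumed upper bound on time between poll() calls for this application.
    static final int ASSUMED_MAX_POLL_INTERVAL_MS = 15 * 60 * 1000;

    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "slow-processing-app"); // hypothetical
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical
        // Explicit override, so the application no longer depends on the
        // Streams-defined default being effectively infinite.
        props.put("max.poll.interval.ms", Integer.toString(ASSUMED_MAX_POLL_INTERVAL_MS));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("max.poll.interval.ms"));
    }
}
```

The resulting `Properties` object would then be handed to the Streams application as usual. The right value depends on the longest processing gap the application can legitimately have between polls.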

Rejected Alternatives

In the ticket, we discussed even shorter defaults of 30s or 1m, but this would put even more applications at risk of spurious timeouts.