Status

Current state: "Under Discussion"

Discussion thread: <TBA>

JIRA: https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-20461

Motivation

offsets.retention.minutes controls how long committed consumer group offsets are retained after a group goes empty or
 stops committing to a topic. The default is 7 days (10080 minutes). There is currently no way to disable this expiry —
 only finite positive values are accepted.

This creates a reliability gap: if an application crashes and the outage goes unnoticed for longer than the retention
 window, its committed offsets expire and the consumer group is deleted. When the application is restarted it must fall
 back to auto.offset.reset (typically latest or earliest), potentially skipping or reprocessing a large amount of data.

Topic-level retention.ms already supports -1 to mean "retain forever". offsets.retention.minutes should offer the same
 option for consistency and for operators who want guaranteed offset durability regardless of inactivity duration.

Public Interfaces

One broker configuration change:

Config | Before | After
offsets.retention.minutes | Accepts integers ≥ 1 | Accepts -1 (infinite) or integers ≥ 1

  No new configs, no protocol changes, no API changes.

Proposed Changes

  1. GroupCoordinatorConfig — replace the atLeast(1) validator with a custom validator that accepts -1 or any positive
      integer. The doc string is updated to document -1. When the stored offsetsRetentionMs is computed, -1 maps to -1L
      directly (instead of multiplying by 60,000).
  2. OffsetExpirationConditionImpl.isOffsetExpired() — add an early-return guard: if offsetsRetentionMs < 0, return
      false. This is the same pattern used in UnifiedLog.deleteRetentionMsBreachedSegments() for topic-level retention.

  Zero is explicitly excluded by the custom validator (as it was before this change) because 0ms retention would expire
  all offsets on the very next cleanup cycle, which has no practical use case and would be a footgun.
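
  The two changes above can be sketched in plain Java as follows. This is a minimal illustration, not the actual patch:
  the class OffsetRetentionSketch and its method signatures are hypothetical stand-ins, and the real code would plug
  into GroupCoordinatorConfig's validator and OffsetExpirationConditionImpl rather than static helpers.

```java
// Hypothetical, simplified stand-in for the proposed validation and expiry logic.
final class OffsetRetentionSketch {
    // Sentinel meaning "retain forever", mirroring topic-level retention.ms.
    static final int INFINITE_RETENTION = -1;

    // Accepts -1 or any integer >= 1; rejects 0 and values below -1.
    static void validateRetentionMinutes(int minutes) {
        if (minutes != INFINITE_RETENTION && minutes < 1) {
            throw new IllegalArgumentException(
                "offsets.retention.minutes must be -1 or at least 1, got " + minutes);
        }
    }

    // -1 maps to -1L directly; any other valid value is converted to milliseconds.
    static long toRetentionMs(int minutes) {
        validateRetentionMinutes(minutes);
        return minutes == INFINITE_RETENTION ? -1L : minutes * 60_000L;
    }

    // Early-return guard: a negative retention means offsets never expire.
    static boolean isOffsetExpired(long baseTimestampMs, long currentTimestampMs, long offsetsRetentionMs) {
        if (offsetsRetentionMs < 0) {
            return false;
        }
        return currentTimestampMs - baseTimestampMs >= offsetsRetentionMs;
    }

    public static void main(String[] args) {
        System.out.println(toRetentionMs(10080)); // default of 7 days, in milliseconds
        System.out.println(toRetentionMs(-1));    // infinite retention sentinel
    }
}
```

  Keeping -1 as the stored value (rather than converting it to a large positive number) lets the expiry check state
  the "never expire" intent with a single comparison.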

Compatibility, Deprecation, and Migration Plan

  • Fully backwards compatible. The default remains 10080 (7 days). Existing deployments are unaffected.
  • No deprecations.
  • Operators who want infinite retention set offsets.retention.minutes=-1 in server.properties (or the equivalent KRaft/dynamic config).

Test Plan

  • Unit test in GroupCoordinatorConfigTest: verify -1 is accepted and stored as -1L; verify 0 and -2 are rejected.
  • Unit test in OffsetExpirationConditionImplTest: verify that with offsetsRetentionMs=-1, isOffsetExpired returns false even when currentTimestampMs is Long.MAX_VALUE.
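
  The Long.MAX_VALUE case matters because, without the guard, an elapsed-time comparison against -1 is always true.
  The sketch below contrasts the two behaviours. It assumes the expiry check compares elapsed time against the
  retention value; both methods and the class name InfiniteRetentionEdgeCase are hypothetical and exist only for
  illustration.

```java
// Hypothetical contrast between an unguarded and a guarded expiry check.
final class InfiniteRetentionEdgeCase {
    // Unguarded comparison: any non-negative elapsed time is >= -1, so every offset expires.
    static boolean unguardedIsExpired(long baseTimestampMs, long currentTimestampMs, long retentionMs) {
        return currentTimestampMs - baseTimestampMs >= retentionMs;
    }

    // Guarded comparison: a negative retention short-circuits to "not expired".
    static boolean guardedIsExpired(long baseTimestampMs, long currentTimestampMs, long retentionMs) {
        if (retentionMs < 0) {
            return false;
        }
        return currentTimestampMs - baseTimestampMs >= retentionMs;
    }

    public static void main(String[] args) {
        long base = 0L;
        long now = Long.MAX_VALUE;
        System.out.println(unguardedIsExpired(base, now, -1L)); // prints "true"
        System.out.println(guardedIsExpired(base, now, -1L));   // prints "false"
    }
}
```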

Rejected Alternatives

Use a very large positive value: a workaround exists today, which is to set offsets.retention.minutes to some very large positive value. This is rejected because it only approximates infinite retention, relies on an arbitrary magic number instead of stating the operator's intent, and is inconsistent with the -1 convention that topic-level retention.ms already uses.
