Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Motivation

KIP-73 introduced a way to throttle replication traffic between brokers so that a predictable level of service can be maintained during data-intensive administrative operations (e.g. rebalancing partitions). Throttling is enabled on individual partition replicas through dynamic or static topic-level configuration (leader.replication.throttled.replicas and follower.replication.throttled.replicas), and the throttle rate is set through dynamic broker-level configuration (leader.replication.throttled.rate and follower.replication.throttled.rate). This works very well for planned operations, but it is not suited to unplanned events that trigger a spike in replication traffic between brokers, for example a broker automatically rejoining the cluster after an extended period of downtime.

This KIP addresses this case by allowing the throttle rate configurations (leader.replication.throttled.rate and follower.replication.throttled.rate) to be set statically as well as dynamically. With this improvement, an administrator can preemptively set a throttle rate to ensure that, in the event of a spike in replication traffic, some bandwidth is still reserved for client traffic. In this case, the administrator would enable throttling on all partition replicas using the wildcard '*'; alternatively, if KIP-1009 is accepted, they could enable throttling at the broker level.

NOTE: Dynamic configurations always take precedence over static configurations, so throttle rates applied through the kafka-reassign-partitions.sh tool will temporarily override any static configuration until those dynamic values are removed.
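A minimal sketch of the precedence rule in the note, written as a hypothetical Java helper (ThrottleRateResolver is not a Kafka class). It assumes a simplified two-level model: a dynamically set rate wins, the static value from the broker configuration file applies otherwise, and the default of Long.MAX_VALUE (effectively unthrottled) applies when neither is set. Kafka's real configuration hierarchy has more levels than this.

```java
import java.util.Optional;

// Hypothetical helper (not part of Kafka) illustrating the precedence rule:
// dynamic value > static value > built-in default.
final class ThrottleRateResolver {
    // Default from the configuration table: Long.MAX_VALUE, i.e. effectively unthrottled.
    static final long DEFAULT_RATE = Long.MAX_VALUE;

    private ThrottleRateResolver() {}

    // Simplification: models exactly one dynamic level and one static level.
    static long effectiveRate(Optional<Long> dynamicRate, Optional<Long> staticRate) {
        if (dynamicRate.isPresent()) {
            // e.g. a rate set by kafka-reassign-partitions.sh during a reassignment
            return dynamicRate.get();
        }
        // Static value from the broker configuration file, else the default.
        return staticRate.orElse(DEFAULT_RATE);
    }
}
```

Once the dynamic rate is removed, effectiveRate falls back to the static value, which is the "temporary" override the note describes.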

Public Interfaces

The following configurations, which are currently dynamic only, will also become available as static broker configurations:

leader.replication.throttled.rate
A long representing the upper bound (bytes/sec) on replication traffic for leaders enumerated in the property "leader.replication.throttled.replicas" (for each topic). It is suggested that the limit be kept above 1MB/s for accurate behaviour.
Type: long
Default: 9223372036854775807
Valid Values: [0,...]
Importance: low
Update Mode: per-broker

follower.replication.throttled.rate
A long representing the upper bound (bytes/sec) on replication traffic for followers enumerated in the property "follower.replication.throttled.replicas" (for each topic). It is suggested that the limit be kept above 1MB/s for accurate behaviour.
Type: long
Default: 9223372036854775807
Valid Values: [0,...]
Importance: low
Update Mode: per-broker
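The following Java sketch shows how the two static values could be read and checked, assuming they arrive as java.util.Properties loaded from the broker configuration file. StaticThrottleConfig is a hypothetical class, not Kafka's ConfigDef-based implementation; it applies the documented default and the [0,...] range, and it interprets "1MB/s" as 1,048,576 bytes/sec, which is an assumption.

```java
import java.util.Properties;

// Hypothetical sketch (not Kafka code) of reading the static throttle rates.
final class StaticThrottleConfig {
    static final String LEADER_RATE = "leader.replication.throttled.rate";
    static final String FOLLOWER_RATE = "follower.replication.throttled.rate";
    // Documented default: Long.MAX_VALUE, i.e. effectively unthrottled.
    static final long DEFAULT_RATE = Long.MAX_VALUE;
    // Assumption: "1MB/s" read as 1 MiB/s = 1,048,576 bytes/sec.
    static final long SUGGESTED_MIN_RATE = 1024L * 1024L;

    private StaticThrottleConfig() {}

    // Returns the rate in bytes/sec, or the default when the key is absent.
    // Long.parseLong throws NumberFormatException for non-numeric values.
    static long rate(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return DEFAULT_RATE;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 0) {
            // Enforces the [0,...] valid-value range.
            throw new IllegalArgumentException(key + " must be >= 0 but was " + value);
        }
        return value;
    }

    // True when the rate is below the suggested floor for accurate throttling.
    static boolean belowSuggestedMinimum(long rate) {
        return rate < SUGGESTED_MIN_RATE;
    }
}
```

For example, a configuration that sets only leader.replication.throttled.rate=10485760 yields a leader rate of 10 MiB/s and leaves the follower rate at the unthrottled default.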

The following metrics will be added to ReplicationQuotaManager:

LeaderReplicationThrottledPartitions
The number of partitions on the broker that are being throttled on the leader side.
Type: Gauge
Group: ReplicationQuotaManager
Tags: none
JMX Bean: kafka.server:type=ReplicationQuotaManager,name=LeaderReplicationThrottledPartitions

FollowerReplicationThrottledPartitions
The number of partitions on the broker that are being throttled on the follower side.
Type: Gauge
Group: ReplicationQuotaManager
Tags: none
JMX Bean: kafka.server:type=ReplicationQuotaManager,name=FollowerReplicationThrottledPartitions
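One way the two gauges could derive their values is sketched below in Java; the KIP does not specify how they are computed. The sketch assumes the KIP-73 value format for leader.replication.throttled.replicas and follower.replication.throttled.replicas (a comma-separated list of partitionId:brokerId pairs, or the wildcard '*'), and that the caller supplies the partitions of one topic that this broker hosts on the relevant side (for the leader gauge, the partitions it leads). ThrottledPartitionCounter is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not Kafka code): counts, for one topic, the partitions
// hosted by this broker that the throttled-replicas value marks as throttled.
final class ThrottledPartitionCounter {
    private ThrottledPartitionCounter() {}

    // Assumes the value has already been validated (well-formed pairs or "*").
    static int count(String throttledReplicas, int brokerId, Set<Integer> hostedPartitions) {
        String value = throttledReplicas == null ? "" : throttledReplicas.trim();
        if (value.isEmpty()) {
            return 0; // nothing throttled for this topic
        }
        if (value.equals("*")) {
            return hostedPartitions.size(); // wildcard throttles every replica of the topic
        }
        // A set avoids double-counting a pair that is listed more than once.
        Set<Integer> throttled = new HashSet<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":");
            int partition = Integer.parseInt(parts[0].trim());
            int broker = Integer.parseInt(parts[1].trim());
            // Only pairs naming this broker and a partition it hosts are counted.
            if (broker == brokerId && hostedPartitions.contains(partition)) {
                throttled.add(partition);
            }
        }
        return throttled.size();
    }
}
```

A broker-wide gauge would then sum this count across all topics hosted on the broker.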

Proposed Changes

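Move the definitions of the leader.replication.throttled.rate and follower.replication.throttled.rate configurations from DynamicConfig to KafkaConfig, so that they can be set statically in the broker configuration while remaining dynamically updatable.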

Compatibility, Deprecation, and Migration Plan

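This change is backward compatible: these configurations can still be set dynamically, and because the default remains 9223372036854775807 (effectively unthrottled), brokers that do not set them statically behave as before.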

Test Plan

This change can be covered by unit tests.

Rejected Alternatives

The alternative considered was to keep these configurations dynamic only.
