Status
Current state: Under Discussion
Discussion thread: here
JIRA: here
Motivation
KIP-73 introduced a method of throttling replication traffic between brokers so that a predictable level of service can be maintained during data-intensive administrative operations (e.g. rebalancing partitions). The method works by enabling replication throttling on partition-replicas via dynamic or static configuration at the topic level (leader.replication.throttled.replicas & follower.replication.throttled.replicas), and setting a throttle rate via dynamic configuration at the broker level (leader.replication.throttled.rate & follower.replication.throttled.rate). This works very well for planned operations, but it is not well suited to unplanned events that trigger a spike in replication traffic between brokers, for example a broker automatically rejoining the cluster after an extended period of downtime.
This KIP intends to address this case by allowing the throttle rate variables (leader.replication.throttled.rate & follower.replication.throttled.rate) to be configured both dynamically and statically. With this improvement, an administrator can preemptively set a throttle rate to ensure that, in the event of a spike in replication traffic, some bandwidth is still reserved for client traffic. In this case, the administrator would enable throttling on all partition-replicas using the wildcard '*'. Alternatively, if KIP-1009 is accepted, they could enable throttling at the broker level.
NOTE: Dynamic configurations always override static configurations, so quotas applied through the kafka-reassign-partitions.sh tool will temporarily override any static configuration.
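The precedence described in the note can be sketched as follows. This is a minimal illustration, not the broker's actual code path: the class ThrottleRateResolver and the method effectiveRate are hypothetical names, and the only facts taken from this KIP are that a dynamic value wins over a static one and that the default is the maximum long value.

```java
import java.util.OptionalLong;

// Hypothetical helper (not part of Kafka) illustrating the precedence rule:
// dynamic value first, then static value, then the default.
final class ThrottleRateResolver {
    // Default from this KIP: 9223372036854775807, i.e. effectively unthrottled.
    static final long UNLIMITED = Long.MAX_VALUE;

    static long effectiveRate(OptionalLong dynamicRate, OptionalLong staticRate) {
        if (dynamicRate.isPresent()) {
            // e.g. a rate set temporarily by kafka-reassign-partitions.sh
            return dynamicRate.getAsLong();
        }
        // Fall back to the static value, or to the default when neither is set.
        return staticRate.orElse(UNLIMITED);
    }

    public static void main(String[] args) {
        OptionalLong staticRate = OptionalLong.of(50_000_000L);
        // Only the static rate is set.
        System.out.println(effectiveRate(OptionalLong.empty(), staticRate));
        // A dynamic rate temporarily overrides the static one.
        System.out.println(effectiveRate(OptionalLong.of(10_000_000L), staticRate));
        // Nothing is set, so the default applies.
        System.out.println(effectiveRate(OptionalLong.empty(), OptionalLong.empty()));
    }
}
```

Once the dynamic value is removed, the same lookup returns the static value again, which is why the override is only temporary.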
Public Interfaces
The following values will be added as static broker configuration:
Name | Description | Type | Default | Valid Values | Importance | Update Mode |
---|---|---|---|---|---|---|
leader.replication.throttled.rate | A long representing the upper bound (bytes/sec) on replication traffic for leaders enumerated in the property "leader.replication.throttled.replicas" (for each topic). It is suggested that the limit be kept above 1MB/s for accurate behaviour. | long | 9223372036854775807 | [0,...] | low | per-broker |
follower.replication.throttled.rate | A long representing the upper bound (bytes/sec) on replication traffic for followers enumerated in the property "follower.replication.throttled.replicas" (for each topic). It is suggested that the limit be kept above 1MB/s for accurate behaviour. | long | 9223372036854775807 | [0,...] | low | per-broker |
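As a rough illustration of the table above, the sketch below reads the two proposed keys from properties-style text and applies the listed type, default, and valid range. The class StaticThrottleConfig and its methods are hypothetical and are not Kafka's KafkaConfig implementation; the 52428800 (50 MiB/s) value is an arbitrary example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader for the two static settings proposed in this KIP.
final class StaticThrottleConfig {
    static final String LEADER_RATE = "leader.replication.throttled.rate";
    static final String FOLLOWER_RATE = "follower.replication.throttled.rate";
    // Default from the table: 9223372036854775807.
    static final long DEFAULT_RATE = Long.MAX_VALUE;

    // Parses properties-style text, such as the contents of a broker properties file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the configured rate in bytes/sec, or the default when the key is absent.
    static long rate(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return DEFAULT_RATE;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 0) {
            // Valid values are [0,...].
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = parse("leader.replication.throttled.rate=52428800\n");
        System.out.println(rate(props, LEADER_RATE));
        System.out.println(rate(props, FOLLOWER_RATE));
    }
}
```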
The following metrics will be added to ReplicationQuotaManager:
Metric Name | Type | Group | Tags | Description | JMX Bean |
---|---|---|---|---|---|
LeaderReplicationThrottledPartitions | Gauge | ReplicationQuotaManager | - | The number of partitions on the broker that are being throttled on the leader side. | kafka.server:type=ReplicationQuotaManager,name=LeaderReplicationThrottledPartitions |
FollowerReplicationThrottledPartitions | Gauge | ReplicationQuotaManager | - | The number of partitions on the broker that are being throttled on the follower side. | kafka.server:type=ReplicationQuotaManager,name=FollowerReplicationThrottledPartitions |
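A monitoring client could read these gauges over JMX roughly as sketched below. The bean names are taken from the table above; the attribute name "Value" is an assumption based on how Kafka's existing gauges are exposed over JMX, and the class ThrottledPartitionsReader is a hypothetical helper, not part of Kafka.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical JMX reader for the two gauges proposed in this KIP.
final class ThrottledPartitionsReader {
    static final String LEADER_BEAN =
        "kafka.server:type=ReplicationQuotaManager,name=LeaderReplicationThrottledPartitions";
    static final String FOLLOWER_BEAN =
        "kafka.server:type=ReplicationQuotaManager,name=FollowerReplicationThrottledPartitions";

    // Reads a gauge through an existing JMX connection to the broker.
    // Assumption: the gauge is exposed through an attribute named "Value".
    static long readGauge(MBeanServerConnection conn, String beanName) {
        try {
            Object value = conn.getAttribute(new ObjectName(beanName), "Value");
            return ((Number) value).longValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read gauge " + beanName, e);
        }
    }
}
```

The connection would typically come from a JMXConnector pointed at the broker's JMX port. A non-zero value on either gauge indicates that the broker currently has throttled partitions on that side.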
Proposed Changes
Move the leader.replication.throttled.rate and follower.replication.throttled.rate configuration values from DynamicConfig to KafkaConfig.
Compatibility, Deprecation, and Migration Plan
This change is backwards compatible as these configuration values will still be configurable dynamically.
Test Plan
Testing for this change can be covered in unit tests.
Rejected Alternatives
The alternative is to keep these configurations as dynamic only.