Status

Current state: Discussion

Discussion thread:

JIRA:

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

While a reassignment is in progress, the number of replicas for each partition being reassigned temporarily exceeds the replication factor. Once all new replicas are in the ISR, the old replicas are removed and the number of replicas again matches the replication factor. Until that point, however, the partition is reported as under-replicated by both the broker metrics and the topic command utility. This is misleading because the partition may satisfy the required replication factor throughout the reassignment. It also obscures actual replication problems, because some number of under-replicated partitions (URPs) is expected while a reassignment is in progress, which makes it difficult to use URPs for alerting. In this KIP, we propose to distinguish the URPs caused by reassignment.

Proposed Changes

We will distinguish "UnderSynchronized" partitions as those which have an in-sync replica set that is smaller than the topic's replication factor, and "OverReplicated" partitions as those which have more replicas than the replication factor.
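The two definitions can be sketched as simple predicates. This is only an illustration, not broker code: the `PartitionState` type and its field names are hypothetical, and `is_under_replicated` assumes the existing check compares the ISR size against the full replica list, which is what makes a partition under reassignment show up as a URP.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition; names are illustrative only."""
    replicas: FrozenSet[int]   # broker ids currently assigned, including reassignment targets
    isr: FrozenSet[int]        # broker ids currently in sync
    replication_factor: int    # the topic's replication factor


def is_under_replicated(p: PartitionState) -> bool:
    # Assumed existing behavior: ISR smaller than the current replica list.
    return len(p.isr) < len(p.replicas)


def is_under_synchronized(p: PartitionState) -> bool:
    # Proposed: ISR smaller than the topic's replication factor.
    return len(p.isr) < p.replication_factor


def is_over_replicated(p: PartitionState) -> bool:
    # Proposed: more replicas than the replication factor.
    return len(p.replicas) > p.replication_factor


# A partition with replication factor 3 being moved from brokers {1,2,3} to {4,5,6}:
# the new replicas have been added but have not yet joined the ISR.
mid_reassignment = PartitionState(
    replicas=frozenset({1, 2, 3, 4, 5, 6}),
    isr=frozenset({1, 2, 3}),
    replication_factor=3,
)

print(is_under_replicated(mid_reassignment))    # True
print(is_under_synchronized(mid_reassignment))  # False
print(is_over_replicated(mid_reassignment))     # True
```

In this example the partition would be flagged as a URP even though three replicas are in sync the whole time; under the proposed categories it is over-replicated but not under-synchronized.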

The high-level idea is that users can monitor the over-replicated partitions to track the progress of a reassignment. The under-synchronized partitions can be monitored separately for possible alerting.

Public Interfaces

We will add two new metrics exposed on the broker which represent counts of the new categories mentioned above: "UnderSynchronizedCount" and "OverReplicatedCount".
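As a rough sketch of how the two counts relate to partition state, the snippet below tallies both categories for one broker. The `Partition` tuple and `broker_counts` helper are hypothetical, and counting only the partitions a broker leads is an assumption made for illustration; the KIP text does not specify which partitions each broker counts.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Partition(NamedTuple):
    """Hypothetical partition record; field names are illustrative only."""
    leader: int
    replicas: Tuple[int, ...]
    isr: Tuple[int, ...]
    replication_factor: int


def broker_counts(partitions: Iterable[Partition], broker_id: int) -> Dict[str, int]:
    # Assumption: a broker reports counts only for partitions it leads.
    led = [p for p in partitions if p.leader == broker_id]
    return {
        "UnderSynchronizedCount": sum(len(p.isr) < p.replication_factor for p in led),
        "OverReplicatedCount": sum(len(p.replicas) > p.replication_factor for p in led),
    }


partitions = [
    # Being reassigned: six replicas, three in sync, replication factor 3.
    Partition(leader=1, replicas=(1, 2, 3, 4, 5, 6), isr=(1, 2, 3), replication_factor=3),
    # Not being reassigned, but one replica has fallen out of the ISR.
    Partition(leader=1, replicas=(1, 2, 3), isr=(1, 2), replication_factor=3),
    # Led by a different broker.
    Partition(leader=2, replicas=(2, 3, 4), isr=(2,), replication_factor=3),
]

print(broker_counts(partitions, 1))
# {'UnderSynchronizedCount': 1, 'OverReplicatedCount': 1}
```

An alert would then key off a non-zero "UnderSynchronizedCount", while "OverReplicatedCount" should fall back to zero as the reassignment completes.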

The topic command utility will have similar options to display the partitions in each category: --under-synchronized-partitions and --over-replicated-partitions.

Compatibility, Deprecation, and Migration Plan

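These changes are backwards compatible: they only add new metrics and new topic command options, and the existing definition of an under-replicated partition is unchanged.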

Rejected Alternatives

We considered redefining "under-replicated partition" to exclude partitions being reassigned. Ultimately, given how broadly the term is used, we were reluctant to change its semantics and break compatibility with previous versions.
