Status

Current state: "Voting"

Vote thread: here

Discussion thread: here

JIRA: https://issues.apache.org/jira/browse/KAFKA-18775

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, when using MetadataQuorumCommand to add a controller, users must provide a controller.properties configuration file. This file is required for the command to retrieve the metadata local path and endpoints needed to add voters. However, this approach has several limitations:

  1. Limited Accessibility: The node executing the tool must have direct access to the metadata path of the node being added or removed. This restricts the ability to use node A to manage node B, as node A may not have access to the metadata folder on node B.
  2. Dependency on Node Configuration: The tool requires access to the configuration of the node being managed.

However, the essential information for these operations — the directory UUID and endpoints — is already available from the active controller’s in-memory state and the ClusterImage.

Leveraging these sources allows us to simplify voter addition and removal, enabling the command to run without direct access to the target node’s metadata directory.

Public Interfaces

CLI

Adding a controller

For adding a controller, this KIP introduces a new option, --controller-id, for the add-controller subcommand.

Add a new controller with bootstrap server
bin/kafka-metadata-quorum.sh --bootstrap-server localhost:9092 add-controller --controller-id <id>
Add a new controller with bootstrap controller
bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:9093 add-controller --controller-id <id>

Removing a controller

For removing a controller, the --controller-directory-id option is no longer required.

Remove a controller with bootstrap server
bin/kafka-metadata-quorum.sh --bootstrap-server localhost:9092 remove-controller --controller-id <id>

Remove a controller with bootstrap controller
bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:9093 remove-controller --controller-id <id>

Public APIs

Admin.java

/**
 * Add a new voter node to the KRaft metadata quorum.
 * 
 * Note that this is a convenience method and is not idempotent.
 * In a complicated scenario, e.g., a node disk failure, there might be
 * observers with different directory UUIDs but the same node ID.
 * In that scenario, use {@link #addRaftVoter(int, Uuid, Set)} instead.
 *
 * @param voterId           The node ID of the voter.
 */
default AddRaftVoterResult addRaftVoter(int voterId) {           
    return addRaftVoter(voterId, Uuid.ZERO_UUID, Set.of(), new AddRaftVoterOptions());
}

/**
 * Remove a voter node from the KRaft metadata quorum.
 *
 * @param voterId           The node ID of the voter.
 */
default RemoveRaftVoterResult removeRaftVoter(int voterId) {
    return removeRaftVoter(voterId, Uuid.ZERO_UUID, new RemoveRaftVoterOptions());
}

RPC Changes

AddRaftVoterRequest.json

diff --git a/clients/src/main/resources/common/message/AddRaftVoterRequest.json b/clients/src/main/resources/common/message/AddRaftVoterRequest.json
index 74b7638ea2..27a6e5face 100644
--- a/clients/src/main/resources/common/message/AddRaftVoterRequest.json
+++ b/clients/src/main/resources/common/message/AddRaftVoterRequest.json
@@ -18,7 +18,7 @@
   "type": "request",
   "listeners": ["controller", "broker"],
   "name": "AddRaftVoterRequest",
-  "validVersions": "0-1",
+  "validVersions": "0-2",
   "flexibleVersions": "0+",
   "fields": [
     { "name": "ClusterId", "type": "string", "versions": "0+", "nullableVersions": "0+",

RemoveRaftVoterRequest.json

diff --git a/clients/src/main/resources/common/message/RemoveRaftVoterRequest.json b/clients/src/main/resources/common/message/RemoveRaftVoterRequest.json
index 7d11086e53..2181ecd9ff 100644
--- a/clients/src/main/resources/common/message/RemoveRaftVoterRequest.json
+++ b/clients/src/main/resources/common/message/RemoveRaftVoterRequest.json
@@ -18,14 +18,14 @@
   "type": "request",
   "listeners": ["controller", "broker"],
   "name": "RemoveRaftVoterRequest",
-  "validVersions": "0",
+  "validVersions": "0-1",
   "flexibleVersions": "0+",
   "fields": [

Proposed Changes

Server side changes

  • During AddRaftVoterRequest handling, if the API version is >= 2:
    • the voter directory ID is derived from the in-memory LeaderState when the value is Uuid.ZERO_UUID;
    • the controller endpoints are derived from the ClusterImage when the endpoint set is empty. Note that the ClusterImage may lag behind the actual state, so deriving endpoints this way is not strictly idempotent.
  • During AddRaftVoterRequest handling, if multiple observers share the same node ID, the request is rejected with an IllegalStateException that indicates the duplicate node ID and instructs the user to resolve the conflict.

  • During RemoveRaftVoterRequest handling, if the API version is >= 1, the voter directory ID is derived from the in-memory LeaderState when the value is Uuid.ZERO_UUID.
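
The directory ID rules for AddRaftVoterRequest listed above can be sketched as a small, self-contained model. This is only an illustration, not the actual KRaft code: VoterDirectoryResolver, Observer, and resolve are hypothetical names, java.util.UUID stands in for Kafka's Uuid, and the observer list stands in for the leader's in-memory state. Two behaviours are assumptions that the KIP does not spell out: requests below version 2 are passed through unchanged, and a lookup that finds no matching observer is rejected.

```java
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Illustrative model only: every type here is a hypothetical stand-in, not a Kafka class.
final class VoterDirectoryResolver {
    // Stand-in for Kafka's Uuid.ZERO_UUID sentinel ("no directory ID supplied").
    static final UUID ZERO_UUID = new UUID(0L, 0L);

    // Hypothetical view of one observer known to the active controller.
    static final class Observer {
        final int nodeId;
        final UUID directoryId;

        Observer(int nodeId, UUID directoryId) {
            this.nodeId = nodeId;
            this.directoryId = directoryId;
        }
    }

    // Returns the directory ID to use for an add-voter request.
    static UUID resolve(int nodeId, UUID requested, int apiVersion, List<Observer> observers) {
        // An explicitly supplied directory ID is always used as-is.
        if (!ZERO_UUID.equals(requested)) {
            return requested;
        }
        // Assumption: request versions below 2 are passed through unchanged.
        if (apiVersion < 2) {
            return requested;
        }
        List<Observer> matches = observers.stream()
            .filter(o -> o.nodeId == nodeId)
            .collect(Collectors.toList());
        if (matches.size() > 1) {
            // Ambiguous: several observers share the node ID, so the caller must disambiguate.
            throw new IllegalStateException("Multiple observers share node ID " + nodeId
                + "; resolve the conflict or pass the directory ID explicitly.");
        }
        if (matches.isEmpty()) {
            // Assumption: the KIP does not describe this case; the sketch rejects it.
            throw new IllegalStateException("No observer found with node ID " + nodeId);
        }
        return matches.get(0).directoryId;
    }

    public static void main(String[] args) {
        UUID dir = UUID.randomUUID();
        List<Observer> observers = List.of(new Observer(4, dir));
        System.out.println("resolved: " + resolve(4, ZERO_UUID, 2, observers));
    }
}
```

The duplicate check is what makes the short form unsafe after a disk failure: a node that rejoined with a fresh directory can appear twice, and only the caller knows which directory is the intended one.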

Client side changes

  • Two convenience methods for adding and removing controllers are introduced in Admin.java. The Javadoc for addRaftVoter warns about its idempotency risks; the method is intended for use only when the user understands and accepts those risks.

MetadataQuorumCommand add-controller changes

Add a new option --controller-id to the add-controller subcommand.

new --controller-id option
        addControllerParser
            .addArgument("--controller-id", "-i")
            .help("The id of the controller to add. This option should be used with bootstrap controller.")
            .type(Integer.class)
            .action(Arguments.store());
  • If --controller-id is provided, invoke the new method Admin#addRaftVoter(int).
  • If --command-config and --controller-id are both provided, the config file given by --command-config is applied only to Admin client initialization.

    • The description for --command-config will be changed to "Property file containing configs to be passed to Admin Client. For add-controller, the file is used to specify the controller properties as well unless --controller-id is provided."
  • If neither --command-config nor --controller-id is provided, an exception will be thrown:

    • throw new TerseException("You must use --command-config or --controller-id option to add a controller.");
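
The option handling described above amounts to a three-way decision, sketched below as a self-contained model. It is not the actual MetadataQuorumCommand code: AddControllerDispatch, Path, and choose are hypothetical names, and IllegalArgumentException stands in for Kafka's TerseException so that the example compiles without Kafka on the classpath.

```java
// Illustrative model only: names are hypothetical, not taken from MetadataQuorumCommand.
final class AddControllerDispatch {
    // The two ways add-controller can obtain the voter's directory ID and endpoints.
    enum Path {
        ADMIN_ADD_BY_ID,          // --controller-id given: call Admin#addRaftVoter(int)
        LEGACY_PROPERTIES_FILE    // only --command-config given: read controller properties from the file
    }

    // controllerId and commandConfig are null when the matching option is absent.
    static Path choose(Integer controllerId, String commandConfig) {
        if (controllerId != null) {
            // With --controller-id, any --command-config file only configures the Admin client.
            return Path.ADMIN_ADD_BY_ID;
        }
        if (commandConfig != null) {
            return Path.LEGACY_PROPERTIES_FILE;
        }
        // Stand-in for TerseException, using the message quoted in the KIP.
        throw new IllegalArgumentException(
            "You must use --command-config or --controller-id option to add a controller.");
    }

    public static void main(String[] args) {
        System.out.println(choose(4, null));
        System.out.println(choose(4, "controller.properties"));
        System.out.println(choose(null, "controller.properties"));
    }
}
```

Checking --controller-id first is what keeps the existing --command-config behaviour intact for users who pass only the properties file.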

MetadataQuorumCommand remove-controller changes

Option controller-directory-id in remove-controller subcommand
diff --git a/tools/src/main/java/org/apache/kafka/tools/MetadataQuorumCommand.java b/tools/src/main/java/org/apache/kafka/tools/MetadataQuorumCommand.java
index dba7951aa4..f3bdbbeffa 100644
--- a/tools/src/main/java/org/apache/kafka/tools/MetadataQuorumCommand.java
+++ b/tools/src/main/java/org/apache/kafka/tools/MetadataQuorumCommand.java
@@ -471,7 +471,6 @@ public class MetadataQuorumCommand {
         removeControllerParser
             .addArgument("--controller-directory-id", "-d")
             .help("The directory ID of the controller to remove.")
-            .required(true)
             .action(Arguments.store());
  • The --controller-directory-id option is no longer required; when it is omitted, the command uses the new method Admin#removeRaftVoter(int).

  • If --controller-directory-id is explicitly provided, invoke Admin#removeRaftVoter(int, Uuid).

Compatibility, Deprecation, and Migration Plan

This KIP introduces new methods in Admin.java and makes the directory UUID and endpoints fields optional in two RPCs. Neither change is breaking.

And the CLI changes are also backward compatible:

  • The --command-config option remains available in add-controller.

  • The --controller-directory-id option in remove-controller is now optional but still supported.

Test Plan

New test cases will be added to MetadataQuorumCommandTest.java to validate:

  • Adding a controller with --controller-id.

  • Removing a controller without explicitly providing --controller-directory-id.

Integration tests will be added for the two new methods in Admin.java.

Rejected Alternatives

  • Deprecate the --command-config option in add-controller and the --controller-directory-id option in remove-controller.

    The main reason not to deprecate these two options is that they were only just introduced in 4.0, so deprecating them in a 4.x release feels too soon. Also, --command-config still serves a different user scenario: the user can provide the configuration file to add-controller if they already have it locally.

  • Use Admin APIs from the client to retrieve the directory UUID and controller endpoints. This was rejected because it adds extra network communication overhead.

    1. The Admin#describeMetadataQuorum method can provide the directory UUID.
    2. The Admin#describeConfigs method, utilizing the bootstrap.controller address, can be used to retrieve the necessary endpoints.
