
Status

Current state: Adopted

Discussion thread: here

JIRA: KAFKA-15045

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Finally, there are good reasons for a user to want to extend or modify the behavior of the Kafka Streams partition assignor beyond just changing the task assignment. For example, a user may want to implement their own initialization logic that initializes resources (much the same way the StreamsPartitionAssignor initializes internal topics).

...

Code Block

language	java
title	StreamsConfig

public static class InternalConfig {
        // This will be removed
        public static final String INTERNAL_TASK_ASSIGNOR_CLASS = "internal.task.assignor.class";
}

...

Code Block

language	java
title	TaskAssignor

package org.apache.kafka.streams.processor.assignment;

public interface TaskAssignor extends Configurable {

  /**
   * NONE: no error detected
   * ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES: multiple KafkaStreams clients assigned with the same active task
   * INVALID_STANDBY_TASK: stateless task assigned as a standby task
   * MISSING_PROCESS_ID: ProcessId present in the input ApplicationState was not present in the output TaskAssignment
   * UNKNOWN_PROCESS_ID: unrecognized ProcessId not matching any of the participating consumers
   * UNKNOWN_TASK_ID: unrecognized TaskId not matching any of the tasks to be assigned
   */
  enum AssignmentError {
    NONE,
    ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES,
    INVALID_STANDBY_TASK,
    MISSING_PROCESS_ID,
    UNKNOWN_PROCESS_ID,
    UNKNOWN_TASK_ID
  }

  /**
   * @param applicationState the metadata for this Kafka Streams application
   *
   * @return the assignment of active and standby tasks to KafkaStreams clients
   *
   * @throws TaskAssignmentException If an error occurs during assignment and you wish for the rebalance to be retried,
   *                                 you can throw this exception to keep the assignment unchanged and automatically
   *                                 schedule an immediate followup rebalance.
   */
  TaskAssignment assign(ApplicationState applicationState);

  /**
   * This callback can be used to observe the final assignment returned to the brokers and check for any errors that
   * were detected while processing the returned assignment. If any errors were found, the corresponding
   * AssignmentError will be passed in to this callback and a StreamsException will be thrown after this callback
   * returns. The StreamsException will be thrown up to kill the StreamThread and can be handled as any other uncaught
   * exception would if the application has registered a {@link StreamsUncaughtExceptionHandler}.
   * <p>
   * Note: some kinds of errors will make it impossible for the StreamsPartitionAssignor to parse the TaskAssignment
   * that was returned from the TaskAssignor's {@link #assign}. If this occurs, the {@link GroupAssignment} passed
   * in to this callback will contain an empty map instead of the consumer assignments.
   *
   * @param assignment   the final consumer assignments returned to the kafka broker, or an empty assignment map if
   *                     an error prevented the assignor from converting the TaskAssignment into a GroupAssignment
   * @param subscription the original consumer subscriptions passed into the assignor
   * @param error        the corresponding error type if one was detected while processing the returned assignment,
   *                     or AssignmentError.NONE if the returned assignment was valid
   */
  default void onAssignmentComputed(GroupAssignment assignment, GroupSubscription subscription, AssignmentError error) {}

  /**
   * Wrapper class for the final assignment of active and standby tasks to individual
   * KafkaStreams clients
   */
  class TaskAssignment {

    /**
     * @return the assignment of tasks to KafkaStreams clients
     */
    public Collection<KafkaStreamsAssignment> assignment();
  }
}
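For a concrete picture of what the AssignmentError values above mean, the sketch below checks a returned assignment and reports the first problem it finds. It is an illustration under stated assumptions, not the StreamsPartitionAssignor's actual validation: process and task ids are plain Strings, AssignmentErrorSketch and AssignmentValidatorSketch are hypothetical names, and the order in which the checks run is arbitrary.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in mirroring TaskAssignor.AssignmentError from this proposal
enum AssignmentErrorSketch {
    NONE,
    ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES,
    INVALID_STANDBY_TASK,
    MISSING_PROCESS_ID,
    UNKNOWN_PROCESS_ID,
    UNKNOWN_TASK_ID
}

final class AssignmentValidatorSketch {

    /**
     * Hypothetical validator over simplified String ids.
     *
     * @param inputProcessIds process ids present in the input ApplicationState
     * @param statefulTasks   ids of stateful tasks (only these may be standbys)
     * @param allTasks        ids of every task that must be assigned
     * @param actives         process id -> active task ids in the returned assignment
     * @param standbys        process id -> standby task ids in the returned assignment
     */
    static AssignmentErrorSketch validate(final Set<String> inputProcessIds,
                                          final Set<String> statefulTasks,
                                          final Set<String> allTasks,
                                          final Map<String, Set<String>> actives,
                                          final Map<String, Set<String>> standbys) {
        final Set<String> outputProcessIds = new HashSet<>(actives.keySet());
        outputProcessIds.addAll(standbys.keySet());

        // Every process in the output must be one of the participating clients
        for (final String processId : outputProcessIds) {
            if (!inputProcessIds.contains(processId)) {
                return AssignmentErrorSketch.UNKNOWN_PROCESS_ID;
            }
        }
        // Every participating client must appear in the output
        if (!outputProcessIds.containsAll(inputProcessIds)) {
            return AssignmentErrorSketch.MISSING_PROCESS_ID;
        }

        final Set<String> seenActives = new HashSet<>();
        for (final Set<String> tasks : actives.values()) {
            for (final String task : tasks) {
                if (!allTasks.contains(task)) {
                    return AssignmentErrorSketch.UNKNOWN_TASK_ID;
                }
                // Set.add returns false when this active task was already assigned to another client
                if (!seenActives.add(task)) {
                    return AssignmentErrorSketch.ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES;
                }
            }
        }
        for (final Set<String> tasks : standbys.values()) {
            for (final String task : tasks) {
                if (!allTasks.contains(task)) {
                    return AssignmentErrorSketch.UNKNOWN_TASK_ID;
                }
                // A stateless task has no state to replicate, so it cannot be a standby
                if (!statefulTasks.contains(task)) {
                    return AssignmentErrorSketch.INVALID_STANDBY_TASK;
                }
            }
        }
        return AssignmentErrorSketch.NONE;
    }
}
```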


...

Another reason for introducing the new TaskAssignment and ApplicationState classes is to clean up the way assignment is performed today, as the current API is really not fit for public consumption. Currently, the TaskAssignor is provided a set of ClientState objects representing each KafkaStreams client. The ClientState is however not just the input to the assignor, but also its output – the assignment of tasks to KafkaStreams clients is performed by mutating the ClientStates passed in. The return value of the #assign method is a simple boolean indicating to the StreamsPartitionAssignor whether it should request a followup probing rebalance, a feature associated only with the HighAvailabilityTaskAssignor.

To solve these problems, we plan to refactor the interface with two goals in mind:

To provide a clean separation of input/output by splitting the ClientState into an input-only KafkaStreamsState metadata class and an output-only KafkaStreamsAssignment return value class
To decouple the followup rebalance request from the probing rebalance feature and give the assignor more direct control over the followup rebalance schedule, by allowing it to indicate which KafkaStreams client(s) should trigger a rejoin and when to request the subsequent rebalance
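To illustrate the first goal, here is a minimal, self-contained sketch of an assignor written in the new style: it only reads its per-client input and returns freshly built output, rather than assigning tasks by mutating the objects it was handed. The types (ClientInput, StickyAssignorSketch), the String task and process ids, and the "keep previous owner, then least-loaded" placement policy are all hypothetical simplifications. They are not the real KafkaStreamsState/KafkaStreamsAssignment API or the algorithm of any built-in assignor.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical input-only view of one client, standing in for KafkaStreamsState
final class ClientInput {
    final Set<String> previousActiveTasks;

    ClientInput(final Set<String> previousActiveTasks) {
        // Read-only copy: the assignor can inspect it but never change it
        this.previousActiveTasks = Collections.unmodifiableSet(new HashSet<>(previousActiveTasks));
    }
}

final class StickyAssignorSketch {

    /**
     * Returns a brand-new processId -> active tasks map. The inputs are only read, never mutated,
     * in contrast to assigning by mutating the ClientState objects passed in.
     */
    static Map<String, SortedSet<String>> assign(final Map<String, ClientInput> clients,
                                                 final Set<String> allTasks) {
        final Map<String, SortedSet<String>> output = new TreeMap<>();
        for (final String processId : clients.keySet()) {
            output.put(processId, new TreeSet<>());
        }
        if (output.isEmpty()) {
            return output; // no clients to assign to
        }
        final Set<String> unassigned = new TreeSet<>(allTasks);

        // 1) Stickiness: a task stays with the client that owned it last time, if it still exists
        for (final Map.Entry<String, ClientInput> e : new TreeMap<>(clients).entrySet()) {
            for (final String task : new TreeSet<>(e.getValue().previousActiveTasks)) {
                if (unassigned.remove(task)) {
                    output.get(e.getKey()).add(task);
                }
            }
        }
        // 2) Remaining tasks go, one at a time, to the client with the fewest tasks so far
        for (final String task : unassigned) {
            String target = null;
            for (final Map.Entry<String, SortedSet<String>> e : output.entrySet()) {
                if (target == null || e.getValue().size() < output.get(target).size()) {
                    target = e.getKey();
                }
            }
            output.get(target).add(task);
        }
        return output;
    }
}
```

Tasks that a client previously owned but that no longer exist (for example after a topology change) are simply skipped, because they are never in the set of tasks to assign.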

These goals give us two new top-level public APIs, KafkaStreamsState and KafkaStreamsAssignment:

KafkaStreamsAssignment

Next we have the KafkaStreamsAssignment class, representing the output of the assignment to be created by the TaskAssignor:

Code Block

language	java
title	KafkaStreamsAssignment

package org.apache.kafka.streams.processor.assignment;

/**
 * A simple container class for the assignor to return the desired placement of active and standby tasks on KafkaStreams clients
 */
public class KafkaStreamsAssignment {

  /**
   * Construct an instance of KafkaStreamsAssignment with this processId and the given set of
   * assigned tasks. If you want this KafkaStreams client to request a followup rebalance, you
   * can set the followupRebalanceDeadline via the {@link #withFollowupRebalance(Instant)} API.
   *
   * @param processId  the processId for the KafkaStreams client that should receive this assignment
   * @param assignment the set of tasks to be assigned to this KafkaStreams client
   *
   * @return a new KafkaStreamsAssignment object with the given processId and assignment
   */
  public static KafkaStreamsAssignment of(final ProcessId processId, final Set<AssignedTask> assignment);

  /**
   * This API can be used to request that a followup rebalance be triggered by the KafkaStreams client
   * receiving this assignment. The followup rebalance will be initiated after the provided deadline
   * has passed, although it will always wait until it has finished the current rebalance before
   * triggering a new one. This request will last until the new rebalance, and will be erased if a
   * new rebalance begins before the scheduled followup rebalance deadline has elapsed. The next
   * assignment must request the followup rebalance again if it still wants to schedule one for
   * the given instant, otherwise no additional rebalance will be triggered after that.
   *
   * @param rebalanceDeadline the instant after which this KafkaStreams client will trigger a followup rebalance
   *
   * @return a new KafkaStreamsAssignment object with the same processId and assignment but with the given rebalanceDeadline
   */
  public KafkaStreamsAssignment withFollowupRebalance(final Instant rebalanceDeadline);

  public ProcessId processId();

  public Map<TaskId, AssignedTask> tasks();

  public void assignTask(final AssignedTask newTask);

  public void removeTask(final AssignedTask removedTask);

  /**
   * @return the actual deadline in objective time, after which the followup rebalance will be attempted.
   * Equivalent to {@code 'now + followupRebalanceDelay'}
   */
  public Instant followupRebalanceDeadline();

  public static class AssignedTask {

    public AssignedTask(final TaskId id, final Type taskType);

    public enum Type {
      ACTIVE,
      STANDBY
    }

    public Type type();

    public TaskId id();
  }
}
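The followup-rebalance API above is copy-on-write: withFollowupRebalance returns a new object with the same processId and assignment instead of modifying the existing one, and the deadline is an absolute instant described as 'now + followupRebalanceDelay'. The hypothetical AssignmentSketch class below models only that behavior, with String ids. Representing "no followup requested" as an empty Optional is our assumption, since the proposal's followupRebalanceDeadline() returns a plain Instant.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of KafkaStreamsAssignment's copy-on-write followup API
final class AssignmentSketch {
    private final String processId;
    private final Set<String> tasks;
    private final Optional<Instant> followupRebalanceDeadline;

    private AssignmentSketch(final String processId, final Set<String> tasks, final Optional<Instant> deadline) {
        this.processId = processId;
        this.tasks = Collections.unmodifiableSet(new TreeSet<>(tasks));
        this.followupRebalanceDeadline = deadline;
    }

    // Mirrors KafkaStreamsAssignment.of(processId, assignment): no followup requested yet
    static AssignmentSketch of(final String processId, final Set<String> tasks) {
        return new AssignmentSketch(processId, tasks, Optional.empty());
    }

    // Mirrors withFollowupRebalance: returns a copy and leaves this instance unchanged
    AssignmentSketch withFollowupRebalance(final Instant rebalanceDeadline) {
        return new AssignmentSketch(processId, tasks, Optional.of(rebalanceDeadline));
    }

    String processId() {
        return processId;
    }

    Set<String> tasks() {
        return tasks;
    }

    Optional<Instant> followupRebalanceDeadline() {
        return followupRebalanceDeadline;
    }

    // Absolute deadline 'now + delay', as described in the followupRebalanceDeadline Javadoc
    static Instant deadlineAfter(final Instant now, final Duration followupRebalanceDelay) {
        return now.plus(followupRebalanceDelay);
    }
}
```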

Read-only APIs

The following APIs are intended for users to read and use; they do not need to be implemented in order to plug in a custom assignor.

ProcessID

ProcessId is a new wrapper class around UUID that makes process identifiers easier to understand:

Code Block

language	java
title	ProcessId

package org.apache.kafka.streams.processor.assignment;

import java.util.UUID;

/** A simple wrapper around UUID that abstracts a Process Id */
public class ProcessId {

    private final UUID id;

    public ProcessId(final UUID id) {
        this.id = id;
    }

    public UUID id() {
        return id;
    }
}
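ProcessId is used as a map key throughout this proposal (for example in Map<ProcessId, KafkaStreamsState>), so two wrappers around the same UUID must compare equal for lookups to work. The wrapper above does not spell out equals and hashCode; the hypothetical ProcessIdSketch below shows the value semantics such a key needs, delegating both methods to the wrapped UUID.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical stand-in for ProcessId with value-based equality
final class ProcessIdSketch {
    private final UUID id;

    ProcessIdSketch(final UUID id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    UUID id() {
        return id;
    }

    // Two wrappers are equal when they wrap equal UUIDs
    @Override
    public boolean equals(final Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ProcessIdSketch)) {
            return false;
        }
        return id.equals(((ProcessIdSketch) o).id);
    }

    // Consistent with equals, so hash-based maps find the entry for an equal key
    @Override
    public int hashCode() {
        return id.hashCode();
    }

    @Override
    public String toString() {
        return id.toString();
    }
}
```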

KafkaStreamsState

Next we have the KafkaStreamsState interface, representing the input to the assignor:

Code Block

language	java
title	KafkaStreamsState

package org.apache.kafka.streams.processor.assignment;

/**
 * A read-only metadata class representing the current state of each KafkaStreams client with at least one StreamThread participating in this rebalance
 */
public interface KafkaStreamsState {
  /**
   * @return the processId of the application instance running on this KafkaStreams client
   */
  ProcessId processId();

  /**
   * Returns the number of processing threads available to work on tasks for this KafkaStreams client,
   * which represents its overall capacity for work relative to other KafkaStreams clients.
   *
   * @return the number of processing threads on this KafkaStreams client
   */
  int numProcessingThreads();

  /**
   * @return the set of consumer client ids for this KafkaStreams client
   */
  SortedSet<String> consumerClientIds();

  /**
   * @return the set of all active tasks owned by consumers on this KafkaStreams client since the previous rebalance
   */
  SortedSet<TaskId> previousActiveTasks();

  /**
   * @return the set of all standby tasks owned by consumers on this KafkaStreams client since the previous rebalance
   */
  SortedSet<TaskId> previousStandbyTasks();

  /**
   * Returns the total lag across all logged stores in the task. Equal to the end offset sum if this client
   * did not have any state for this task on disk.
   *
   * @return end offset sum - offset sum, or
   *         Task.LATEST_OFFSET if this was previously an active running task on this client
   * @throws UnsupportedOperationException if the user did not request task lags be computed.
   */
  long lagFor(final TaskId task);

  /**
   * @return the previous tasks assigned to the given consumer, ordered by lag and filtered for any tasks
   *         that don't exist in this assignment
   * @throws UnsupportedOperationException if the user did not request task lags be computed.
   */
  SortedSet<TaskId> prevTasksByLag(final String consumerClientId);

  /**
   * Returns a collection containing all (and only) stateful tasks in the topology by {@link TaskId},
   * mapped to its "offset lag sum". This is computed as the difference between the changelog end offset
   * and the current offset, summed across all logged state stores in the task.
   *
   * @return a map from all stateful tasks to their lag sum
   * @throws UnsupportedOperationException if the user did not request task lags be computed.
   */
  Map<TaskId, Long> statefulTasksToLagSums();

  /**
   * The {@link HostInfo} of this KafkaStreams client, if set via the
   * {@link org.apache.kafka.streams.StreamsConfig#APPLICATION_SERVER_CONFIG application.server} config
   *
   * @return the host info for this KafkaStreams client if configured, else {@code Optional.empty()}
   */
  Optional<HostInfo> hostInfo();

  /**
   * The client tags for this KafkaStreams client, if any have been set via configs using the
   * {@link org.apache.kafka.streams.StreamsConfig#clientTagPrefix}
   * <p>
   * Can be used however you want, or passed in to enable the rack-aware standby task assignor.
   *
   * @return all the client tags found in this KafkaStreams client's {@link org.apache.kafka.streams.StreamsConfig}
   */
  Map<String, String> clientTags();

  /**
   * @return the rackId for this KafkaStreams client, or {@link Optional#empty()} if none was configured
   */
  Optional<String> rackId();
}
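The lag accessors fit together: statefulTasksToLagSums provides a per-task lag, and prevTasksByLag orders a consumer's previously owned tasks by that lag after dropping tasks that are not part of this assignment. The sketch below reproduces that ordering over plain Strings. The class name, the CAUGHT_UP_SENTINEL constant (a hypothetical negative placeholder standing in for Task.LATEST_OFFSET), treating missing lag entries as 0, and breaking ties by task id are all assumptions made for illustration, not the Kafka Streams implementation.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class TaskLagSketch {

    // Hypothetical placeholder for Task.LATEST_OFFSET (a caught-up, previously active task)
    static final long CAUGHT_UP_SENTINEL = -2L;

    /**
     * Orders a consumer's previously owned tasks by lag (smallest first), after dropping tasks
     * that are not part of the current assignment.
     *
     * @param previousTasks tasks this consumer owned before the rebalance
     * @param allTasks      every task that must be assigned in this rebalance
     * @param taskLags      per-task lag, e.g. from statefulTasksToLagSums(); missing entries count as 0
     */
    static SortedSet<String> prevTasksByLag(final Set<String> previousTasks,
                                            final Set<String> allTasks,
                                            final Map<String, Long> taskLags) {
        // Sort by lag, then by task id so tasks with equal lag are not collapsed by the TreeSet
        final Comparator<String> byLag = Comparator
            .comparingLong((String task) -> taskLags.getOrDefault(task, 0L))
            .thenComparing(Comparator.<String>naturalOrder());
        final SortedSet<String> result = new TreeSet<>(byLag);
        for (final String task : previousTasks) {
            if (allTasks.contains(task)) {
                result.add(task);
            }
        }
        return result;
    }
}
```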

ApplicationState

The KafkaStreamsState will be wrapped up, along with the other inputs to the assignor (such as the configuration and the set of tasks to be assigned, as well as various utilities that may be useful), in the next new interface, ApplicationState. The methods on ApplicationState are essentially the current inputs to the #assign method:

Code Block

language	java
title	ApplicationState

package org.apache.kafka.streams.processor.assignment;

/**
 * A read-only metadata class representing the state of the application and the current rebalance: the current
 * state of each KafkaStreams client with at least one StreamThread participating in this rebalance, the
 * assignment-related configs, and the set of tasks to be assigned
 */
public interface ApplicationState {
    /**
     * @param computeTaskLags whether or not to include task lag information in the returned metadata. Note that passing
     * in "true" will result in a remote call to fetch changelog topic end offsets and you should pass in "false" unless
     * you specifically need the task lag information.
     *
     * @return a map from the {@code processId} to {@link KafkaStreamsState} for all KafkaStreams clients in this app
     *
     * @throws TaskAssignmentException if a retriable error occurs while computing KafkaStreamsState metadata. Re-throw
     *                                 this exception to have Kafka Streams retry the rebalance by returning the same
     *                                 assignment and scheduling an immediate followup rebalance
     */
    Map<ProcessId, KafkaStreamsState> kafkaStreamsStates(boolean computeTaskLags);

    /**
     * @return a simple container class with the Streams configs relevant to assignment
     */
    AssignmentConfigs assignmentConfigs();

    /**
     * @return a map of task ids to all tasks in this topology to be assigned
     */
    Map<TaskId, TaskInfo> allTasks();
}
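Since passing true to kafkaStreamsStates triggers a remote fetch of changelog end offsets, an assignor would normally request lags only when its algorithm needs them, and the lag accessors on KafkaStreamsState throw UnsupportedOperationException when lags were not requested. The hypothetical classes below model just that contract, with a Supplier standing in for the remote end-offset lookup and a counter showing when it runs. This is a sketch of the documented behavior, not how Kafka Streams performs the fetch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a KafkaStreamsState built with or without task lags
final class LagAwareStateSketch {
    private final Map<String, Long> lagSums; // null when task lags were not requested

    LagAwareStateSketch(final Map<String, Long> lagSums) {
        this.lagSums = lagSums;
    }

    // Mirrors the documented contract of statefulTasksToLagSums()
    Map<String, Long> statefulTasksToLagSums() {
        if (lagSums == null) {
            throw new UnsupportedOperationException("Task lags were not requested for this rebalance");
        }
        return lagSums;
    }
}

// Hypothetical stand-in for ApplicationState#kafkaStreamsStates(boolean computeTaskLags)
final class LazyLagApplicationStateSketch {
    private final Supplier<Map<String, Long>> fetchLagSums; // stands in for the remote end-offset lookup
    private int remoteFetches = 0;

    LazyLagApplicationStateSketch(final Supplier<Map<String, Long>> fetchLagSums) {
        this.fetchLagSums = fetchLagSums;
    }

    LagAwareStateSketch kafkaStreamsState(final boolean computeTaskLags) {
        if (!computeTaskLags) {
            return new LagAwareStateSketch(null); // cheap path: no remote call is made
        }
        remoteFetches++;
        return new LagAwareStateSketch(Collections.unmodifiableMap(new HashMap<>(fetchLagSums.get())));
    }

    int remoteFetches() {
        return remoteFetches;
    }
}
```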

TaskInfo

A small interface with metadata for each task to be assigned will be used to pass along information about stateful vs stateless tasks, the mapping of input and changelog topic partitions to tasks, and other essential info such as the rack ids for each topic partition belonging to a given task.

Code Block

language	java
title	TaskInfo

package org.apache.kafka.streams.processor.assignment;

/**
 * A simple container class corresponding to a given {@link TaskId}.
 * Includes metadata such as whether it's stateful and the names of all state stores
 * belonging to this task, the set of input topic partitions and changelog topic partitions
 * for all logged state stores, and the rack ids of all replicas of each topic partition
 * in the task.
 */
public interface TaskInfo {

    TaskId id();

    boolean isStateful();

    Set<String> stateStoreNames();

    Set<TaskTopicPartition> topicPartitions();
}
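Because allTasks returns a TaskInfo per task, an assignor can derive the stateful and stateless groups itself. This is useful because only stateful tasks may be assigned as standbys (a stateless standby is reported as INVALID_STANDBY_TASK). A minimal sketch, using a hypothetical, trimmed-down TaskInfoSketch with String task ids in place of TaskInfo and TaskId:

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, trimmed-down stand-in for TaskInfo
final class TaskInfoSketch {
    final String id;
    final boolean stateful;
    final Set<String> stateStoreNames;

    TaskInfoSketch(final String id, final boolean stateful, final Set<String> stateStoreNames) {
        this.id = id;
        this.stateful = stateful;
        this.stateStoreNames = stateStoreNames;
    }
}

final class TaskGroupingSketch {

    // Splits the allTasks() map into stateful (key true) and stateless (key false) task ids
    static Map<Boolean, SortedSet<String>> byStatefulness(final Map<String, TaskInfoSketch> allTasks) {
        final Map<Boolean, SortedSet<String>> groups = new TreeMap<>();
        groups.put(true, new TreeSet<>());
        groups.put(false, new TreeSet<>());
        for (final TaskInfoSketch info : allTasks.values()) {
            groups.get(info.stateful).add(info.id);
        }
        return groups;
    }
}
```

Only the ids under the true key would be candidates for standby placement.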

TaskTopicPartition

Another basic metadata container, TaskTopicPartition indicates whether the partition belongs to a source topic or a changelog topic (or, in the case of a source-changelog topic, both), as well as the rack ids of the replicas hosting this partition, if available:

Code Block

language	java
title	TaskTopicPartition

package org.apache.kafka.streams.processor.assignment;
 
/**
 * This is a simple container class used during the assignment process to distinguish
 * the type of each TopicPartition. Since the assignment logic can depend on the type of topic we're
 * looking at, and the rack information of the partition, this container class should have
 * everything necessary to make informed task assignment decisions.
 */
public interface TaskTopicPartition {
    /**
     *
     * @return the {@code TopicPartition} for this task.
     */
    TopicPartition topicPartition();

    /**
     *
     * @return whether the underlying topic is a source topic or not. Source changelog topics
     *         are both source topics and changelog topics.
     */
    boolean isSource();

    /**
     *
     * @return whether the underlying topic is a changelog topic or not. Source changelog topics
     *         are both source topics and changelog topics.
     */
    boolean isChangelog();

    /**
     *
     * @return the broker rack ids on which this topic partition resides. If no information could
     *         be found, this will return an empty optional value.
     */
    Optional<Set<String>> rackIds();
}
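The rackIds of each TaskTopicPartition are the raw material for rack-aware assignment. As an illustration only (not the cost model used by TaskAssignmentUtils), the sketch below counts how many of a task's partitions would be read across racks if the task were placed on a client in a given rack. Partitions whose rack information is unavailable (an empty Optional) are skipped rather than counted, which is an assumption; TopicPartitionSketch is a hypothetical stand-in for TaskTopicPartition.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for TaskTopicPartition: only the fields this sketch needs
final class TopicPartitionSketch {
    final String topic;
    final int partition;
    final Optional<Set<String>> rackIds; // empty if rack information could not be found

    TopicPartitionSketch(final String topic, final int partition, final Optional<Set<String>> rackIds) {
        this.topic = topic;
        this.partition = partition;
        this.rackIds = rackIds;
    }
}

final class CrossRackSketch {

    /**
     * Counts the task's partitions that have no replica in clientRack. Partitions without
     * rack information are ignored, since nothing can be said about them.
     */
    static int crossRackPartitions(final List<TopicPartitionSketch> taskPartitions, final String clientRack) {
        int count = 0;
        for (final TopicPartitionSketch tp : taskPartitions) {
            if (tp.rackIds.isPresent() && !tp.rackIds.get().contains(clientRack)) {
                count++;
            }
        }
        return count;
    }
}
```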

TaskAssignmentUtils

...

We'll also move some of the existing assignment functionality into a utils class that can be called by implementors of the new TaskAssignor. This will allow users to more easily adapt or modify pieces of the complex existing assignment algorithm, without having to re-implement the entire thing from scratch.
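To make the idea of a reusable building block concrete, the following self-contained sketch computes what a helper like identityAssignment returns: a new assignment that reproduces each client's previous active and standby tasks. It uses hypothetical simplified types (PreviousTasks, AssignedTaskSketch, IdentityAssignmentSketch) with String ids instead of the real ApplicationState and KafkaStreamsAssignment, and it is not the library's implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the previous-assignment fields of KafkaStreamsState
final class PreviousTasks {
    final Set<String> previousActiveTasks;
    final Set<String> previousStandbyTasks;

    PreviousTasks(final Set<String> previousActiveTasks, final Set<String> previousStandbyTasks) {
        this.previousActiveTasks = previousActiveTasks;
        this.previousStandbyTasks = previousStandbyTasks;
    }
}

// Hypothetical stand-in for KafkaStreamsAssignment.AssignedTask, ordered by id then type
final class AssignedTaskSketch implements Comparable<AssignedTaskSketch> {
    enum Type { ACTIVE, STANDBY }

    final String id;
    final Type type;

    AssignedTaskSketch(final String id, final Type type) {
        this.id = id;
        this.type = type;
    }

    @Override
    public int compareTo(final AssignedTaskSketch other) {
        final int byId = id.compareTo(other.id);
        return byId != 0 ? byId : type.compareTo(other.type);
    }
}

final class IdentityAssignmentSketch {

    // Builds a fresh processId -> tasks map that repeats every client's previous assignment
    static Map<String, Set<AssignedTaskSketch>> identityAssignment(final Map<String, PreviousTasks> clients) {
        final Map<String, Set<AssignedTaskSketch>> result = new TreeMap<>();
        for (final Map.Entry<String, PreviousTasks> e : clients.entrySet()) {
            final Set<AssignedTaskSketch> tasks = new TreeSet<>();
            for (final String id : e.getValue().previousActiveTasks) {
                tasks.add(new AssignedTaskSketch(id, AssignedTaskSketch.Type.ACTIVE));
            }
            for (final String id : e.getValue().previousStandbyTasks) {
                tasks.add(new AssignedTaskSketch(id, AssignedTaskSketch.Type.STANDBY));
            }
            result.put(e.getKey(), tasks);
        }
        return result;
    }
}
```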

Code Block

language	java
title	TaskAssignmentUtils

package org.apache.kafka.streams.processor.assignment;

/**
 * A set of utilities to help implement task assignment
 */
public final class TaskAssignmentUtils {
    /**
     * Assign standby tasks to KafkaStreams clients according to the default logic.
     * <p>
     * If rack-aware client tags are configured, the rack-aware standby task assignor will be used.
     *
     * @param applicationState        the metadata and other info describing the current application state
     * @param kafkaStreamsAssignments the KafkaStreams client assignments to add standby tasks to
     */
    public static void defaultStandbyTaskAssignment(final ApplicationState applicationState,
                                                    final Map<ProcessId, KafkaStreamsAssignment> kafkaStreamsAssignments);

    /**
     * Optimize active task assignment for rack awareness. This optimization is based on the 
     * {@link StreamsConfig#RACK_AWARE_ASSIGNMENT_TRAFFIC_COST_CONFIG trafficCost} 
     * and {@link StreamsConfig#RACK_AWARE_ASSIGNMENT_NON_OVERLAP_COST_CONFIG nonOverlapCost}
     * configs which balance cross rack traffic minimization and task movement.
     * Setting {@code trafficCost} to a larger number reduces the overall cross rack traffic of the resulting 
     * assignment, but can increase the number of tasks shuffled around between clients. 
     * Setting {@code nonOverlapCost} to a larger number increases the affinity of tasks to their intended client
     * and reduces the amount by which the rack-aware optimization can shuffle tasks around, at the cost of higher
     * cross-rack traffic.
     * In an extreme case, if we set {@code nonOverlapCost} to 0 and {@code trafficCost} to a positive value,
     * the resulting assignment will have an absolute minimum of cross rack traffic. If we set {@code trafficCost} to 0,
     * and {@code nonOverlapCost} to a positive value, the resulting assignment will be identical to the input assignment.    
     * <p>
     * This method optimizes cross-rack traffic for active tasks only. For standby task optimization,
     * use {@link #optimizeRackAwareStandbyTasks}.
     * <p>
     * It is recommended to run this optimization before assigning any standby tasks, especially if you have configured
     * your KafkaStreams clients with assignment tags via the rack.aware.assignment.tags config since this method may
     * shuffle around active tasks without considering the client tags and can result in a violation of the original
     * client tag assignment's constraints.
     *
     * @param kafkaStreamsAssignments the assignment of tasks to KafkaStreams clients to be optimized
     * @param optimizationParams      optional configuration parameters to apply 
     */
    public static void optimizeRackAwareActiveTasks(final Map<ProcessId, KafkaStreamsAssignment> kafkaStreamsAssignments,
                                                    final RackAwareOptimizationParams optimizationParams);      

    /**
     * Optimize standby task assignment for rack awareness. This optimization is based on the
     * {@link StreamsConfig#RACK_AWARE_ASSIGNMENT_TRAFFIC_COST_CONFIG trafficCost}
     * and {@link StreamsConfig#RACK_AWARE_ASSIGNMENT_NON_OVERLAP_COST_CONFIG nonOverlapCost}
     * configs which balance cross-rack traffic minimization and task movement.
     * Setting {@code trafficCost} to a larger number reduces the overall cross-rack traffic of the resulting
     * assignment, but can increase the number of tasks shuffled around between clients.
     * Setting {@code nonOverlapCost} to a larger number increases the affinity of tasks to their intended client
     * and reduces the amount by which the rack-aware optimization can shuffle tasks around, at the cost of higher
     * cross-rack traffic.
     * In an extreme case, if we set {@code nonOverlapCost} to 0 and {@code trafficCost} to a positive value,
     * the resulting assignment will have an absolute minimum of cross-rack traffic. If we set {@code trafficCost} to 0
     * and {@code nonOverlapCost} to a positive value, the resulting assignment will be identical to the input assignment.
     * <p>
     * This method optimizes cross-rack traffic for standby tasks only. For active task optimization,
     * use {@link #optimizeRackAwareActiveTasks}.
     * 
     * @param kafkaStreamsAssignments the current assignment of tasks to KafkaStreams clients
     * @param optimizationParams      optional configuration parameters to apply 
     */
    public static void optimizeRackAwareStandbyTasks(final Map<ProcessId, KafkaStreamsAssignment> kafkaStreamsAssignments,
                                                     final RackAwareOptimizationParams optimizationParams);

    /**
     * Return a "no-op" assignment that just copies the previous assignment of tasks to KafkaStreams clients
     *
     * @param applicationState the metadata and other info describing the current application state
     *
     * @return a new map containing an assignment that replicates exactly the previous assignment reported in the applicationState
     */
    public static Map<ProcessId, KafkaStreamsAssignment> identityAssignment(final ApplicationState applicationState);

    /**
     * Validate the passed-in {@link TaskAssignment} and return an {@link AssignmentError} representing the
     * first error detected in the assignment, or {@link AssignmentError#NONE} if the assignment passes the
     * verification check.
     * <p>
     * Note: this verification is performed automatically by the StreamsPartitionAssignor on the assignment
     * returned by the TaskAssignor, and the error is returned to the assignor via the {@link TaskAssignor#onAssignmentComputed}
     * callback. Therefore, it is not required to call this manually from the {@link TaskAssignor#assign} method.
     * However, if an invalid assignment is returned, it will fail the rebalance and kill the thread, so it may be useful to
     * call this method from an assignor to verify the assignment before returning it, and to fix any errors it finds.
     *
     * @param applicationState The application for which this task assignment is being assessed.
     * @param taskAssignment   The task assignment that will be validated.
     *
     * @return {@code AssignmentError.NONE} if the assignment created for this application is valid,
     *         or another {@code AssignmentError} otherwise.
     */
    public static AssignmentError validateTaskAssignment(final ApplicationState applicationState,
                                                         final TaskAssignment taskAssignment);
  }
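To make the trafficCost/nonOverlapCost trade-off described in the javadocs above concrete, here is a deliberately simplified sketch. It scores one task placement as trafficCost * crossRack + nonOverlapCost * moved, where crossRack is 1 if the client sits on a different rack than the task's data and moved is 1 if the task leaves its original client, and then picks the cheapest client per task. The class and method names are hypothetical and racks are plain strings. It ignores the load balancing and the global optimization over all tasks that the actual RackAwareTaskAssignor from KIP-925 performs, so it illustrates only the weighting, not the real algorithm.

```java
import java.util.Map;

/**
 * Hypothetical illustration of the trafficCost / nonOverlapCost weighting.
 * Each task independently picks its cheapest client; the real optimizer also
 * balances load across clients and solves for all tasks together.
 */
final class RackCostSketch {

    /** Weighted cost of placing one task on candidateClient. */
    static int placementCost(final String taskRack,
                             final String clientRack,
                             final String candidateClient,
                             final String originalClient,
                             final int trafficCost,
                             final int nonOverlapCost) {
        final int crossRack = taskRack.equals(clientRack) ? 0 : 1;        // 1 if data crosses racks
        final int moved = candidateClient.equals(originalClient) ? 0 : 1; // 1 if the task changes owner
        return trafficCost * crossRack + nonOverlapCost * moved;
    }

    /** Cheapest client for one task; ties keep the current best, starting from the original client. */
    static String bestClient(final String taskRack,
                             final String originalClient,
                             final Map<String, String> clientRacks,
                             final int trafficCost,
                             final int nonOverlapCost) {
        String best = originalClient;
        int bestCost = placementCost(taskRack, clientRacks.get(originalClient),
                                     originalClient, originalClient, trafficCost, nonOverlapCost);
        for (final Map.Entry<String, String> client : clientRacks.entrySet()) {
            final int cost = placementCost(taskRack, client.getValue(), client.getKey(),
                                           originalClient, trafficCost, nonOverlapCost);
            if (cost < bestCost) {
                best = client.getKey();
                bestCost = cost;
            }
        }
        return best;
    }
}
```

With the extreme settings from the javadocs, nonOverlapCost = 0 moves a task whose current client is on another rack to a same-rack client whenever one exists, while trafficCost = 0 never moves it.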

TaskAssignmentUtils provides new APIs for pre-existing functionality, giving users a clean way to take advantage of the optimizations and algorithms already used by the built-in assignors, so that they don't have to re-implement complex features such as rack awareness. The #defaultStandbyTaskAssignment API simply delegates to the appropriate standby task assignor (either the basic default assignor or the client-tag-based, rack-aware standby assignor, depending on whether client tags are present in the configuration). Similarly, the #optimizeRackAware{Active/Standby}Tasks APIs delegate to the new RackAwareTaskAssignor being added in KIP-925.
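The delegation rule for #defaultStandbyTaskAssignment boils down to a single check on the configured client tags. A hypothetical sketch, assuming the check is made on the rack.aware.assignment.tags list exposed by AssignmentConfigs#rackAwareAssignmentTags; the enum and method names are illustrative and not part of the proposed API:

```java
import java.util.List;

/** Hypothetical sketch of how #defaultStandbyTaskAssignment picks its delegate. */
final class StandbyDelegationSketch {

    /** Illustrative names for the two built-in standby strategies described above. */
    enum StandbyStrategy { BASIC_DEFAULT, CLIENT_TAG_AWARE }

    /** Uses the client-tag-aware strategy only when rack.aware.assignment.tags is non-empty. */
    static StandbyStrategy chooseStandbyStrategy(final List<String> rackAwareAssignmentTags) {
        final boolean hasClientTags = rackAwareAssignmentTags != null && !rackAwareAssignmentTags.isEmpty();
        return hasClientTags ? StandbyStrategy.CLIENT_TAG_AWARE : StandbyStrategy.BASIC_DEFAULT;
    }
}
```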

RackAwareOptimizationParams

A simple config container for the necessary parameters and optional overrides to apply when running the active or standby task rack-aware optimizations.

Code Block

language	java
title	RackAwareOptimizationParams

public static final class RackAwareOptimizationParams {
    private final ApplicationState applicationState;
    private final Optional<Integer> trafficCostOverride;
    private final Optional<Integer> nonOverlapCostOverride;
    private final Optional<SortedSet<TaskId>> tasksToOptimize;

    /**
     * Return a new config object with no overrides and the tasksToOptimize initialized to the set of all tasks in the given ApplicationState
     */
    public static RackAwareOptimizationParams of(final ApplicationState applicationState);

    /**
     * Return a new config object with the tasksToOptimize set to all stateful tasks in the given ApplicationState
     */
    public RackAwareOptimizationParams forStatefulTasks();

    /**
     * Return a new config object with the tasksToOptimize set to all stateless tasks in the given ApplicationState
     */
    public RackAwareOptimizationParams forStatelessTasks();

    /**
     * Return a new config object with the provided tasksToOptimize
     */
    public RackAwareOptimizationParams forTasks(final SortedSet<TaskId> tasksToOptimize);

    /**
     * Return a new config object with the provided trafficCost override applied
     */
    public RackAwareOptimizationParams withTrafficCostOverride(final int trafficCostOverride);

    /**
     * Return a new config object with the provided nonOverlapCost override applied
     */
    public RackAwareOptimizationParams withNonOverlapCostOverride(final int nonOverlapCostOverride);
}
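The methods above follow an immutable "with-er" style: each call returns a new config object. The hypothetical, simplified stand-in below (task ids as plain strings, the ApplicationState reduced to the set of all task ids, and tasksToOptimize always present rather than Optional) shows why this is convenient: a base object can be shared and specialized per optimization run without affecting other callers.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical, simplified stand-in for RackAwareOptimizationParams.
 * Every "for"/"with" method returns a new object and leaves the receiver untouched.
 */
final class OptimizationParamsSketch {
    private final SortedSet<String> tasksToOptimize;
    private final Optional<Integer> trafficCostOverride;
    private final Optional<Integer> nonOverlapCostOverride;

    private OptimizationParamsSketch(final SortedSet<String> tasksToOptimize,
                                     final Optional<Integer> trafficCostOverride,
                                     final Optional<Integer> nonOverlapCostOverride) {
        // Defensive, read-only copy so later changes to the caller's set cannot leak in
        this.tasksToOptimize = Collections.unmodifiableSortedSet(new TreeSet<>(tasksToOptimize));
        this.trafficCostOverride = trafficCostOverride;
        this.nonOverlapCostOverride = nonOverlapCostOverride;
    }

    /** No overrides; every task is a candidate for optimization. */
    static OptimizationParamsSketch of(final Set<String> allTasks) {
        return new OptimizationParamsSketch(new TreeSet<>(allTasks), Optional.empty(), Optional.empty());
    }

    OptimizationParamsSketch forTasks(final SortedSet<String> tasks) {
        return new OptimizationParamsSketch(tasks, trafficCostOverride, nonOverlapCostOverride);
    }

    OptimizationParamsSketch withTrafficCostOverride(final int trafficCost) {
        return new OptimizationParamsSketch(tasksToOptimize, Optional.of(trafficCost), nonOverlapCostOverride);
    }

    OptimizationParamsSketch withNonOverlapCostOverride(final int nonOverlapCost) {
        return new OptimizationParamsSketch(tasksToOptimize, trafficCostOverride, Optional.of(nonOverlapCost));
    }

    SortedSet<String> tasksToOptimize() { return tasksToOptimize; }
    Optional<Integer> trafficCostOverride() { return trafficCostOverride; }
    Optional<Integer> nonOverlapCostOverride() { return nonOverlapCostOverride; }
}
```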

AssignmentConfigs

Last, we have the AssignmentConfigs, which is (and will remain) just a basic container class, although we will migrate from public fields to standard getters for each of the configs passed into the assignor. Going forward, when a KIP is proposed to introduce a new config intended for the assignor, it should include the appropriate getter(s) in this class as part of the accepted proposal.

Code Block

language	java
title	AssignmentConfigs

package org.apache.kafka.streams.processor.assignment;

public class AssignmentConfigs {
    public long acceptableRecoveryLag();
    public int maxWarmupReplicas();
    public int numStandbyReplicas();
    public long probingRebalanceIntervalMs();
    public List<String> rackAwareAssignmentTags();
    public OptionalInt trafficCost();
    public OptionalInt nonOverlapCost();
    public String rackAwareAssignmentStrategy();
 }
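Since trafficCost() and nonOverlapCost() return OptionalInt, presumably because these configs may be left unset, code that consumes them needs a precedence rule. A minimal sketch, assuming an override from RackAwareOptimizationParams wins over the configured value, with a caller-supplied fallback standing in for whatever default the optimizer applies; the helper name and the fallback parameter are hypothetical:

```java
import java.util.Optional;
import java.util.OptionalInt;

/**
 * Hypothetical helper resolving the cost value an optimization run would use.
 * Assumed precedence: explicit override, then the AssignmentConfigs value, then a fallback.
 */
final class EffectiveCostSketch {

    static int effectiveCost(final Optional<Integer> override,
                             final OptionalInt configured,
                             final int fallback) {
        if (override.isPresent()) {
            return override.get();          // per-call override, e.g. from withTrafficCostOverride
        }
        return configured.orElse(fallback); // config value if set, otherwise the fallback
    }
}
```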

Finally, as part of this change, we're moving some of the behavior that can fail into the task assignor. In particular, we're moving the logic that computes lags for stateful tasks into the implementation of ApplicationState#kafkaStreamsStates. Users who request the task lags via the computeTaskLags input flag should make sure to handle failures the way they want: they can rethrow a TaskAssignmentException (or simply not catch it in the first place) to have Kafka Streams automatically "retry" the rebalance by returning the same assignment and scheduling an immediate follow-up rebalance. Advanced users who want more control over the "fallback" assignment and/or the timing of follow-up rebalances can instead swallow the TaskAssignmentException and use the followupRebalanceDeadline to schedule follow-up rebalances themselves, e.g. to implement a retry/backoff policy.
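A minimal sketch of such a retry/backoff policy, assuming the assignor has already swallowed a TaskAssignmentException and only needs to decide when the next follow-up rebalance should happen. The class name, the exponential-doubling scheme, and the idea of feeding the returned timestamp into followupRebalanceDeadline are illustrative assumptions; the sketch does not call any Kafka Streams API.

```java
/**
 * Hypothetical exponential backoff for follow-up rebalances after a swallowed
 * TaskAssignmentException. Times are epoch milliseconds supplied by the caller.
 */
final class FollowupBackoffSketch {
    private final long baseBackoffMs;
    private final long maxBackoffMs;
    private int consecutiveFailures = 0;

    FollowupBackoffSketch(final long baseBackoffMs, final long maxBackoffMs) {
        if (baseBackoffMs <= 0 || maxBackoffMs < baseBackoffMs) {
            throw new IllegalArgumentException("require 0 < baseBackoffMs <= maxBackoffMs");
        }
        this.baseBackoffMs = baseBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /** Records a failure and returns the deadline for the next follow-up rebalance. */
    long onFailure(final long nowMs) {
        consecutiveFailures++;
        long backoff = baseBackoffMs;
        // Double once per earlier consecutive failure, stopping as soon as the cap is reached
        for (int i = 1; i < consecutiveFailures && backoff < maxBackoffMs; i++) {
            backoff *= 2;
        }
        return nowMs + Math.min(backoff, maxBackoffMs);
    }

    /** Resets the backoff once an assignment (e.g. the lag computation) succeeds. */
    void onSuccess() {
        consecutiveFailures = 0;
    }
}
```

Resetting on success keeps the backoff from growing across unrelated failures, and the cap bounds how long a client waits before lags are requested again.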

...

As noted in the TaskAssignor javadocs, the StreamsPartitionAssignor will verify the assignment returned by the task assignor and return an error via #onAssignmentComputed if any of the following cases are observed while processing the TaskAssignor's assignment:

ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES: multiple KafkaStreams clients assigned with the same active task
INVALID_STANDBY_TASK: stateless task assigned as a standby task
MISSING_PROCESS_ID: ProcessId present in the input ApplicationState was not present in the output TaskAssignment
UNKNOWN_PROCESS_ID: unrecognized ProcessId not matching any of the participating consumers
UNKNOWN_TASK_ID: unrecognized TaskId not matching any of the tasks to be assigned
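The checks listed above can be sketched as follows. This is a simplified, hypothetical model rather than the StreamsPartitionAssignor's actual code: process ids and task ids are plain strings, the ApplicationState is reduced to the sets of participating process ids, all task ids, and stateful task ids, and the class and record names are illustrative.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the assignment checks listed above. */
final class AssignmentValidationSketch {

    enum AssignmentError {
        NONE,
        ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES,
        INVALID_STANDBY_TASK,
        MISSING_PROCESS_ID,
        UNKNOWN_PROCESS_ID,
        UNKNOWN_TASK_ID
    }

    /** Active and standby tasks assigned to one KafkaStreams client. */
    record ClientAssignment(Set<String> activeTasks, Set<String> standbyTasks) { }

    /** Returns the first error found; which error is "first" depends on map iteration order. */
    static AssignmentError validate(final Set<String> processIds,
                                    final Set<String> allTasks,
                                    final Set<String> statefulTasks,
                                    final Map<String, ClientAssignment> assignment) {
        for (final String processId : processIds) {
            if (!assignment.containsKey(processId)) {
                return AssignmentError.MISSING_PROCESS_ID;   // client dropped from the output
            }
        }
        final Set<String> assignedActives = new HashSet<>();
        for (final Map.Entry<String, ClientAssignment> entry : assignment.entrySet()) {
            if (!processIds.contains(entry.getKey())) {
                return AssignmentError.UNKNOWN_PROCESS_ID;   // client not present in the input
            }
            for (final String task : entry.getValue().activeTasks()) {
                if (!allTasks.contains(task)) {
                    return AssignmentError.UNKNOWN_TASK_ID;
                }
                if (!assignedActives.add(task)) {
                    return AssignmentError.ACTIVE_TASK_ASSIGNED_MULTIPLE_TIMES;
                }
            }
            for (final String task : entry.getValue().standbyTasks()) {
                if (!allTasks.contains(task)) {
                    return AssignmentError.UNKNOWN_TASK_ID;
                }
                if (!statefulTasks.contains(task)) {
                    return AssignmentError.INVALID_STANDBY_TASK; // stateless tasks cannot have standbys
                }
            }
        }
        return AssignmentError.NONE;
    }
}
```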

If any of these errors are detected, the StreamsPartitionAssignor will throw an exception after returning the error code via the #onAssignmentComputed callback. This exception will be bubbled up through the StreamThread to the uncaught exception handler, where the user can choose how to react, the same as with any other exception.

If no error is detected, the AssignmentError code NONE will be returned in the #onAssignmentComputed callback.

Consumer Assignments

One major decision in this KIP was whether to encompass the assignment of tasks to consumers/threads within each KafkaStreams client, or to leave that up to the StreamsPartitionAssignor and only carve out the KafkaStreams-level assignment for pluggability. Ultimately we decided on the latter, for several reasons:

...
