Status

Current state: Under Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka Connect is a framework for running two kinds of connectors. Source connectors load data from external systems into Kafka topics. Sink connectors consume data from Kafka topics and write it to external systems. Users set up Kafka Connect and install and configure connectors. Kafka Connect manages the connector configurations, tracks the status of the connectors, and records the progress the connectors make via offsets. When needed, it uses that persisted information to distribute and restart the connectors.

...

Now that Kafka clients include administrative functionality (KIP-117, KAFKA-3265), Kafka Connect can explicitly create its internal topics when they do not already exist. Existing configuration properties specify the names of these topics, but additional properties are needed to control their replication factor and number of partitions. Sensible default values can be provided, but users will likely still want to choose the number of partitions and replication factor that best satisfy their needs.
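The "create only when missing" behavior can be sketched as follows. This is a minimal illustration, not Kafka Connect's implementation: `TopicSettings`, `InMemoryAdmin`, and `ensure_internal_topics` are hypothetical names, the in-memory admin merely stands in for a real Kafka admin client, and the topic names and partition counts are arbitrary example values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TopicSettings:
    """Desired layout for one internal topic (hypothetical structure)."""
    name: str
    partitions: int
    replication_factor: int


@dataclass
class InMemoryAdmin:
    """Hypothetical stand-in for an admin client; keeps topics in a dict."""
    # topic name -> (partitions, replication factor)
    topics: dict = field(default_factory=dict)

    def list_topics(self):
        return set(self.topics)

    def create_topic(self, settings):
        self.topics[settings.name] = (settings.partitions, settings.replication_factor)


def ensure_internal_topics(admin, wanted):
    """Create each internal topic that does not already exist.

    Existing topics are left untouched, so the configured partitions and
    replication factor only matter for topics created here.
    Returns the names of the topics that were created, in order.
    """
    existing = admin.list_topics()
    created = []
    for settings in wanted:
        if settings.name not in existing:
            admin.create_topic(settings)
            created.append(settings.name)
    return created


# Example: the configs topic already exists (with replication factor 2),
# so only the offsets and status topics are created.
admin = InMemoryAdmin(topics={"connect-configs": (1, 2)})
wanted = [
    TopicSettings("connect-configs", partitions=1, replication_factor=3),
    TopicSettings("connect-offsets", partitions=10, replication_factor=3),
    TopicSettings("connect-status", partitions=5, replication_factor=3),
]
print(ensure_internal_topics(admin, wanted))  # ['connect-offsets', 'connect-status']
```

Because an existing topic is skipped rather than altered, its original replication factor of 2 survives, which is why workers that already have their internal topics are unaffected by the new settings.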

Public Interfaces

We will add several new Kafka Connect distributed worker configuration properties that specify the replication factor and number of partitions Kafka Connect uses when it creates these internal storage topics.

Proposed Changes

The following configuration properties will be added:

...

Note that the replication factor cannot be larger than the number of available Kafka brokers in the cluster. If it is, Kafka Connect will be unable to create the topics and will fail with an error describing this limitation. The user will then need to explicitly set the aforementioned configuration properties to values the cluster can satisfy.
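The fail-fast check described above might look like the following sketch. The helper name `check_replication_factor`, the `TopicCreationError` type, and the message wording are all hypothetical; the actual error Kafka Connect reports is not specified here.

```python
class TopicCreationError(Exception):
    """Hypothetical error raised when an internal topic cannot be created."""


def check_replication_factor(topic, replication_factor, broker_count):
    """Fail fast if the requested replication factor cannot be satisfied.

    The requested value is never silently lowered to the broker count;
    the user must change the configuration instead.
    """
    if replication_factor < 1:
        raise ValueError(f"replication factor for {topic!r} must be at least 1")
    if replication_factor > broker_count:
        raise TopicCreationError(
            f"Unable to create topic {topic!r}: replication factor "
            f"{replication_factor} is larger than the {broker_count} available "
            f"broker(s). Set a replication factor no larger than {broker_count}."
        )


# Three brokers can hold three replicas, so this passes silently.
check_replication_factor("connect-offsets", 3, broker_count=3)

# A single-broker cluster cannot, so the worker reports the problem.
try:
    check_replication_factor("connect-offsets", 3, broker_count=1)
except TopicCreationError as err:
    print(err)
```

Raising an error instead of clamping the value matches the second rejected alternative: a quietly reduced replication factor would leave the internal topics less durable than the user asked for.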

Compatibility, Deprecation, and Migration Plan

These new configuration properties are used only when Kafka Connect needs to create its internal topics for storing configurations, offsets, and status. Users who are already running Kafka Connect will have already created these topics, so they have no need to set the new properties to non-default values. Users who are setting up new Kafka Connect distributed worker clusters may want to override the defaults in their new configuration files to reflect their own environments.

Rejected Alternatives

  1. Providing options to configure all of the available topic-specific configuration settings. This is unnecessary because users can still create the topics manually with the kafka-topics.sh tool, specifying any topic-specific configuration settings they need.

  2. Having Kafka Connect use the smaller of the desired replication factor and the number of available brokers. This is unintuitive behavior that can lead to topics with insufficient replication, making it more likely that persisted information about connectors will be lost.

  3. Using a single configuration property for the replication factor of all internal topics. The name of such a property would not mirror the pattern already established by the existing config.storage.topic, offset.storage.topic, and status.storage.topic configurations, and explicit properties for each topic are also more straightforward.