Authors: Ivan Yurchenko, Anatolii Popov
Status
Current state: Under Discussion
Discussion thread: here
JIRA: KAFKA-19448
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Kafka offers the RemoteStorageManager (RSM) interface as a way to let users plug in a remote storage implementation. The broker also allows configuring the plugged-in RSM instance by forwarding to it the broker configuration prefixed with rsm.config. However, RemoteStorageManager has no way to learn the configuration of the topic whose segment it is working with, and users have no way to add custom topic configurations that are meaningful for a particular RSM. RemoteStorageManager implementations may benefit from both in situations where topic-specific tuning is needed, e.g. for security (encryption enabled/disabled), data governance (the destination bucket or prefix), or performance.
Proposed Changes
This KIP proposes two changes that make sense together.
Allow setting custom configurations for topics
Currently, when a topic configuration is changed (or a topic is created), the controller validates the configuration provided. If it detects unknown configuration keys, the whole operation is rejected. The KIP proposes the following:
- Allow the topics to have any configuration prefixed with unchecked.
- Make controllers not validate these configurations and just store them as is along with the known configurations.
- Make the command line tools aware of this to be able to pass these configurations to the API.
We propose to use the unchecked. prefix instead of something more specific to RSM in order for this mechanism to be more generic and potentially usable by other features and tools.
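The validation rule described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the class name TopicConfigValidationSketch, the validate method, and the knownKeys parameter are hypothetical and exist only to show that keys with the unchecked. prefix bypass the unknown-key check while all other keys are still validated.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the real controller validation lives elsewhere and looks different.
final class TopicConfigValidationSketch {
    static final String UNCHECKED_PREFIX = "unchecked.";

    // Rejects the request if any key is neither known nor prefixed with "unchecked.".
    static void validate(Map<String, String> requested, Set<String> knownKeys) {
        for (String key : requested.keySet()) {
            if (key.startsWith(UNCHECKED_PREFIX)) {
                continue; // stored as is, without validation
            }
            if (!knownKeys.contains(key)) {
                throw new IllegalArgumentException("Unknown topic config name: " + key);
            }
        }
    }
}
```

With this rule, a request containing retention.ms and unchecked.rsm.encryption would pass, while a request containing an unprefixed unknown key would still be rejected as a whole.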
Pass topic configurations to RemoteStorageManager methods
RemoteLogManager should pass the log configuration to RemoteStorageManager on each operation. This will allow an RSM implementation to benefit from the knowledge of the topic configurations, both known and custom (unchecked.). See Public Interfaces for the details of how RemoteStorageManager needs to change.
Public Interfaces
There are three changes to the public interfaces. The first change is to add overloads of the RemoteStorageManager methods that accept LogConfig. Each overload has a default implementation that calls the existing method, which preserves compatibility with the existing implementations.
default Optional<CustomMetadata> copyLogSegmentData(RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                                                    LogSegmentData logSegmentData,
                                                    LogConfig logConfig)
        throws RemoteStorageException {
    return copyLogSegmentData(remoteLogSegmentMetadata, logSegmentData);
}

default InputStream fetchLogSegment(RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                                    int startPosition,
                                    LogConfig logConfig) throws RemoteStorageException {
    return fetchLogSegment(remoteLogSegmentMetadata, startPosition);
}

default InputStream fetchLogSegment(RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                                    int startPosition,
                                    int endPosition,
                                    LogConfig logConfig) throws RemoteStorageException {
    return fetchLogSegment(remoteLogSegmentMetadata, startPosition, endPosition);
}

default InputStream fetchIndex(RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                               IndexType indexType,
                               LogConfig logConfig) throws RemoteStorageException {
    return fetchIndex(remoteLogSegmentMetadata, indexType);
}

default void deleteLogSegmentData(RemoteLogSegmentMetadata remoteLogSegmentMetadata,
                                  LogConfig logConfig) throws RemoteStorageException {
    deleteLogSegmentData(remoteLogSegmentMetadata);
}
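The compatibility mechanism behind these default methods can be illustrated with a reduced, self-contained sketch. The types below (StorageSketch, LegacyStorage, ConfigAwareStorage) and the key unchecked.rsm.prefix are hypothetical stand-ins, not Kafka classes; a plain Map stands in for LogConfig. The sketch only shows that an implementation written against the old method keeps working when called through the new overload, while a newer implementation may override the overload to use the topic configuration.

```java
import java.util.Map;

// Stand-in for an interface that gains a config-aware overload (not the real RemoteStorageManager).
interface StorageSketch {
    // Existing method that every implementation already provides.
    String fetch(String segmentId);

    // New overload; by default it ignores the topic configuration and delegates.
    default String fetch(String segmentId, Map<String, String> topicConfig) {
        return fetch(segmentId);
    }
}

// Written against the old interface: knows nothing about the overload.
final class LegacyStorage implements StorageSketch {
    @Override
    public String fetch(String segmentId) {
        return "legacy:" + segmentId;
    }
}

// Overrides the overload to apply a per-topic key prefix (hypothetical custom config).
final class ConfigAwareStorage implements StorageSketch {
    @Override
    public String fetch(String segmentId) {
        return fetch(segmentId, Map.of());
    }

    @Override
    public String fetch(String segmentId, Map<String, String> topicConfig) {
        return topicConfig.getOrDefault("unchecked.rsm.prefix", "") + segmentId;
    }
}
```

A caller that always uses the two-argument form gets the old behavior from LegacyStorage and the config-aware behavior from ConfigAwareStorage, without any change to the legacy class.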
The second public interface change is to update kafka-configs.sh and kafka-topics.sh so that they can handle the unchecked. configurations.
The third change is to add the uncheckedConfigs method to LogConfig, which will return the map of unchecked configs:
public Map<String, String> uncheckedConfigs() {
    ...
}
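As a rough illustration of how an RSM implementation might consume this accessor, consider the sketch below. Everything in it is an assumption made for the example: LogConfigSketch is a simplified stand-in rather than the real LogConfig, the returned keys are assumed to keep the unchecked. prefix (the KIP does not specify this), and BucketResolver with the key unchecked.rsm.bucket is a hypothetical RSM-side helper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for LogConfig; only models the proposed accessor.
final class LogConfigSketch {
    static final String UNCHECKED_PREFIX = "unchecked.";
    private final Map<String, String> allConfigs;

    LogConfigSketch(Map<String, String> allConfigs) {
        this.allConfigs = new HashMap<>(allConfigs);
    }

    // Assumption: keys are returned with the "unchecked." prefix left intact.
    Map<String, String> uncheckedConfigs() {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : allConfigs.entrySet()) {
            if (entry.getKey().startsWith(UNCHECKED_PREFIX)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}

// Hypothetical RSM-side helper: picks a per-topic bucket, else the broker-level default.
final class BucketResolver {
    static final String BUCKET_KEY = "unchecked.rsm.bucket"; // hypothetical key

    static String resolveBucket(LogConfigSketch logConfig, String defaultBucket) {
        return logConfig.uncheckedConfigs().getOrDefault(BUCKET_KEY, defaultBucket);
    }
}
```

Falling back to a broker-level default keeps topics without the custom configuration working unchanged, which matches the data governance use case from the Motivation section.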
Compatibility, Deprecation, and Migration Plan
This feature does not introduce any incompatibility related to most of the Kafka ecosystem.
Existing RemoteStorageManager implementations will continue working due to the default implementations of the introduced methods.
The older versions of kafka-configs.sh and kafka-topics.sh are future-proof regarding this and just display unknown configurations as strings.
There may potentially exist third-party tools that break on an unknown topic configuration received from the API. However, these tools would be broken by adding any other topic configuration in a newer Kafka version, so the KIP doesn’t introduce a new incompatibility mode here.
Test Plan
The changes will be tested mostly on the unit level:
- LogConfig, ControllerConfigurationValidatorTest, ConfigurationControlManagerTest, ReplicationControlManagerTest, TopicCommandTest, and ConfigCommandTest must be augmented to test unchecked configurations.
- RemoteLogManagerTest needs to be updated according to the interface changes, and it needs to validate that the topic configuration is passed.
On the integration level:
- BaseAdminIntegrationTest must be augmented to create and update topics with unchecked configs.
Some manual checks must also be done:
- kafka-configs.sh and kafka-topics.sh must be able to correctly set and display unchecked configs.
- An implementation of RemoteStorageManager compiled against the current stable version of the interface must be able to successfully run in a changed broker.
Rejected Alternatives
Set per-topic custom configurations in RSM configuration
There is a workaround that can be used today. An RSM implementation that needs per-topic configurations could expect them to be defined in its own configuration, for example, by a combination of lists, regular expressions, and so on. This doesn't provide known (i.e. standard Kafka) topic configurations to the RSM, but at least allows custom settings to be made. This alternative is rejected for the following reasons:
- It's static, i.e. a broker restart is needed for a change to take effect, which is disruptive for the cluster. This may be prohibitive in environments where separate teams own topic and broker configurations.
- It's per-broker. When a change is made (assuming a rolling restart is done), the window of configuration divergence between brokers is larger.
- The life cycle of these custom configurations is not tied to the actual topic life cycle, which is inconvenient and may even lead to unexpected behaviors, e.g. when a topic is deleted and recreated with the same name.