Status
Current state: Closed – the approach does not require any public-facing changes.
Discussion thread: here
JIRA: KAFKA-17431
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently, the SocketServer does not tolerate an invalid static configuration: if the static configuration is invalid at server startup, Kafka crashes with an uncaught exception, even if previously applied dynamic reconfigurations would make the configuration valid. It would be better to accept an invalid static SocketServer configuration as long as dynamically set changes make it valid.
Public Interfaces
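Depending on the approach chosen below, enabling this functionality may require an additional file on disk. The preferred snapshot-based approach reuses state that is already persisted to disk, so only the file-based alternative would need one.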
Proposed Changes
The preferred approach is described below; the other approaches that were considered are listed under Rejected Alternatives.
- Load dynamic configuration changes from the latest local metadata log snapshot and apply them before constructing the SocketServer (current preferred approach). Specifically, bypass the KRaft and metadata layers entirely and read the snapshot file's records directly from disk.
- Bypassing the KRaft layer: read the local snapshot file from disk without going through the KRaft layer. There are two cases: either the broker needs a new snapshot from the controller during metadata catch-up, because its fetch offset is less than the controller's log start offset, or it does not. If dynamic configurations are always loaded from the snapshot first, the former case means two different snapshots are read during startup. In the latter case the same snapshot is read twice, but the first read will hopefully bring its file blocks into the page cache, making the second read cheaper (this has not been verified yet). If this approach is taken, the early read should only occur when the static configuration is invalid and applying it would crash Kafka.
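As a minimal sketch of locating "the latest local metadata log snapshot" without going through KRaft, the Java below picks the newest snapshot from a metadata log directory listing. It assumes snapshot files follow the naming we believe the KRaft metadata log uses, `<end offset>-<epoch>.checkpoint` zero-padded to 20 and 10 digits, with partial or deleted snapshots carrying an extra suffix. The names `LatestSnapshotFinder`, `SnapshotId`, `latestSnapshot`, and `needsSnapshotFromController` are hypothetical, not existing Kafka APIs, and decoding the records inside the chosen file is out of scope.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LatestSnapshotFinder {
    // Assumed naming scheme: <20-digit end offset>-<10-digit epoch>.checkpoint.
    // Partial or deleted snapshots carry an extra suffix and therefore do not match.
    private static final Pattern SNAPSHOT_NAME =
            Pattern.compile("^(\\d{20})-(\\d{10})\\.checkpoint$");

    /** Hypothetical holder for a parsed snapshot id. */
    public record SnapshotId(long endOffset, long epoch, String fileName) {}

    private LatestSnapshotFinder() {}

    /** Returns the snapshot with the highest end offset (ties broken by epoch), if any. */
    public static Optional<SnapshotId> latestSnapshot(List<String> fileNames) {
        SnapshotId best = null;
        for (String name : fileNames) {
            Matcher m = SNAPSHOT_NAME.matcher(name);
            if (!m.matches()) {
                continue; // Not a complete snapshot file (e.g. a log segment or index).
            }
            SnapshotId id = new SnapshotId(
                    Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), name);
            if (best == null
                    || id.endOffset() > best.endOffset()
                    || (id.endOffset() == best.endOffset() && id.epoch() > best.epoch())) {
                best = id;
            }
        }
        return Optional.ofNullable(best);
    }

    /**
     * The case split described above: the broker must fetch a new snapshot from the
     * controller when its fetch offset is below the controller's log start offset.
     */
    public static boolean needsSnapshotFromController(long fetchOffset, long controllerLogStartOffset) {
        return fetchOffset < controllerLogStartOffset;
    }
}
```

Given a listing containing `00000000000000000100-0000000001.checkpoint`, `00000000000000000250-0000000002.checkpoint`, and `00000000000000000300-0000000002.checkpoint.part`, `latestSnapshot` returns the offset-250 snapshot, because the offset-300 file is still partial.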
Points of discussion on the snapshot approach:
- Does not have to handle cases where a dedicated configuration file is not readable or writable.
- Reuses state we already persist to disk.
- Depending on how old the snapshot is, the server may not have the most up-to-date dynamic configuration. One way to mitigate this is to have the broker take a snapshot during controlled shutdown.
- When no snapshot exists, or the latest snapshot does not contain dynamic configuration changes, use the supplied static configuration.
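The startup decision described in this section can be sketched as follows, under simplifying assumptions. Broker-level dynamic configs read from the snapshot are modeled by a hypothetical `DynamicConfigEntry` (broker id, or `""` for the cluster-wide default; key; and a value that is `null` when the override was deleted), and validation and snapshot reading are passed in as functions. `StartupConfigResolver` and `resolve` are illustrative names, not Kafka code. The snapshot is read only when the static configuration is invalid, a missing or empty snapshot falls back to the static configuration, and overrides are layered per-broker dynamic over cluster-wide dynamic default over static, which we believe matches Kafka's documented precedence for dynamic broker configs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

public final class StartupConfigResolver {
    // Hypothetical, simplified view of a broker-level dynamic config entry read from a snapshot.
    // resourceName is the broker id, or "" for the cluster-wide default; a null value marks a deleted override.
    public record DynamicConfigEntry(String resourceName, String key, String value) {}

    private StartupConfigResolver() {}

    /** Returns the configuration to construct the SocketServer from. */
    public static Map<String, String> resolve(
            Map<String, String> staticConfig,
            int brokerId,
            Predicate<Map<String, String>> isValid,
            Supplier<Optional<List<DynamicConfigEntry>>> readLatestSnapshot) {
        if (isValid.test(staticConfig)) {
            return staticConfig; // Common case: the snapshot is never read.
        }
        Optional<List<DynamicConfigEntry>> entries = readLatestSnapshot.get();
        if (entries.isEmpty() || entries.get().isEmpty()) {
            // No snapshot, or no dynamic changes in it: return the static config unchanged,
            // so the caller fails on it just as it does today.
            return staticConfig;
        }
        Map<String, String> clusterDefaults = new HashMap<>();
        Map<String, String> perBroker = new HashMap<>();
        String id = Integer.toString(brokerId);
        for (DynamicConfigEntry e : entries.get()) {
            Map<String, String> target;
            if (e.resourceName().isEmpty()) {
                target = clusterDefaults;
            } else if (e.resourceName().equals(id)) {
                target = perBroker;
            } else {
                continue; // Override for a different broker.
            }
            if (e.value() == null) {
                target.remove(e.key());
            } else {
                target.put(e.key(), e.value());
            }
        }
        // Precedence: per-broker dynamic > cluster-wide dynamic default > static.
        Map<String, String> effective = new HashMap<>(staticConfig);
        effective.putAll(clusterDefaults);
        effective.putAll(perBroker);
        return effective;
    }
}
```

Passing the snapshot read as a `Supplier` keeps the extra disk read off the common path, in line with the goal of only reading the snapshot when the static configuration would otherwise crash Kafka.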
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
- If we are changing behavior how will we phase out the older behavior?
- If we need special migration tools, describe them here.
- When will we remove the existing behavior?
Test Plan
Describe in a few sentences how the KIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
- Use the initial metadata state retrieved from the quorum: this approach is more straightforward and less hacky, but may run into issues. For BrokerServer, if its SocketServer configuration is invalid in a way that prevents it from connecting to the controller quorum, this approach cannot work.
- Write SocketServer dynamic configuration changes to a file. Load this file and apply the changes before constructing the SocketServer.
- Issues with this approach:
- It has to handle cases where the file is not readable or writable. If writing the file fails, should the dynamic configuration update still go through, even though it could be lost if the broker crashes?