Status
Current state: Approved
Discussion thread: https://lists.apache.org/thread/mb98kw1qjq2hb0ksj14d3thz3g50x9ck
Vote thread: https://lists.apache.org/thread/2vlxrnq0ojytw590fhs98f3g2hrwybbl
JIRA: KAFKA-19648
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
KIP-853: KRaft Controller Membership Changes added support for bootstrapping the KRaft state, while KAFKA-13830 added support for bootstrapping the metadata state. In other words, the zero checkpoint (00000000000000000000-0000000000.checkpoint) contains the starting state for KRaft while bootstrap.checkpoint contains the starting state for the cluster metadata.
This KIP unifies these two checkpoints by moving the starting metadata from bootstrap.checkpoint to the zero checkpoint. The main advantage of using the zero checkpoint is that it integrates with the rest of the checkpoint mechanisms, such as checkpoint loading (RaftClient.Listener#handleLoadSnapshot) and the checkpoint deletion introduced in KIP-630: Kafka Raft Snapshot. For example, not deleting the bootstrap.checkpoint has caused issues with the Kafka startup logic, as documented in KAFKA-19191.
Currently, these two checkpoints that handle bootstrapping metadata can be viewed as "logically separate," and this KIP seeks to unify them under the zero checkpoint in KRaft.
- Pre-KIP-1170, the zero checkpoint only contains KRaft control records that follow the semantics of KIP-630: Kafka Raft Snapshot (the control records SnapshotHeaderRecord and SnapshotFooterRecord are not a concept in the bootstrap.checkpoint). Other KRaft control records include the kraft.version level and the starting voter set.
- Pre-KIP-1170, the bootstrap checkpoint can contain feature level mappings (e.g. metadata.version=24, transaction.version=1) and SCRAM credentials, and this mapping on disk is read by the QuorumController when it becomes leader. The active QuorumController attempts to write the bootstrap metadata records in a transaction if transactions are supported, or otherwise in a single atomic batch. From the perspective of KRaft, the contents of the bootstrap.checkpoint are "data" records, not control records.
Public Interfaces
bootstrap.checkpoint
Today, the bootstrap.checkpoint file is created by the kafka-storage tool. With this KIP, the kafka-storage tool will no longer create this file and will instead write the metadata records to the zero checkpoint in the cluster metadata partition.
The Kafka node will delete the bootstrap checkpoint if it is not needed. The checkpoint is not needed if the bootstrapping metadata has been committed to the cluster metadata partition. One implementation is for the node to delete the bootstrap checkpoint if it has loaded a non-empty checkpoint from the cluster metadata partition.
Pre-KIP-1170, both brokers and controllers would create this file during formatting. Post-KIP-1170, this file will no longer be written during formatting.
00000000000000000000-0000000000.checkpoint
This checkpoint file is created by the kafka-storage tool in the __cluster_metadata-0 directory. With this KIP, it will also contain the data that used to be stored in the bootstrap.checkpoint file. Note that the bootstrap records must total less than 8MB if transactions are not supported, since that is the maximum batch size in bytes supported by KRaft.
Pre-KIP-1170, this file was created by KIP-853-enabled controllers that were formatted with either --standalone or --initial-controllers. Post-KIP-1170, this file is created during formatting by controllers. Brokers will not write this file because they cannot write its contents to the metadata log.
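The file name and the size limit above can be illustrated with a small sketch. It assumes that a checkpoint file name is a 20-digit zero-padded base offset, a dash, and a 10-digit zero-padded epoch, which matches the zero checkpoint name shown in this KIP, and it interprets 8MB as 8 * 1024 * 1024 bytes. The class and method names are hypothetical and are not part of Kafka; batch header overhead is ignored.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Kafka's API. */
final class ZeroCheckpointSketch {
    // Assumption: "8MB" is interpreted as 8 * 1024 * 1024 bytes.
    static final long MAX_BATCH_SIZE_BYTES = 8L * 1024 * 1024;

    /** Formats a name as <20-digit base offset>-<10-digit epoch>.checkpoint. */
    static String checkpointFileName(long baseOffset, int epoch) {
        return String.format(Locale.ROOT, "%020d-%010d.checkpoint", baseOffset, epoch);
    }

    /**
     * Returns true when the summed record sizes stay below the batch limit.
     * Batch and record framing overhead is deliberately ignored in this sketch.
     */
    static boolean fitsInSingleBatch(List<Integer> recordSizesInBytes) {
        long total = 0;
        for (int size : recordSizesInBytes) {
            total += size;
        }
        return total < MAX_BATCH_SIZE_BYTES;
    }
}
```

Calling checkpointFileName(0L, 0) yields the zero checkpoint name used throughout this KIP. A formatting tool could run a check like fitsInSingleBatch before writing bootstrap records when transactions are not supported.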
RPC
ApiVersionsResponse
There are two fields in the ApiVersionsResponse that are used to describe the finalized feature versions of the cluster. There is an existing issue where ApiVersionsResponse reports finalized feature versions even if the state of the cluster metadata partition is not known. Those two fields are documented as follows:
FinalizedFeaturesEpoch - The monotonically increasing epoch for the finalized features information. Valid values are >= 0. A value of -1 is special and represents unknown epoch.
FinalizedFeatures - List of cluster-wide finalized features. The information is valid only if FinalizedFeaturesEpoch >= 0.
The semantics of FinalizedFeatures will change slightly: the response will not include any finalized feature versions if FinalizedFeaturesEpoch is -1.
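A minimal sketch of the intended rule follows. It does not use Kafka's generated ApiVersionsResponseData classes; the class, the method, and the simplified map of feature name to level are hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.Map;

/** Hypothetical illustration of the FinalizedFeatures rule; not Kafka code. */
final class FinalizedFeaturesSketch {
    // -1 is the special value that represents an unknown epoch.
    static final long UNKNOWN_EPOCH = -1L;

    /**
     * Returns the features to place in the response. When the epoch is
     * unknown (negative), nothing is reported, regardless of what the
     * node currently believes the feature levels to be.
     */
    static Map<String, Short> featuresToReport(long finalizedFeaturesEpoch,
                                               Map<String, Short> knownFeatures) {
        if (finalizedFeaturesEpoch < 0) {
            return Map.of();
        }
        return Map.copyOf(knownFeatures);
    }
}
```

With this rule, a client that receives an epoch of -1 never sees feature levels that might not reflect the committed state of the cluster metadata partition.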
Proposed Changes
Controller
When the controller handles RaftClient.Listener#handleLoadSnapshot and the checkpoint id has an epoch of 0 and a base offset of 0, the controller will treat the checkpoint's records as the bootstrapping records. The controller will rewrite the bootstrap records to the log if they haven't been successfully written in the past. If the controller doesn't load a checkpoint at epoch 0 and offset 0, the controller will load the bootstrap.checkpoint and rewrite the bootstrapping records to the cluster metadata partition if they haven't been successfully written in the past.
Node
Both the Kafka broker and controller nodes will delete the bootstrap.checkpoint file from the metadata log dir if the node has been asked to load a snapshot that contains at least one metadata record.
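The deletion rule could look roughly like the sketch below. It assumes that bootstrap.checkpoint sits directly in the metadata log dir and that the caller already knows how many metadata records the loaded snapshot contained. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical cleanup helper for illustration; not Kafka code. */
final class BootstrapCheckpointCleanupSketch {
    static final String BOOTSTRAP_CHECKPOINT = "bootstrap.checkpoint";

    /**
     * Deletes bootstrap.checkpoint only when the loaded snapshot contained
     * at least one metadata record. Returns true if a file was removed.
     */
    static boolean deleteIfSuperseded(Path metadataLogDir, long metadataRecordCount) {
        if (metadataRecordCount < 1) {
            // An empty snapshot does not prove the bootstrap metadata was committed.
            return false;
        }
        try {
            return Files.deleteIfExists(metadataLogDir.resolve(BOOTSTRAP_CHECKPOINT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Using Files.deleteIfExists keeps the operation idempotent, so a node that restarts after the file has already been removed does not fail.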
Kafka Storage Format Command
The kafka-storage format command is responsible for creating the zero checkpoint (00000000000000000000-0000000000.checkpoint). The metadata that is currently written to the bootstrap.checkpoint will instead be written to the zero checkpoint.
Formatting will fail if the metadata log directory already contains the bootstrap.checkpoint or the zero checkpoint. Formatting will be skipped if the metadata log dir already contains the bootstrap.checkpoint or the zero checkpoint and the --ignore-formatted flag is provided.
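The fail-or-skip rule above reduces to a small decision function. The following is a sketch only; the enum and method names are hypothetical, and the real format command performs other checks that are not modeled here.

```java
/** Hypothetical decision helper for illustration; not Kafka code. */
final class FormatDecisionSketch {
    enum Outcome { FORMAT, SKIP, FAIL }

    /**
     * Decides what the format command does for one metadata log dir,
     * given which checkpoint files are already present and whether
     * --ignore-formatted was passed.
     */
    static Outcome decide(boolean bootstrapCheckpointExists,
                          boolean zeroCheckpointExists,
                          boolean ignoreFormatted) {
        boolean alreadyFormatted = bootstrapCheckpointExists || zeroCheckpointExists;
        if (!alreadyFormatted) {
            return Outcome.FORMAT;
        }
        return ignoreFormatted ? Outcome.SKIP : Outcome.FAIL;
    }
}
```

Either checkpoint file counts as evidence of a previous format, so a directory formatted by a pre-KIP-1170 tool is treated the same as one formatted by a post-KIP-1170 tool.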
Compatibility, Deprecation, and Migration Plan
To be compatible with previous bootstrapping of Kafka, at controller activation the controller can be in the following states:
- bootstrap.checkpoint exists with metadata records and the zero checkpoint doesn't exist - In this case the controller will behave as it does today. The controller will be able to identify this case because RaftClient.Listener#handleLoadSnapshot won't ask to load the zero checkpoint.
- bootstrap.checkpoint exists with metadata records and the zero checkpoint exists but doesn't contain any metadata records - In this case the controller will behave as it does today. The controller will be able to identify this case because RaftClient.Listener#handleLoadSnapshot will ask to load the zero checkpoint but it will be empty, with no metadata records.
- bootstrap.checkpoint doesn't exist and the zero checkpoint exists with metadata records - In this case the controller will use the zero checkpoint's metadata records and write them to the log in a transaction or a single atomic batch, as the controller does pre-KIP-1170.
- bootstrap.checkpoint exists and the zero checkpoint exists with metadata records - This should not be possible from a formatting point of view, but the active controller will handle this case the same as bullet 3, with the addition of writing a WARN message to the controller log.
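The four states above can be summarized as a selection function. This is a sketch under assumptions: the names are hypothetical, the inputs are simplified to two flags and a record count, and the NONE result covers combinations the list does not describe (for example, neither file being present).

```java
/** Hypothetical selection helper for illustration; not Kafka code. */
final class BootstrapSourceSketch {
    enum Source { BOOTSTRAP_CHECKPOINT, ZERO_CHECKPOINT, ZERO_CHECKPOINT_WITH_WARN, NONE }

    /**
     * Chooses where the active controller takes its bootstrap records from.
     * zeroCheckpointLoaded means handleLoadSnapshot was invoked for the
     * checkpoint at offset 0, epoch 0; zeroCheckpointRecords is the number
     * of metadata (non-control) records that checkpoint contained.
     */
    static Source choose(boolean bootstrapCheckpointExists,
                         boolean zeroCheckpointLoaded,
                         int zeroCheckpointRecords) {
        boolean zeroHasRecords = zeroCheckpointLoaded && zeroCheckpointRecords > 0;
        if (zeroHasRecords) {
            // Bullets 3 and 4: the zero checkpoint wins; warn if both are present.
            return bootstrapCheckpointExists
                ? Source.ZERO_CHECKPOINT_WITH_WARN
                : Source.ZERO_CHECKPOINT;
        }
        // Bullets 1 and 2: fall back to the pre-KIP-1170 behavior.
        return bootstrapCheckpointExists ? Source.BOOTSTRAP_CHECKPOINT : Source.NONE;
    }
}
```

Checking the zero checkpoint first keeps the compatibility path (bullets 1 and 2) as a plain fallback, so the pre-KIP-1170 code path is only taken when the zero checkpoint carries no metadata records.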
Test Plan
This feature will be tested using Java JUnit tests and system tests.
Rejected Alternatives
Not applicable.