Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Authors: Henry Cai, Thomas Thornton

Status

Current state: Under Discussion Withdrawn

Discussion thread: here 

JIRA: KAFKA-19225 

...

A critical scenario to consider is the failure of an entire Availability Zone, which would take down both the leader broker and its co-located S3E1Z bucket. In this rare event, the cluster remains available by electing a new leader in a different AZ. Once a new leader (e.g., in AZ2) is elected, replica reconciliation takes place. This is similar to standard Kafka. If followers have more events than the high watermark, they will need to truncate them. Below is an example to illustrate this:

Suppose there is a remote WAL enabled topic with replication factor 3. The current leader of a partition is B1. B1 has received data up to offset 10, written data to S3E1Z up to offset 10, and published metadata up to offset 10. One follower, B2, has replicated up to offset 5, and follower B3, up to offset 7. Suppose B1 dies (e.g., AZ goes down), and B2 is elected as the new leader. B2 overwrites the LEO/high-watermark based on its own LEO, which is 5. B2 will tell its followers to reset their LEO to 5. So B3, which had an LEO of 7, will discard 2 messages and reset 5 to match B2, the new leader. In addition, B2 will also publish metadata DELETE events for any metadata entries it receives that are after its LEO (e.g., for messages 6-10), since they are no longer valid.

In the case that B3 was elected, with the higher end offset of offset 7. It will perform replica reconciliation and discard its data that is greater than the high watermark of 5. This prevents any sacrifices in availability, and is equivalent to standard Kafka.

Data Durability

In acks=all flow, we maintain the same data durability as today by requiring the data gets to the follower before we can acknowledge back to the producer;

...