...

The change is backward compatible after KIP-31 and KIP-32 are checked in.

For log retention during migration, the broker will keep an in-memory maxTimestampSoFar variable, which is initialized to -1 and is only updated when a message with a larger timestamp is appended to the log segment.

If maxTimestampSoFar is -1, log retention will still be based on last_modification_time, and log rolling will still be based on the log creation time.
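
The fallback described above can be sketched as follows. This is a minimal illustration, not the broker's actual code: the class `TrackedSegment`, the helper `is_expired`, and the parameter names are hypothetical. It also assumes that retention switches to maxTimestampSoFar once that value is no longer -1, which the text implies but does not state outright.

```python
from dataclasses import dataclass

NO_TIMESTAMP = -1  # initial value of maxTimestampSoFar, as stated in the text


@dataclass
class TrackedSegment:
    """Hypothetical stand-in for a log segment; not Kafka's real class."""
    last_modification_time_ms: int
    max_timestamp_so_far: int = NO_TIMESTAMP

    def append(self, message_timestamp_ms: int) -> None:
        # Only a strictly larger timestamp updates the in-memory maximum.
        if message_timestamp_ms > self.max_timestamp_so_far:
            self.max_timestamp_so_far = message_timestamp_ms

    def retention_reference_ms(self) -> int:
        # While no timestamp has been recorded, fall back to the
        # segment's last modification time.
        if self.max_timestamp_so_far == NO_TIMESTAMP:
            return self.last_modification_time_ms
        # Assumption: once set, maxTimestampSoFar drives retention.
        return self.max_timestamp_so_far


def is_expired(segment: TrackedSegment, now_ms: int, retention_ms: int) -> bool:
    """True when the segment is older than the retention window."""
    return now_ms - segment.retention_reference_ms() > retention_ms
```

A segment that has only received messages without timestamps keeps returning its last modification time, so its retention behavior is unchanged during migration.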

The broker will do the following during migration for log retention and searching by timestamp:

  1. The broker will create a time index for each segment if the segment does not have one.
    1. For the inactive log segments, the broker will append an entry (last_modification_time_of_the_segment -> offset_of_the_first_message_in_the_segment) to each empty time index.
    2. For the active log segments, the time index file will be left empty.
  2. After the entire cluster is migrated to use the time-based log index for log retention, the broker will enforce log retention using the time index. Given what we do in step 1, the behavior is:
    1. For segments having only messages whose versions are before 0.10.0, the entry with the last modification time in the time index will be used for retention.
    2. For segments having at least one message whose version is after 0.10.0, the max timestamp of the messages will be used for log retention.
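
The two migration steps above can be sketched roughly as follows. All names here (`IndexedSegment`, `migrate`, `retention_timestamp_ms`) are hypothetical and do not reflect the broker's real classes. The sketch assumes the reading in which an inactive segment with an empty time index gets a single seeded entry and the active segment's index is left empty. The fallback for an empty index is also an assumption, added only to keep the example total.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IndexedSegment:
    """Hypothetical segment model; names are illustrative only."""
    first_offset: int
    last_modification_time_ms: int
    active: bool = False
    # Time index entries are (timestamp_ms, offset) pairs.
    time_index: List[Tuple[int, int]] = field(default_factory=list)


def migrate(segment: IndexedSegment) -> None:
    # Step 1: seed the empty time index of an inactive segment with
    # (last_modification_time -> offset of the first message).
    # The active segment's index is left empty.
    if not segment.active and not segment.time_index:
        segment.time_index.append(
            (segment.last_modification_time_ms, segment.first_offset)
        )


def retention_timestamp_ms(segment: IndexedSegment) -> int:
    # Step 2: retention uses the largest timestamp in the time index.
    # For a seeded index that is the last modification time; for an
    # index built from timestamped messages it is the max timestamp.
    if not segment.time_index:
        # Assumption: with no entries, use the last modification time.
        return segment.last_modification_time_ms
    return max(ts for ts, _ in segment.time_index)
```

Because the seeded entry carries the last modification time, segments holding only pre-0.10.0 messages are retained exactly as before, while segments with timestamped messages are governed by their largest message timestamp.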

...

  1. On startup, the broker will initially use the segment's last modification time as the largest message timestamp.
  2. If a new message's version is after 0.10.0 and its timestamp is greater than the current largest message timestamp, the broker updates the current largest message timestamp.
  3. The broker always uses the difference between the current time and the largest message timestamp of the segment to decide whether to roll out a new log segment.
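
The rolling rule in the three steps above can be sketched as below. `RollTracker`, its method names, and the `roll_ms` threshold parameter are hypothetical. The flag `has_timestamp` stands in for "message version is after 0.10.0", which is an assumption about how such messages would be distinguished.

```python
class RollTracker:
    """Hypothetical helper tracking the largest timestamp of one segment."""

    def __init__(self, last_modification_time_ms: int) -> None:
        # Step 1: on startup, start from the segment's last modification time.
        self.largest_timestamp_ms = last_modification_time_ms

    def on_append(self, timestamp_ms: int, has_timestamp: bool) -> None:
        # Step 2: only timestamped messages with a larger timestamp
        # update the tracked value.
        if has_timestamp and timestamp_ms > self.largest_timestamp_ms:
            self.largest_timestamp_ms = timestamp_ms

    def should_roll(self, now_ms: int, roll_ms: int) -> bool:
        # Step 3: compare the current time against the largest timestamp.
        return now_ms - self.largest_timestamp_ms > roll_ms
```

For example, a tracker started at 1000 ms that then sees a timestamped message at 4000 ms will not roll at 4500 ms with a 1000 ms threshold, but will roll at 5500 ms.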

...