KIP-258: Allow to Store Record Timestamps in RocksDB

...

  • null: no upgrade needed, run with latest formats
  • "in_place": prepare yourself for an upgrade of the rebalance user-metadata format and an in-place upgrade of RocksDB on-disk format
  • "roll_over": prepare yourself for an upgrade of the rebalance user-metadata format and a roll-over upgrade of RocksDB on-disk format

We also suggest including a fix for https://issues.apache.org/jira/browse/KAFKA-6054 in this KIP, to allow upgrading from 0.10.0 to 1.2. We want to introduce a second parameter upgrade.from that can take the following values:

  • "0.10.0.x": for upgrading from 0.10.0.x to 1.2
  • "0.10.1.x-1.1.x": for upgrading from 0.10.1.x, ..., 1.1.x to 1.2 (we could also consider one value per version, but all would have the same semantics, so it seems better to have a single value that covers all)
  • we would add new versions if required for later releases

Note that the above proposal only fixes KAFKA-6054 in 1.2. If we want to have fixes for KAFKA-6054 in versions 0.10.1.x, ..., 1.1.x, we would need to back port only one of the two configuration parameters, namely upgrade.from, into those older versions. For this case, upgrade.from only needs to accept two values

...

As we only upgraded the rebalance metadata in 0.10.1.0, no RocksDB upgrade is required; a single upgrade mode is sufficient for this case, and thus parameter upgrade.mode is not required.

Upgrade from | Upgrade to | Set config upgrade.mode to | Set config upgrade.from to | Side remark
0.10.0.x | 0.10.1.x, ..., 1.1.x | N/A | "0.10.0.x" | fixed KAFKA-6054 for releases 0.10.1.x, ..., 1.1.x
0.10.0.x | 1.2 | "in_place" or "roll_over" | "0.10.0.x" | fixed KAFKA-6054 for 1.2 release
0.10.1.x, ..., 1.1.x | 1.2 | "in_place" or "roll_over" | "0.10.1.x-1.1.x" | does not fix KAFKA-6054

Details about the different behavior for all types of upgrades are given below.
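
Before those details, the combinations for an upgrade to 1.2 can be illustrated with a minimal Java sketch. The class and helper names (UpgradeConfigSketch, upgradeFrom, upgradeProps) are hypothetical and not part of Kafka Streams; only the config names upgrade.mode and upgrade.from and their values are taken from this proposal. The sketch does not cover the back-port case (upgrading from 0.10.0.x to 0.10.1.x, ..., 1.1.x), where upgrade.mode is not applicable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class UpgradeConfigSketch {
    // Release lines covered by the single proposed value "0.10.1.x-1.1.x".
    private static final List<String> MID_RELEASES =
        Arrays.asList("0.10.1.", "0.10.2.", "0.11.0.", "1.0.", "1.1.");

    /** Returns the proposed upgrade.from value, or null if no upgrade config is needed. */
    static String upgradeFrom(String runningVersion) {
        if (runningVersion.startsWith("0.10.0.")) {
            return "0.10.0.x";
        }
        for (String prefix : MID_RELEASES) {
            if (runningVersion.startsWith(prefix)) {
                return "0.10.1.x-1.1.x";
            }
        }
        return null; // already on the latest formats
    }

    /** Builds the upgrade-related config entries for an upgrade to 1.2 (hypothetical helper). */
    static Properties upgradeProps(String runningVersion, String mode) {
        Properties props = new Properties();
        String from = upgradeFrom(runningVersion);
        if (from != null) {
            props.setProperty("upgrade.from", from);
            props.setProperty("upgrade.mode", mode); // "in_place" or "roll_over"
        }
        return props;
    }
}
```

For example, upgradeProps("0.10.0.1", "in_place") yields upgrade.from set to "0.10.0.x" and upgrade.mode set to "in_place", matching the second row of the table.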

...

The main change we propose is to change the value format within RocksDB to contain the record timestamp as an 8-byte (long) prefix; i.e., we change the format from <key:value> to <key:timestamp+value>. We need to introduce a new value serde that wraps the original value serde as well as a long serde. One important detail is that the serde only changes for the store, but not for the changelog topic: the underlying changelog topic already stores the timestamp in the record metadata timestamp field. Thus, the new wrapping Serde would only be applied to reads from and writes to RocksDB, but not to the changelog topic, which uses the original Serde. In order to know the format of the store, we propose to encode the version of the format in the RocksDB directory name. We currently organize the state directory as follows:

...
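
The <key:timestamp+value> layout described above can be illustrated with a minimal sketch. This is not the proposed serde implementation: the class and method names are hypothetical, and big-endian byte order for the 8-byte timestamp is an assumption (it is the ByteBuffer default). The raw value bytes stand for the output of the original value serde.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class TimestampedValueSketch {
    /** Prepends the record timestamp as an 8-byte prefix to the serialized value. */
    static byte[] encode(long timestamp, byte[] rawValue) {
        return ByteBuffer.allocate(Long.BYTES + rawValue.length)
            .putLong(timestamp)   // assumed big-endian, the ByteBuffer default
            .put(rawValue)
            .array();
    }

    /** Reads the timestamp back from the first 8 bytes of the stored value. */
    static long timestamp(byte[] stored) {
        return ByteBuffer.wrap(stored).getLong();
    }

    /** Returns the original serialized value, i.e. everything after the prefix. */
    static byte[] rawValue(byte[] stored) {
        return Arrays.copyOfRange(stored, Long.BYTES, stored.length);
    }
}
```

Only the bytes written to the local store would carry the prefix; a record written to the changelog topic would keep rawValue as its value and carry the timestamp in the record metadata.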

  • Change the RocksDB on-disk format and encode the used serialization version per record (this would simplify future upgrades). However, there are major disadvantages:
    • storage amplification for local stores
    • record version could get stored in record headers in changelog topics -> changelog topic might never overwrite record with older format
    • code needs to check all versions all the time in future releases: increases code complexity and runtime overhead
    • it's hard to change the key format
      • for value format, the version number can be a magic prefix byte
      • for key lookup, we would need to know the magic byte in advance for efficient point queries into RocksDB; if multiple versions exist in parallel, this is difficult (either do multiple queries with different version bytes until the entry is found or all versions have been tried, implying the key does not exist – or use range queries, but those are very expensive)
  • Encode the storage format in the directory name not as "store version number" but as "AK release number"
    • might be confusing to users if the store format does not change ("I am running Kafka 1.4, but the store indicates it's running on 1.2").
  • use a simpler upgrade path without any configs or complex rolling bounce scenarios
    • requires application down-time for upgrading to new format
  • only support in-place upgrade path instead of two to simplify the process for users (don't need to pick)
    • might be prohibitive if not enough disk space is available
  • allow users to stay with old format: upgrade would be simpler as it's only one rolling bounce
    • unclear default behavior: should we stay on 1.1 format by default or should we use 1.2 format by default?
      • if 1.1 is default, upgrade is simple, but users who write a new application must turn on 1.2 format explicitly
      • if 1.2 is default, simple upgrade requires a config that tells Streams to stay with 1.1 format
      • conclusion: neither upgrading nor staying on the old format is straightforward, thus, just force the upgrade
    • if no upgrade happens, new features (as listed above) would be useless
  • Only prepare stores for active task (but not standby tasks)
    • this would reduce the disk footprint during upgrade
    • disadvantage: when the switch to the new version happens, there are no hot standbys available for some time
    • we could make it configurable, however, we try to keep the number of configs small; also, it's already complex enough and adding more options would make it worse
    • it's not an issue for roll-over upgrade and not everybody configures standbys in the first place
      • people with standbys are willing to provide more disk, so it seems a fair assumption that they are fine with roll-over upgrade, too
  • Don't fix KAFKA-6054
    • it's a simple fix to include: just add two more accepted values to parameter upgrade.mode
    • it's a fair question how many people will run with Streams 0.10.0 – note though that if people are "stuck" with a 0.10.0 broker, they cannot use 0.10.1 or newer, as it's not backwards compatible to 0.10.0 – thus, there might be more than expected
  • Fix KAFKA-6054 only for 1.2 release
    • it's a relatively simple fix for older releases (the main design work is already covered and writing the code is not too complex because it's only the rebalance metadata version change)
    • upgrade path is also way simpler
    • it's unclear though if we will have bug-fix releases for older versions; thus nobody might ever be able to get this code (if they don't build from corresponding dev-branches themselves)
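
The key-lookup problem from the first rejected alternative (per-record version as a magic prefix byte) can be illustrated with a small sketch. It uses an in-memory map as a stand-in for a store with point lookups and does not use the RocksDB API; all names are hypothetical. It shows that, when the version byte of a key is not known in advance, a lookup has to probe once per known version, and a miss costs one probe for every version.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class VersionedKeyLookupSketch {
    // Stand-in for a key-value store that supports point lookups.
    private final Map<ByteBuffer, byte[]> store = new HashMap<>();
    // Counts point lookups to make the overhead visible.
    int probes = 0;

    /** Builds the stored key: one magic version byte followed by the key bytes. */
    private static ByteBuffer versionedKey(byte version, byte[] key) {
        ByteBuffer buf = ByteBuffer.allocate(1 + key.length);
        buf.put(version).put(key);
        buf.flip();
        return buf;
    }

    void put(byte version, byte[] key, byte[] value) {
        store.put(versionedKey(version, key), value);
    }

    /** Tries each known version in turn; returns null if no version matches. */
    byte[] get(byte[] key, byte[] knownVersions) {
        for (byte version : knownVersions) {
            probes++;
            byte[] value = store.get(versionedKey(version, key));
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

With two known versions, a lookup for a missing key always costs two probes, which is the runtime overhead the alternative would introduce for every future format version.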

...