Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. downstream loads upstream output buffers file and filters out what it has already processed; files are named deterministically and indexed
  2. same, but offset in the file is calculated based on buffer size
  3. downstream communicates its offset to upstream during snapshotting
  4. downstream communicates its offset to upstream during recovery

Existing persistence mechanisms vs custom

Depends on:

  1. ad-hoc or cont. spilling (cont. spilling can require special handling because of higher volumes)
  2. single-task channel processing (isn't compatible with current implementation)
  3. re-scaling support (now or in future)

Resolved questions

Retaining subtask indices between restarts

...

Tentatively chosen not to read upstream data in downstream because of efficiency (ready many small files) and easier implementation (2020-02-19).

Existing persistence mechanisms vs custom

Depends on:

  1. ad-hoc or cont. spilling (cont. spilling can require special handling because of higher volumes)
  2. single-task channel processing (isn't compatible with current implementation)
  3. re-scaling support (now or in future)

(assuming non-continuous spilling)

Conceptually, both channel state and operator state are part of a checkpoint.
So they must have the same: a) distribution, b) durability guarantees (and maybe other properties).

So it looks reasonable to reuse the existing mechanism of writing checkpointing data to a remote FS.
(CheckpointStreamWithResultProvider, CheckpointStreamFactory, etc.)

The returned handles need to be included in the task snapshot.

The disadvantage is that, if we don't rely on redistribution mechanisms for some reason (e.g. efficiency); it could be harder to implement the lookup. But this scenario seems to be unlikely.

(2020-02-20)

Compatibility, Deprecation, and Migration Plan

...