downstream loads upstream output buffers file and filters out what it has already processed; files are named deterministically and indexed
same, but offset in the file is calculated based on buffer size
downstream communicates its offset to upstream during snapshotting
downstream communicates its offset to upstream during recovery

Existing persistence mechanisms vs custom

Depends on:

ad-hoc or ~~cont. spilling~~ (cont. spilling can require special handling because of higher volumes)
~~single-task channel processing~~ (isn't compatible with current implementation)
re-scaling support (now or in future)

Resolved questions

Retaining subtask indices between restarts

...

Tentatively chosen not to read upstream data in downstream because of efficiency (ready many small files) and easier implementation (2020-02-19).

Existing persistence mechanisms vs custom

Depends on:

ad-hoc or ~~cont. spilling~~ (cont. spilling can require special handling because of higher volumes)
~~single-task channel processing~~ (isn't compatible with current implementation)
re-scaling support (now or in future)

(assuming non-continuous spilling)

Conceptually, both channel state and operator state are part of a checkpoint.
So they must have the same: a) distribution, b) durability guarantees (and maybe other properties).

So it looks reasonable to reuse the existing mechanism of writing checkpointing data to a remote FS.
(CheckpointStreamWithResultProvider, CheckpointStreamFactory, etc.)

The returned handles need to be included in the task snapshot.

The disadvantage is that, if we don't rely on redistribution mechanisms for some reason (e.g. efficiency); it could be harder to implement the lookup. But this scenario seems to be unlikely.

(2020-02-20)

Compatibility, Deprecation, and Migration Plan

...

Page tree

Versions Compared

Old Version 26

New Version 27

Key

Existing persistence mechanisms vs custom

Resolved questions

Retaining subtask indices between restarts

Existing persistence mechanisms vs custom

Compatibility, Deprecation, and Migration Plan

Page tree

Page History

Versions Compared

Old Version 26

New Version 27

Key

Existing persistence mechanisms vs custom

Resolved questions

Retaining subtask indices between restarts

Existing persistence mechanisms vs custom

Compatibility, Deprecation, and Migration Plan