...
Since FLIP-34, there are two different types of job stop:
Type | Source OPS | Task Status | Job Status |
---|---|---|---|
SUSPEND | Checkpoint Barrier, End Of Stream | Finished | Finished |
TERMINATE | MAX_WATERMARK, Checkpoint Barrier, End Of Stream | Finished | Finished |
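
As a minimal sketch, the "Source OPS" column of the table can be written as a lookup from stop type to the operations a source performs. The names `StopType`, `SOURCE_OPS` and `source_ops` are hypothetical and are not Flink identifiers; the order of the operations simply follows the order in which the table lists them.

```python
from enum import Enum


class StopType(Enum):
    """The two stop types from the table (hypothetical Python mirror)."""
    SUSPEND = "SUSPEND"
    TERMINATE = "TERMINATE"


# Source operations per stop type, copied from the "Source OPS" column.
SOURCE_OPS = {
    StopType.SUSPEND: ["CHECKPOINT_BARRIER", "END_OF_STREAM"],
    StopType.TERMINATE: ["MAX_WATERMARK", "CHECKPOINT_BARRIER", "END_OF_STREAM"],
}


def source_ops(stop_type):
    """Return a copy of the operations a source performs for this stop type."""
    return list(SOURCE_OPS[stop_type])
```

The only difference between the two types is the extra MAX_WATERMARK in the TERMINATE case; both end with the task and the job in the Finished state.
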
To support performing a checkpoint when the job is stopped (when retained checkpoints are configured), the following steps need to be implemented:
- The Job Manager triggers a synchronous checkpoint at the sources, which also indicates whether the stop is a TERMINATE or a SUSPEND
- Sources send a MAX_WATERMARK in case of TERMINATE; nothing is done in case of SUSPEND
- The Task Manager executes the checkpoint in a SYNCHRONOUS way, i.e. it blocks until the state has been persisted successfully and notifyCheckpointComplete() has been executed.
- The Task Manager acknowledges the successful persistence of the state for the checkpoint
- The Job Manager sends the notification that the checkpoint is completed
- The Task Manager unblocks the synchronous checkpoint execution.
- The job finishes, starting from the sources, i.e. they shut down and the EOS message propagates through the job.
- The Job Manager waits until the job state goes to FINISHED before declaring the operation successful.
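
The ordering of the steps above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Flink code: `Task`, `sync_checkpoint` and `stop_with_checkpoint` are hypothetical names, a single thread stands in for the Task Manager, and `threading.Event` stands in for the RPC messages between Job Manager and Task Manager.

```python
import threading


class Task:
    """Hypothetical stand-in for a task on a Task Manager (not a Flink class)."""

    def __init__(self, log):
        self.log = log
        self.acked = threading.Event()      # state persisted and acknowledged
        self.completed = threading.Event()  # notify_checkpoint_complete() ran
        self.state = "RUNNING"

    def sync_checkpoint(self, stop_type):
        if stop_type == "TERMINATE":
            self.log.append("source: MAX_WATERMARK")
        self.log.append("task: state persisted")
        self.log.append("task: ack")
        self.acked.set()
        # Synchronous part: block until the completion notification arrives.
        if not self.completed.wait(timeout=5):
            raise TimeoutError("checkpoint completion never arrived")
        self.log.append("task: unblocked")
        # Only now do the sources shut down and EOS propagate.
        self.log.append("source: END_OF_STREAM")
        self.state = "FINISHED"

    def notify_checkpoint_complete(self):
        self.log.append("task: notifyCheckpointComplete()")
        self.completed.set()


def stop_with_checkpoint(stop_type):
    """Play the Job Manager role and return the ordered event log."""
    log = []
    task = Task(log)
    log.append(f"jm: trigger synchronous checkpoint ({stop_type})")
    worker = threading.Thread(target=task.sync_checkpoint, args=(stop_type,))
    worker.start()
    if not task.acked.wait(timeout=5):
        raise TimeoutError("task never acknowledged the checkpoint")
    log.append("jm: checkpoint completed")
    task.notify_checkpoint_complete()
    # The stop is only declared successful once the job is FINISHED.
    worker.join(timeout=5)
    if task.state != "FINISHED":
        raise RuntimeError("job did not reach FINISHED")
    log.append("jm: job FINISHED, stop succeeded")
    return log
```

Calling `stop_with_checkpoint("TERMINATE")` returns a log in which the MAX_WATERMARK precedes the persisted state, and the END_OF_STREAM appears only after `notifyCheckpointComplete()` has run. That ordering is what the synchronous checkpoint is meant to guarantee: no shutdown happens before the checkpoint is confirmed.
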
...