...
The first version of the scheduler will not support local recovery. Adding it should be possible, and we intend to do so as a follow-up. One idea is to extend the existing state machine with a new state, "Restarting locally":
PlantUML |
---|
@startuml
hide empty description
[*] -> Created
Created --> Waiting : Start scheduling
state "Waiting for resources" as Waiting
state "Restarting globally" as RestartingG
state "Restarting locally" as RestartingL
Waiting --> Waiting : Resources are not stable yet
Waiting --> Executing : Resources are stable
Waiting --> Finished : Cancel or suspend
Executing --> Canceling : Cancel
Executing --> Failing : Unrecoverable fault
Executing --> Finished : Suspend or job reached terminal state
Executing --> RestartingG : Recoverable fault
RestartingG --> Finished : Suspend
RestartingG --> Canceling : Cancel
RestartingG --> Waiting : Cancelation complete
Canceling --> Finished : Cancelation complete
Failing --> Finished : Failing complete
Finished -> [*]
@enduml |
No support for local failovers
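To make the transitions in the diagram easier to check, they can be written down as a transition table. The following is only a minimal sketch, not the scheduler's actual implementation: the class name `SchedulerStateSketch`, the enum constants, and the `canTransition` helper are hypothetical names chosen for illustration. The "Restarting locally" state is declared but has no transitions, mirroring the diagram, since its behavior is not specified yet.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of the state machine from the diagram (all names are hypothetical). */
public class SchedulerStateSketch {

    /** One constant per state in the diagram. */
    public enum State {
        CREATED,
        WAITING_FOR_RESOURCES,
        EXECUTING,
        RESTARTING_GLOBALLY,
        RESTARTING_LOCALLY, // declared in the diagram, transitions not specified yet
        CANCELING,
        FAILING,
        FINISHED
    }

    /** Allowed target states for each source state, copied from the diagram's arrows. */
    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        TRANSITIONS.put(State.CREATED, EnumSet.of(State.WAITING_FOR_RESOURCES));
        // Waiting loops on itself while resources are not stable yet.
        TRANSITIONS.put(
                State.WAITING_FOR_RESOURCES,
                EnumSet.of(State.WAITING_FOR_RESOURCES, State.EXECUTING, State.FINISHED));
        TRANSITIONS.put(
                State.EXECUTING,
                EnumSet.of(
                        State.CANCELING,
                        State.FAILING,
                        State.FINISHED,
                        State.RESTARTING_GLOBALLY));
        TRANSITIONS.put(
                State.RESTARTING_GLOBALLY,
                EnumSet.of(State.FINISHED, State.CANCELING, State.WAITING_FOR_RESOURCES));
        TRANSITIONS.put(State.CANCELING, EnumSet.of(State.FINISHED));
        TRANSITIONS.put(State.FAILING, EnumSet.of(State.FINISHED));
        TRANSITIONS.put(State.FINISHED, EnumSet.noneOf(State.class));
        // No arrows lead into or out of "Restarting locally" in the diagram.
        TRANSITIONS.put(State.RESTARTING_LOCALLY, EnumSet.noneOf(State.class));
    }

    /** Returns true if the diagram contains an arrow from {@code from} to {@code to}. */
    public static boolean canTransition(State from, State to) {
        return TRANSITIONS.get(from).contains(to);
    }

    public static void main(String[] args) {
        // A recoverable fault currently leads to a global restart.
        System.out.println(canTransition(State.EXECUTING, State.RESTARTING_GLOBALLY)); // true
        // Local restarts are not reachable in the first version.
        System.out.println(canTransition(State.EXECUTING, State.RESTARTING_LOCALLY)); // false
    }
}
```

In this form, adding local recovery as a follow-up would amount to adding entries for `RESTARTING_LOCALLY` to the table once its transitions have been decided.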
...