Background
As we move to a new data quality workflow, the related scheduler must also be updated to meet the workflow's needs, listed below:
- Time-based scheduling: create a job (workflow) that runs at a specified time or at fixed intervals.
- Manage the job lifecycle, with states such as init, start, pause/resume, stopped, killed, failed, success, etc.
- The scheduler should retry some failed jobs to reduce the failure ratio.
- Coordinate the recording, evaluating, and alerting tasks within a job (workflow).
- Query the status of a job (running, failed, finished, etc.) and its execution history.
- Prioritize a job.
- High availability: if one scheduler node goes down, its jobs (including half-done jobs) can fail over to another node.
- Elasticity: the workload is balanced across workers.
- Besides time-based triggers, we also need to support event-based triggers.
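
To make the lifecycle and retry requirements above more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the scheduler's actual design: the names `JobState`, `ALLOWED`, `Job`, and `run_with_retry`, the transition table, and the `max_retries` default are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    # State names follow the list above; the exact set is an assumption.
    INIT = "init"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"
    KILLED = "killed"
    FAILED = "failed"
    SUCCESS = "success"


# Hypothetical transition table: which states may follow which.
ALLOWED = {
    JobState.INIT: {JobState.RUNNING, JobState.KILLED},
    JobState.RUNNING: {
        JobState.PAUSED,
        JobState.STOPPED,
        JobState.KILLED,
        JobState.FAILED,
        JobState.SUCCESS,
    },
    JobState.PAUSED: {JobState.RUNNING, JobState.STOPPED, JobState.KILLED},
    JobState.FAILED: {JobState.RUNNING},  # a retry re-enters RUNNING
    JobState.STOPPED: set(),
    JobState.KILLED: set(),
    JobState.SUCCESS: set(),
}


@dataclass
class Job:
    name: str
    max_retries: int = 2  # assumed default, not taken from the requirements
    state: JobState = JobState.INIT
    attempts: int = 0
    history: list = field(default_factory=list)  # (old_state, new_state) pairs

    def transition(self, new_state):
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        self.history.append((self.state, new_state))
        self.state = new_state


def run_with_retry(job, task):
    """Run task() until it succeeds or the retry budget is exhausted."""
    while True:
        job.transition(JobState.RUNNING)
        job.attempts += 1
        try:
            task()
        except Exception:
            job.transition(JobState.FAILED)
            if job.attempts > job.max_retries:
                return job.state
        else:
            job.transition(JobState.SUCCESS)
            return job.state
```

In this sketch the `history` list doubles as a simple execution history that a status query could return. A real scheduler would persist both the state and the history so that another node can pick up a half-done job after a failover.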