
architecture diagram

...


sequence diagram

...


core module sequence diagram




the communication messages between the core module and the dispatcher

Each RPC below is listed with its description, request message, and response message.
submitSql
    Description: The core module submits the record SQL to the dispatcher.
    Request:
        SubmitRequest {
            String recordSql;
            Enum engine;  // Spark, Hive, Presto, etc.
            String owner;
            Integer maxRetryCount;
        }
    Response:
        SubmitResponse {
            Integer code;
            String jobId;
            Enum errorCode;
            Exception ex;
        }
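A minimal sketch of the submitSql messages as Java records, assuming Java 16+. The engine constants, the ErrorCode constant names, the null/negative checks, and the helper methods `accepted`, `failed`, and `isAccepted` are all hypothetical; the design only fixes the field names and types.

```java
import java.util.Objects;

// Hypothetical constants: the design only says "Spark, Hive, Presto, etc.".
enum Engine { SPARK, HIVE, PRESTO }

// Declaration order matches the ErrorCode values 0, 1, 2 defined in the code tables.
enum ErrorCode { SYNTAX_ERROR, INTERNAL_ERROR, EXTERNAL_ERROR }

record SubmitRequest(String recordSql, Engine engine, String owner, Integer maxRetryCount) {
    SubmitRequest {
        // Assumed checks; the design does not say which fields are mandatory.
        Objects.requireNonNull(recordSql, "recordSql");
        Objects.requireNonNull(engine, "engine");
        if (maxRetryCount != null && maxRetryCount < 0) {
            throw new IllegalArgumentException("maxRetryCount must be >= 0");
        }
    }
}

// jobId is set only when the dispatcher accepts the job (code 200);
// errorCode and ex carry the failure details otherwise.
record SubmitResponse(Integer code, String jobId, ErrorCode errorCode, Exception ex) {
    static SubmitResponse accepted(String jobId) {
        return new SubmitResponse(200, jobId, null, null);
    }

    static SubmitResponse failed(int code, ErrorCode errorCode, Exception ex) {
        return new SubmitResponse(code, null, errorCode, ex);
    }

    boolean isAccepted() {
        return code != null && code == 200;
    }
}

class SubmitSqlDemo {
    public static void main(String[] args) {
        SubmitRequest req = new SubmitRequest("SELECT COUNT(*) FROM t", Engine.SPARK, "alice", 3);
        SubmitResponse resp = SubmitResponse.accepted("job-1");
        System.out.println(req.engine() + " " + resp.isAccepted() + " " + resp.jobId());
    }
}
```

Records keep the messages immutable, which suits values that are only serialized and passed between processes.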
getJobStatus
    Description: The core module gets the status of a job from the dispatcher by jobId.
    Request:
        JobStatusRequest {
            String jobId;
        }
    Response:
        JobStatusResponse {
            Integer code;
            Enum jobStatus;
            Enum errorCode;
            Exception ex;
        }
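The design does not say how often the core module calls getJobStatus. One plausible pattern is polling until the job reaches a terminal state (SUCCESS or FAILED). The sketch below assumes that pattern; the `StatusDispatcher` interface, `JobPoller`, the poll interval, and the poll limit are hypothetical, and errorCode/ex are omitted from the response for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Declaration order matches the JobStatus values 0..3 in the code tables.
enum JobStatus { ACCEPTED, RUNNING, SUCCESS, FAILED }

record JobStatusRequest(String jobId) {}

// errorCode and ex are omitted to keep the sketch short.
record JobStatusResponse(Integer code, JobStatus jobStatus) {}

// Hypothetical client-side view of the dispatcher; only getJobStatus is modeled.
interface StatusDispatcher {
    JobStatusResponse getJobStatus(JobStatusRequest request);
}

final class JobPoller {
    private JobPoller() {}

    // Calls getJobStatus until the job is SUCCESS or FAILED, or maxPolls is used up.
    static JobStatus waitForTerminal(StatusDispatcher dispatcher, String jobId,
                                     long intervalMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            JobStatusResponse resp = dispatcher.getJobStatus(new JobStatusRequest(jobId));
            if (resp.code() != 200) {
                throw new IllegalStateException("getJobStatus returned code " + resp.code());
            }
            JobStatus status = resp.jobStatus();
            if (status == JobStatus.SUCCESS || status == JobStatus.FAILED) {
                return status;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling " + jobId, e);
            }
        }
        throw new IllegalStateException("job " + jobId + " not finished after " + maxPolls + " polls");
    }
}

class PollDemo {
    public static void main(String[] args) {
        // Fake dispatcher that replays a fixed sequence of statuses.
        Deque<JobStatus> script = new ArrayDeque<>(List.of(JobStatus.ACCEPTED, JobStatus.RUNNING, JobStatus.SUCCESS));
        StatusDispatcher fake = req -> new JobStatusResponse(200, script.poll());
        System.out.println(JobPoller.waitForTerminal(fake, "job-1", 10, 5)); // SUCCESS
    }
}
```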
getMetricResult
    Description: The core module gets the result of the record SQL (a metric value) from the dispatcher by jobId.
    Request:
        MetricRequest {
            String jobId;
        }
    Response:
        MetricResponse {
            Integer code;
            Double metric;
            Enum errorCode;
            Exception ex;
        }
validateSQL
    Description: The core module submits the record SQL to the dispatcher to validate its syntax.
    Request:
        ValidateSQLRequest {
            String recordSql;
            Enum engine;  // Spark, Hive, Presto, etc.
        }
    Response:
        ValidateSQLResponse {
            Integer code;
            Enum errorCode;
            Exception ex;
        }
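One way the core module could combine validateSQL and submitSql is to validate first and submit only when validation returns 200, so that a syntax error is reported before any job exists. This ordering is an assumption, not something the design states. `Dispatcher`, `CoreClient`, and `FakeDispatcher` are hypothetical names. The fake's "starts with SELECT" rule is a toy stand-in for a real engine parser, and the code it returns for a syntax error (500) is likewise assumed.

```java
// Hypothetical constants; the design lists "Spark, Hive, Presto, etc.".
enum Engine { SPARK, HIVE, PRESTO }

enum ErrorCode { SYNTAX_ERROR, INTERNAL_ERROR, EXTERNAL_ERROR }

record ValidateSQLRequest(String recordSql, Engine engine) {}
record ValidateSQLResponse(Integer code, ErrorCode errorCode, Exception ex) {}
record SubmitRequest(String recordSql, Engine engine, String owner, Integer maxRetryCount) {}
record SubmitResponse(Integer code, String jobId, ErrorCode errorCode, Exception ex) {}

// Hypothetical client interface for the two RPCs used here.
interface Dispatcher {
    ValidateSQLResponse validateSQL(ValidateSQLRequest request);
    SubmitResponse submitSql(SubmitRequest request);
}

final class CoreClient {
    private final Dispatcher dispatcher;

    CoreClient(Dispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    // Validates first so a syntax error is reported before any job is created.
    // Returns the jobId assigned by the dispatcher.
    String validateAndSubmit(String sql, Engine engine, String owner, int maxRetryCount) {
        ValidateSQLResponse v = dispatcher.validateSQL(new ValidateSQLRequest(sql, engine));
        if (v.code() != 200) {
            throw new IllegalArgumentException("validateSQL failed: " + v.errorCode(), v.ex());
        }
        SubmitResponse s = dispatcher.submitSql(new SubmitRequest(sql, engine, owner, maxRetryCount));
        if (s.code() != 200) {
            throw new IllegalStateException("submitSql failed: " + s.errorCode(), s.ex());
        }
        return s.jobId();
    }
}

// Toy in-memory dispatcher: anything not starting with SELECT counts as a syntax error.
final class FakeDispatcher implements Dispatcher {
    int submits = 0;

    public ValidateSQLResponse validateSQL(ValidateSQLRequest r) {
        boolean ok = r.recordSql().trim().toUpperCase().startsWith("SELECT");
        return ok ? new ValidateSQLResponse(200, null, null)
                  : new ValidateSQLResponse(500, ErrorCode.SYNTAX_ERROR, null);
    }

    public SubmitResponse submitSql(SubmitRequest r) {
        submits++;
        return new SubmitResponse(200, "job-" + submits, null, null);
    }
}

class ValidateDemo {
    public static void main(String[] args) {
        CoreClient client = new CoreClient(new FakeDispatcher());
        System.out.println(client.validateAndSubmit("SELECT 1", Engine.HIVE, "alice", 3)); // job-1
    }
}
```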




Code
    200  The dispatcher accepted the request and processed it successfully.
    400  The dispatcher rejected the request because it is a bad request.
    500  The dispatcher accepted the request but failed to process it.
ErrorCode
    0  recordSql syntax error.
    1  Internal error: the dispatcher itself crashed.
    2  External error: e.g., the target engine crashed when the dispatcher called it.
JobStatus
    0  ACCEPTED
    1  RUNNING
    2  SUCCESS
    3  FAILED
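The code tables above map naturally onto Java constants and enums that carry their wire values. This is a hypothetical sketch: the names `ResponseCode`, `fromValue`, and `isTerminal` are illustrative, and the numeric values are the only part taken from the tables.

```java
import java.util.Arrays;

// Hypothetical Java mirror of the code tables above.
final class ResponseCode {
    static final int OK = 200;             // accepted and processed successfully
    static final int BAD_REQUEST = 400;    // rejected as a bad request
    static final int PROCESS_FAILED = 500; // accepted, but processing failed

    private ResponseCode() {}
}

enum ErrorCode {
    SYNTAX_ERROR(0),    // recordSql syntax error
    INTERNAL_ERROR(1),  // the dispatcher itself crashed
    EXTERNAL_ERROR(2);  // e.g. the target engine crashed

    final int value;

    ErrorCode(int value) {
        this.value = value;
    }

    static ErrorCode fromValue(int v) {
        return Arrays.stream(values())
                .filter(e -> e.value == v)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown ErrorCode " + v));
    }
}

enum JobStatus {
    ACCEPTED(0), RUNNING(1), SUCCESS(2), FAILED(3);

    final int value;

    JobStatus(int value) {
        this.value = value;
    }

    static JobStatus fromValue(int v) {
        return Arrays.stream(values())
                .filter(s -> s.value == v)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown JobStatus " + v));
    }

    // SUCCESS and FAILED end a job; ACCEPTED and RUNNING mean it is still in progress.
    boolean isTerminal() {
        return this == SUCCESS || this == FAILED;
    }
}

class CodesDemo {
    public static void main(String[] args) {
        JobStatus s = JobStatus.fromValue(2);
        System.out.println(s + " terminal=" + s.isTerminal()); // SUCCESS terminal=true
    }
}
```

Storing the numeric value explicitly, rather than relying on `ordinal()`, keeps the wire format stable if constants are later reordered.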

the communication messages between the master node and the worker nodes of the core module


Each RPC below is listed with its description, request fields, response fields, and the meaning of each response value.
registDQWorkerNode
    Description: When a DQ worker node starts, it registers itself with the master so that the master can submit tasks to the node.
    Request: String hostName
    Response: int code
    Response description:
        200:
        ...
        the master returns 200 and then assigns tasks to the node.
        Other: registration failed, and the worker retries periodically until the master returns 200.
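The registration rule (retry periodically until the master returns 200) can be sketched as a simple fixed-interval retry loop. Here the master call is modeled as a hypothetical `ToIntFunction<String>` from hostName to response code. `WorkerRegistration` is an illustrative name, and the `maxAttempts` cap is an added assumption so that the sketch always terminates.

```java
import java.util.function.ToIntFunction;

final class WorkerRegistration {
    private WorkerRegistration() {}

    // Calls register(hostName) until it returns 200 and reports how many attempts it took.
    // The design says the worker retries until the master returns 200; maxAttempts is an
    // added guard (an assumption) so that the sketch always terminates.
    static int registerUntilAccepted(ToIntFunction<String> register, String hostName,
                                     long retryIntervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (register.applyAsInt(hostName) == 200) {
                return attempt;
            }
            try {
                Thread.sleep(retryIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while registering " + hostName, e);
            }
        }
        throw new IllegalStateException(hostName + " not registered after " + maxAttempts + " attempts");
    }
}

class RegistrationDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        // Fake master: rejects the first two attempts, then accepts.
        ToIntFunction<String> fakeMaster = host -> ++calls[0] < 3 ? 503 : 200;
        System.out.println(WorkerRegistration.registerUntilAccepted(fakeMaster, "worker-1", 10, 10)); // 3
    }
}
```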
reportDQWorkNodeStatus
    Description: The worker node periodically reports its own status, including the ID lists of running, waiting, successful, and failed tasks.
    Request:
        List<Integer> runningIdList
        List<Integer> waittingIdList
        List<Integer> successIdList
        List<Integer> failedIdList
    Response: int code
    Response description:
        200: the master returns 200 and then assigns tasks to the node.
submitDQTask
    Description: The master submits tasks to the worker node.
    Request: int instanceId
    Response: int code
    Response description:
        200: the worker accepted the task; the task's status can then be queried, or the task killed, on this node.
        5XX: the node rejected the task because an error occurred in the node server.
        4XX: the node rejected the task because the request was invalid, e.g., no task info was found for the instanceId.
stopDQTask
    Description: The master stops tasks on the worker node.
    Request: int instanceId
    Response: int code
    Response description:
        200: the worker processed the request successfully.
        5XX: the node rejected the request because an error occurred in the node server.
        4XX: the node rejected the request because it was invalid, e.g., no task info was found on the node for the instanceId.
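submitDQTask and stopDQTask share the same response-code classes: 200 for success, 4XX when the request itself is wrong, and 5XX when the node server fails. The sketch below classifies a returned code into those classes. The names `TaskReply` and `TaskReplies`, and the retry-on-another-node policy, are hypothetical; the design does not say how the master reacts to a rejection.

```java
// Hypothetical classification of the int code returned by submitDQTask and stopDQTask.
enum TaskReply { OK, BAD_REQUEST, NODE_ERROR, UNKNOWN }

final class TaskReplies {
    private TaskReplies() {}

    static TaskReply classify(int code) {
        if (code == 200) return TaskReply.OK;
        if (code >= 400 && code < 500) return TaskReply.BAD_REQUEST; // 4XX: the request itself is wrong
        if (code >= 500 && code < 600) return TaskReply.NODE_ERROR;  // 5XX: failure inside the node server
        return TaskReply.UNKNOWN;
    }

    // Assumed master policy: a 5XX failure is local to one node, so the task may be
    // reassigned to another worker; a 4XX points at the request itself, so resending
    // it unchanged is unlikely to help.
    static boolean mayRetryOnAnotherNode(int code) {
        return classify(code) == TaskReply.NODE_ERROR;
    }
}

class TaskReplyDemo {
    public static void main(String[] args) {
        for (int code : new int[] {200, 404, 503}) {
            System.out.println(code + " -> " + TaskReplies.classify(code)
                    + ", retry elsewhere: " + TaskReplies.mayRetryOnAnotherNode(code));
        }
    }
}
```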
querySingleDQTask
    Description: The master queries the status of a DQ task from the worker node.
    Request: int instanceId
    Response:
        int code
        int status
    Response description:
        200: the worker processed the request successfully, and status describes the task's state:
            0: init, 1: waiting, 2: recording, 3: evaluating, 4: alerting, 5: success, 6: failed
        5XX: the node rejected the request because an error occurred in the node server.
        4XX: the node rejected the request because it was invalid, e.g., no task info was found on the node for the instanceId.
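The status values 0 to 6 returned by querySingleDQTask can be decoded with an enum whose declaration order follows the list. `DQTaskStatus`, `QueryTaskResponse`, and `isFinished` are hypothetical names. The check that status is read only when code is 200 follows the response description.

```java
// Hypothetical mapping of the status values returned by querySingleDQTask.
// Declaration order must match the numeric values 0..6 in the description above.
enum DQTaskStatus {
    INIT, WAITING, RECORDING, EVALUATING, ALERTING, SUCCESS, FAILED;

    static DQTaskStatus fromCode(int code) {
        DQTaskStatus[] all = values();
        if (code < 0 || code >= all.length) {
            throw new IllegalArgumentException("unknown task status " + code);
        }
        return all[code];
    }

    boolean isFinished() {
        return this == SUCCESS || this == FAILED;
    }
}

// status is only meaningful when code is 200.
record QueryTaskResponse(int code, int status) {
    DQTaskStatus taskStatus() {
        if (code != 200) {
            throw new IllegalStateException("querySingleDQTask failed with code " + code);
        }
        return DQTaskStatus.fromCode(status);
    }
}

class QueryDemo {
    public static void main(String[] args) {
        System.out.println(new QueryTaskResponse(200, 3).taskStatus()); // EVALUATING
    }
}
```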
nodeHeartBeat
    Description: The heartbeat message confirms that the worker is alive, so that tasks can be assigned to the node.
    Request: long timestamp
    Response: int code
    Response description:
        200: the master accepted the heartbeat and successfully updated the worker info.
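On the master side, nodeHeartBeat can drive a liveness check: remember each worker's latest timestamp and consider only recently seen workers for new tasks. This is a sketch under several assumptions. The request lists only a timestamp, so keying by hostName assumes the master can identify the sender. The timeout value is hypothetical. Comparing the worker-supplied timestamp with the master's clock assumes roughly synchronized clocks. `HeartbeatTracker` is an illustrative name.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical master-side bookkeeping for nodeHeartBeat.
final class HeartbeatTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Records the latest heartbeat timestamp for the worker and replies 200.
    int onHeartbeat(String hostName, long timestamp) {
        lastSeen.merge(hostName, timestamp, Long::max);
        return 200;
    }

    // A worker counts as alive if its last heartbeat is at most timeoutMillis old.
    boolean isAlive(String hostName, long now) {
        Long last = lastSeen.get(hostName);
        return last != null && now - last <= timeoutMillis;
    }

    // Workers that may receive new tasks at time `now`, sorted for stable output.
    List<String> aliveWorkers(long now) {
        return lastSeen.keySet().stream()
                .filter(h -> isAlive(h, now))
                .sorted()
                .collect(Collectors.toList());
    }
}

class HeartbeatDemo {
    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(3_000);
        tracker.onHeartbeat("worker-1", 1_000);
        tracker.onHeartbeat("worker-2", 4_000);
        System.out.println(tracker.aliveWorkers(5_000)); // [worker-2]
    }
}
```

Using `merge` with `Long::max` keeps the newest timestamp even if heartbeats arrive out of order.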