Motivation
A job's state size is often read from its latest checkpoint. Today this takes two REST calls. The first goes to /jobs/:jobid/checkpoints, which returns the job's checkpointing statistics, and is made only to obtain the ID of the latest completed or failed checkpoint. The second goes to /jobs/:jobid/checkpoints/details/:checkpointId to get the details of that checkpoint, including per-task checkpoint data. The first call can be expensive because /jobs/:jobid/checkpoints returns the overall job statistics, including the full checkpoint history. This hurts the performance of the REST server. It also hurts the client, which must deserialize the whole response just to read the latest ID.
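To make the cost concrete, here is a minimal Python sketch of today's two-call client flow. It assumes the checkpoint statistics response exposes `latest.completed.id` and `latest.failed.id`, as Flink's `CheckpointingStatistics` message does. `http_get_json` and `latest_details_two_calls` are hypothetical helper names. The `fetch` function is injectable, so the flow can run without a JobManager.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def latest_details_two_calls(
    base_url: str,
    job_id: str,
    fetch: Callable[[str], dict] = http_get_json,
) -> Optional[dict]:
    # Call 1: the full checkpointing statistics, including the checkpoint
    # history, downloaded and parsed only to read the latest IDs.
    stats = fetch(f"{base_url}/jobs/{job_id}/checkpoints")
    latest = stats.get("latest") or {}
    ids = [c["id"] for c in (latest.get("completed"), latest.get("failed")) if c]
    if not ids:
        return None  # no completed or failed checkpoint yet
    # Call 2: details of the newer of the two checkpoints.
    return fetch(f"{base_url}/jobs/{job_id}/checkpoints/details/{max(ids)}")
```

The client pays for the whole history on the first call, even though it only uses one ID from it.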
Proposed Changes
REST API changes
Add a new endpoint, /jobs/:jobid/checkpoints/details/latest, to the JobManager's REST API. It takes one query parameter, status. The parameter selects the latest completed checkpoint or the latest failed checkpoint. If it is omitted, the endpoint returns whichever of the two is latest.
The two supported values are used as /jobs/:jobid/checkpoints/details/latest?status=COMPLETED and /jobs/:jobid/checkpoints/details/latest?status=FAILED.
IN_PROGRESS will not be allowed. The details of an in-progress checkpoint change as the checkpoint progresses, which would invite caching issues. Asking for "the latest in-progress checkpoint" is also counterintuitive.
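The handler could validate the `status` parameter as in this minimal sketch. It assumes an absent parameter means "completed or failed" and that values are matched case-sensitively. It also assumes that any other value, including `IN_PROGRESS`, is a client error. This proposal does not say which HTTP code that error maps to (400 Bad Request would be a natural choice). `parse_status_param` is a hypothetical name.

```python
from typing import Optional

# Values the proposed `status` query parameter accepts.
ALLOWED_STATUSES = frozenset({"COMPLETED", "FAILED"})


def parse_status_param(raw: Optional[str]) -> Optional[str]:
    """Validate `status`; None means "latest of COMPLETED or FAILED"."""
    if raw is None:
        return None
    if raw == "IN_PROGRESS":
        # Rejected on purpose: in-progress details keep changing.
        raise ValueError("status=IN_PROGRESS is not supported")
    if raw not in ALLOWED_STATUSES:
        raise ValueError(
            f"unknown status {raw!r}; expected one of {sorted(ALLOWED_STATUSES)}"
        )
    return raw
```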
The endpoint requires no request body.
The response body is exactly the same as that of the existing checkpoint details endpoint, whose schema is reproduced in the Appendix.
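With the new endpoint, the same lookup becomes a single request. Below is a minimal Python sketch of a client for it. `latest_details_url` and `latest_details` are hypothetical helpers. The URL layout follows the endpoint and query parameter described above.

```python
import json
from typing import Callable, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def latest_details_url(base_url: str, job_id: str, status: Optional[str] = None) -> str:
    """Build the URL of the proposed latest-checkpoint-details endpoint."""
    url = f"{base_url.rstrip('/')}/jobs/{quote(job_id)}/checkpoints/details/latest"
    if status is not None:
        # COMPLETED or FAILED; omitting the parameter asks for either.
        url += "?" + urlencode({"status": status})
    return url


def latest_details(
    base_url: str,
    job_id: str,
    status: Optional[str] = None,
    fetch: Callable[[str], dict] = http_get_json,
) -> dict:
    # One request: no checkpoint history is transferred or deserialized.
    return fetch(latest_details_url(base_url, job_id, status))
```

For example, `latest_details("http://localhost:8081", job_id, "COMPLETED")["state_size"]` would read the state size of the latest completed checkpoint in one call.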
Response codes:
| Code | Status | Response body | Condition |
|---|---|---|---|
| 200 | OK | Latest checkpoint details | Details of the checkpoint with the highest ID matching the query |
| 404 | Not Found | - | No job with the given job ID exists in the JobManager |
| 404 | Not Found | - | No checkpoint matching the query was found |
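A client might handle these codes as in this minimal sketch, which uses only the Python standard library. `get_latest_details_or_none` is a hypothetical helper, and `opener` is injectable for testing. Both 404 rows share one status code, so the sketch maps both to `None` and does not try to tell them apart.

```python
import json
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen


def get_latest_details_or_none(url: str, opener: Callable = urlopen) -> Optional[dict]:
    """Return checkpoint details on 200, None on 404, re-raise anything else."""
    try:
        with opener(url) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            # Unknown job or no matching checkpoint: indistinguishable by code.
            return None
        raise
```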
Public Interfaces
We will add the classes for the new LatestCheckpointDetailsRestHandler, along with the corresponding LatestCheckpointDetailsQueryMessageHeaders and LatestCheckpointDetailsQueryParameters.
Implementation Details
We will reuse the existing CheckpointStatsCache to look up and serve the latest checkpoint.
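The selection rule from the response-code table, "the checkpoint with the highest ID matching the query", can be sketched in Python as follows. This illustrates the rule only and is not Flink's Java implementation. `CheckpointStats` and `find_latest` are hypothetical stand-ins for the statistics the handler would read, for example via CheckpointStatsCache. An absent `status` is assumed to mean "completed or failed".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CheckpointStats:
    """Hypothetical stand-in for one checkpoint's statistics."""
    checkpoint_id: int
    status: str  # "IN_PROGRESS", "COMPLETED" or "FAILED"


def find_latest(
    checkpoints: Iterable[CheckpointStats], status: Optional[str]
) -> Optional[CheckpointStats]:
    """Highest-ID checkpoint matching `status`; None accepts COMPLETED or FAILED."""
    wanted = {"COMPLETED", "FAILED"} if status is None else {status}
    # In-progress checkpoints never match, consistent with the status rules.
    matching = (c for c in checkpoints if c.status in wanted)
    return max(matching, key=lambda c: c.checkpoint_id, default=None)
```

A `None` result corresponds to the "no checkpoint matching the query" 404. The scan runs on the server, so the history never has to be serialized to the client.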
Scope and limitations
This FLIP only adds a new REST endpoint; it does not add any feature to the Flink dashboard UI.
Compatibility, Deprecation, and Migration Plan
This is a new feature.
Test Plan
The feature will be covered by tests at two levels: REST API tests and unit tests.
The feature should be tested manually as part of the release process.
Rejected Alternatives
N/A
Appendix
Existing Checkpoint Details Response
{
"type" : "object",
"id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:checkpoints:CheckpointStatistics",
"properties" : {
"alignment_buffered" : {
"type" : "integer"
},
"checkpoint_type" : {
"type" : "string",
"enum" : [ "CHECKPOINT", "UNALIGNED_CHECKPOINT", "SAVEPOINT", "SYNC_SAVEPOINT" ]
},
"checkpointed_size" : {
"type" : "integer"
},
"end_to_end_duration" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
},
"is_savepoint" : {
"type" : "boolean"
},
"latest_ack_timestamp" : {
"type" : "integer"
},
"num_acknowledged_subtasks" : {
"type" : "integer"
},
"num_subtasks" : {
"type" : "integer"
},
"persisted_data" : {
"type" : "integer"
},
"processed_data" : {
"type" : "integer"
},
"savepointFormat" : {
"type" : "string"
},
"state_size" : {
"type" : "integer"
},
"status" : {
"type" : "string",
"enum" : [ "IN_PROGRESS", "COMPLETED", "FAILED" ]
},
"tasks" : {
"type" : "object",
"additionalProperties" : {
"type" : "object",
"id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:checkpoints:TaskCheckpointStatistics",
"properties" : {
"alignment_buffered" : {
"type" : "integer"
},
"checkpointed_size" : {
"type" : "integer"
},
"end_to_end_duration" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
},
"latest_ack_timestamp" : {
"type" : "integer"
},
"num_acknowledged_subtasks" : {
"type" : "integer"
},
"num_subtasks" : {
"type" : "integer"
},
"persisted_data" : {
"type" : "integer"
},
"processed_data" : {
"type" : "integer"
},
"state_size" : {
"type" : "integer"
},
"status" : {
"type" : "string",
"enum" : [ "IN_PROGRESS", "COMPLETED", "FAILED" ]
}
}
}
},
"trigger_timestamp" : {
"type" : "integer"
}
}
}