
Discussion thread: https://lists.apache.org/thread/ng543jls5mwwhq4s5zfdqhwbgq45vccz
Vote thread: TBD
JIRA: FLINK-38120
Release: 2.1

Motivation

In many cases, a job's state size is obtained from its latest checkpoint. This currently requires two REST calls: first to /jobs/:jobid/checkpoints, which returns the job's checkpointing statistics, only to obtain the ID of the latest completed or failed checkpoint; then to /jobs/:jobid/checkpoints/details/:checkpointId to get the details of that checkpoint, including per-task checkpoint data. The first call can be heavy, because /jobs/:jobid/checkpoints returns the job's overall checkpoint statistics, including the checkpoint history. This affects the performance of both the REST server and the client, which deserializes the whole response only to read the latest ID.


Proposed Changes

REST API changes

Add a new endpoint, /jobs/:jobid/checkpoints/details/latest, to the JobManager's REST API. We will also add one query parameter, status, to select the latest completed checkpoint, the latest failed checkpoint, or the latest checkpoint in either of these states.

The resulting request options are:

/jobs/:jobid/checkpoints/details/latest?status=COMPLETED
/jobs/:jobid/checkpoints/details/latest?status=FAILED
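As an illustration, a client could build these request URIs as follows. This is a minimal Java sketch: the class LatestCheckpointUris and its method are hypothetical helpers, not Flink APIs, and mapping a null status to "omit the parameter" is an assumption about how the "either" case would be requested.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Flink API) that builds request URIs for the
// proposed /jobs/:jobid/checkpoints/details/latest endpoint.
final class LatestCheckpointUris {

    private LatestCheckpointUris() {}

    /**
     * Returns the URI of the proposed endpoint for the given job.
     *
     * @param restBaseUrl base URL of the JobManager's REST server, e.g. "http://localhost:8081"
     * @param jobId       the job ID
     * @param status      "COMPLETED", "FAILED", or null to omit the parameter
     *                    (assumption: omitted means "latest checkpoint in either state")
     */
    static URI latestDetails(String restBaseUrl, String jobId, String status) {
        // Drop a trailing slash so the path segments join cleanly.
        String base = restBaseUrl.endsWith("/")
                ? restBaseUrl.substring(0, restBaseUrl.length() - 1)
                : restBaseUrl;
        StringBuilder uri = new StringBuilder(base)
                .append("/jobs/")
                .append(jobId)
                .append("/checkpoints/details/latest");
        if (status != null) {
            uri.append("?status=").append(URLEncoder.encode(status, StandardCharsets.UTF_8));
        }
        return URI.create(uri.toString());
    }
}
```

For example, latestDetails("http://localhost:8081", jobId, "FAILED") targets .../checkpoints/details/latest?status=FAILED with a single request, replacing the two-call sequence described in the motivation.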

We will not allow IN_PROGRESS: the details of an in-progress checkpoint change as the checkpoint progresses, so serving them here would invite caching issues. Moreover, asking for "the latest in-progress checkpoint" is counterintuitive.
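The rule above can be sketched as server-side parsing of the status value. This Java sketch is hypothetical: the enum LatestCheckpointStatusFilter is illustrative and is not one of the classes named in this FLIP, treating a missing value as "COMPLETED or FAILED" is an assumption, and the FLIP does not specify which response code an invalid value produces.

```java
import java.util.Optional;

// Hypothetical model of the proposed "status" query parameter; the name is
// illustrative and not part of Flink or of this FLIP's public interfaces.
enum LatestCheckpointStatusFilter {
    COMPLETED,
    FAILED;

    /**
     * Parses the raw query value.
     * A missing value yields Optional.empty(), i.e. "COMPLETED or FAILED" (assumption).
     * IN_PROGRESS and unknown values are rejected with IllegalArgumentException,
     * which a handler could translate into a client error.
     */
    static Optional<LatestCheckpointStatusFilter> parse(String raw) {
        if (raw == null || raw.isEmpty()) {
            return Optional.empty();
        }
        if (raw.equals("IN_PROGRESS")) {
            throw new IllegalArgumentException(
                    "status=IN_PROGRESS is not supported: in-progress checkpoint details keep changing");
        }
        // Accept only the two supported values.
        for (LatestCheckpointStatusFilter value : values()) {
            if (value.name().equals(raw)) {
                return Optional.of(value);
            }
        }
        throw new IllegalArgumentException("Unsupported status: " + raw);
    }
}
```

Keeping IN_PROGRESS out of the enum entirely means it cannot reach the lookup code by accident; the explicit check only exists to give a clearer error message.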


This should require no request body.


The response body is exactly the same as that of the existing checkpoint details endpoint (/jobs/:jobid/checkpoints/details/:checkpointId); its schema is reproduced in the appendix.


Response codes:

Code | Status | Response body | Description
200 | OK | Latest checkpoint details | Details of the checkpoint with the highest ID matching the query
404 | Not Found | - | No job with the given job ID exists in the JobManager
404 | Not Found | - | No checkpoint matching the query was found
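A client could map these response codes as in the Java sketch below. It assumes the endpoint is served via HTTP GET (the FLIP only states that no request body is needed) and uses the JDK's java.net.http client; the class LatestCheckpointFetcher is hypothetical. Because both 404 cases share a status code, a client that only checks the code cannot tell an unknown job from a job with no matching checkpoint; it would have to inspect the error body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Hypothetical client-side sketch (not a Flink API) for the proposed endpoint.
final class LatestCheckpointFetcher {

    private final HttpClient client = HttpClient.newHttpClient();

    /** Sends a GET request (assumed method) and interprets the status code. */
    Optional<String> fetch(URI latestDetailsUri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(latestDetailsUri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return interpret(response.statusCode(), response.body());
    }

    /**
     * 200 -> the JSON checkpoint details;
     * 404 -> empty (unknown job or no matching checkpoint; not distinguished here);
     * anything else -> IllegalStateException.
     */
    static Optional<String> interpret(int statusCode, String body) {
        switch (statusCode) {
            case 200:
                return Optional.of(body);
            case 404:
                return Optional.empty();
            default:
                throw new IllegalStateException("Unexpected HTTP status " + statusCode + ": " + body);
        }
    }
}
```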

Public Interfaces

We will add the classes for the new endpoint: LatestCheckpointDetailsRestHandler and the corresponding LatestCheckpointDetailsQueryMessageHeaders and LatestCheckpointDetailsQueryParameters.


Implementation Details

We will reuse the existing CheckpointStatsCache to find and serve the latest checkpoint.
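The selection rule from the response code table (the checkpoint with the highest ID matching the query) can be sketched in Java as follows. CheckpointSummary is a hypothetical stand-in for Flink's checkpoint statistics types, and the sketch does not reflect how CheckpointStatsCache actually stores its entries; an empty result corresponds to the "no matching checkpoint" 404.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the lookup; not Flink code.
final class LatestCheckpointSelector {

    /** Minimal stand-in for a checkpoint's statistics (hypothetical). */
    record CheckpointSummary(long id, String status) {}

    /**
     * Returns the checkpoint with the highest ID whose status matches.
     *
     * @param requiredStatus "COMPLETED", "FAILED", or null for either of the two
     */
    static Optional<CheckpointSummary> latest(List<CheckpointSummary> checkpoints, String requiredStatus) {
        return checkpoints.stream()
                // IN_PROGRESS checkpoints are never served by this endpoint.
                .filter(c -> !"IN_PROGRESS".equals(c.status()))
                .filter(c -> requiredStatus == null || requiredStatus.equals(c.status()))
                .max(Comparator.comparingLong(CheckpointSummary::id));
    }
}
```

Given checkpoints 5 (COMPLETED), 6 (FAILED) and 7 (IN_PROGRESS), a COMPLETED query selects 5, a FAILED query selects 6, and a query for either selects 6, since 7 is still in progress.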

Scope and limitations

This FLIP only adds a new REST endpoint; it does not add any feature to the Flink dashboard UI.

Compatibility, Deprecation, and Migration Plan

This is a new, purely additive feature: existing endpoints are unchanged, so no deprecation or migration is needed.

Test Plan

The feature will be covered by tests at different levels: REST API tests and unit tests.

The feature should be tested manually as part of the release process.

Rejected Alternatives

N/A

Appendix

Existing Checkpoint Details Response

{
  "type" : "object",
  "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:checkpoints:CheckpointStatistics",
  "properties" : {
    "alignment_buffered" : {
      "type" : "integer"
    },
    "checkpoint_type" : {
      "type" : "string",
      "enum" : [ "CHECKPOINT", "UNALIGNED_CHECKPOINT", "SAVEPOINT", "SYNC_SAVEPOINT" ]
    },
    "checkpointed_size" : {
      "type" : "integer"
    },
    "end_to_end_duration" : {
      "type" : "integer"
    },
    "id" : {
      "type" : "integer"
    },
    "is_savepoint" : {
      "type" : "boolean"
    },
    "latest_ack_timestamp" : {
      "type" : "integer"
    },
    "num_acknowledged_subtasks" : {
      "type" : "integer"
    },
    "num_subtasks" : {
      "type" : "integer"
    },
    "persisted_data" : {
      "type" : "integer"
    },
    "processed_data" : {
      "type" : "integer"
    },
    "savepointFormat" : {
      "type" : "string"
    },
    "state_size" : {
      "type" : "integer"
    },
    "status" : {
      "type" : "string",
      "enum" : [ "IN_PROGRESS", "COMPLETED", "FAILED" ]
    },
    "tasks" : {
      "type" : "object",
      "additionalProperties" : {
        "type" : "object",
        "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:checkpoints:TaskCheckpointStatistics",
        "properties" : {
          "alignment_buffered" : {
            "type" : "integer"
          },
          "checkpointed_size" : {
            "type" : "integer"
          },
          "end_to_end_duration" : {
            "type" : "integer"
          },
          "id" : {
            "type" : "integer"
          },
          "latest_ack_timestamp" : {
            "type" : "integer"
          },
          "num_acknowledged_subtasks" : {
            "type" : "integer"
          },
          "num_subtasks" : {
            "type" : "integer"
          },
          "persisted_data" : {
            "type" : "integer"
          },
          "processed_data" : {
            "type" : "integer"
          },
          "state_size" : {
            "type" : "integer"
          },
          "status" : {
            "type" : "string",
            "enum" : [ "IN_PROGRESS", "COMPLETED", "FAILED" ]
          }
        }
      }
    },
    "trigger_timestamp" : {
      "type" : "integer"
    }
  }
}