Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The Adaptive Scheduler already supports manually adjusting the parallelism of jobs through the REST API introduced in FLIP-291: Externalized Declarative Resource Management, which enhances the functionality of the Adaptive Scheduler. With FLIP-495: Support AdaptiveScheduler record and query the rescale history, the Adaptive Scheduler will also record its rescale history and make it queryable. However, without a dedicated view it remains inconvenient for users and developers to quickly inspect internal information about the rescale history of the Adaptive Scheduler.
Showing the history of AdaptiveScheduler rescale events in the web UI would therefore help users decide on the next step for their jobs. The goals are:
- Help users trace the rescale history and make rescale information more transparent
  - By REST APIs
  - By Flink Web UI pages
- Provide users with information for optimizing Adaptive Scheduler parameters
Proposed Changes & Public Interfaces
Based on the feature provided by FLIP-495: Support AdaptiveScheduler record and query the rescale history, we can design the following pages to display rescale information.
The Web UI entry-point page location and display logic
- Add a new tab page, similar to the Exceptions tab, to show the rescale history. The tab is shown only for streaming jobs that use the AdaptiveScheduler.
- When should the subpage be shown?
  - The status of the job is 'RUNNING', the SchedulerType of the job is AdaptiveScheduler, and the job type is STREAMING.
- How is the information for this check obtained?
  - When users view the Running Jobs, Overview, or Completed Jobs pages, the interface /jobs/overview is called and responds with the information (scheduler_type and job_type) used by the check.
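The display condition described in the list above can be sketched on the front end as follows. This is a minimal sketch under stated assumptions, not the actual dashboard code: the `JobOverviewItem` type and the `shouldShowRescalesTab` function are hypothetical names, `state` is assumed to carry the job status, and the string 'AdaptiveScheduler' is an assumed serialization of the scheduler type. Only `scheduler_type` and `job_type` are named in this FLIP.

```typescript
// Hypothetical subset of one job entry returned by /jobs/overview.
// Only scheduler_type and job_type come from this FLIP; the rest is assumed.
interface JobOverviewItem {
  jid: string;
  state: string;           // job status, e.g. 'RUNNING'
  scheduler_type?: string; // assumed value for the adaptive scheduler: 'AdaptiveScheduler'
  job_type?: string;       // e.g. 'STREAMING' or 'BATCH'
}

// Returns true only when all three conditions from the list above hold.
function shouldShowRescalesTab(job: JobOverviewItem): boolean {
  return (
    job.state === 'RUNNING' &&
    job.scheduler_type === 'AdaptiveScheduler' &&
    job.job_type === 'STREAMING'
  );
}
```

Treating missing fields as "do not show" keeps the tab hidden when the UI talks to a cluster whose /jobs/overview response does not yet carry the two new fields.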
How to decide whether to show the subpage
Extend the response body of the /jobs/overview REST interface with the job's scheduler_type and job_type, so that the front end can decide whether to show the 'Rescale' subpage.
- URL: /jobs/overview
- Response body schema:
- The corresponding internal change to the Java class:
The urn description will be updated during development.
The Web UI and REST interfaces
- All pages shown here are draft sketches; the final style must follow the Flink UI style.
- The urn values of the introduced or changed schemas will be updated during development.
- The design of the rescale history UI will follow the style of the checkpoints-related pages.
- The design of the rescale history REST API, however, will not fully follow the style of the checkpoints-related interfaces.
  - The main difference is that the current section provides a clearer and more explicit breakdown of the REST interfaces.
  - Compared to the solution outlined in the "Rejected Alternatives" section, the current design has the following pros and cons:
    - Con: during implementation, the number of XXXHandler classes will increase nearly linearly with the number of pages.
    - Pro: the responsibilities of each interface are clearly defined and straightforward.
Rescale Overview
Introduce the rescales overview REST API
- URL: "/jobs/:jobid/rescales/overview"
- METHOD: GET
- Parameter: N.A
- The response schema:
Rescale Overview UI
The goal of this page is to align the rescale overview with the checkpoint overview on the UI side.
When the Rescales subpage is accessed, it defaults to displaying information from the Overview section.
If rescale events corresponding to latest completed, latest ignored, or latest failed exist, the interface will automatically request the details API /jobs/:jobid/rescales/details/:rescaleuuid and display the detailed information.
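The follow-up requests described above can be sketched as a small helper that turns the latest rescale UUIDs into details URLs. This is only an illustration: the `LatestRescaleIds` type, its field names, and the `latestDetailsUrls` function are hypothetical, since the overview response schema is not reproduced in this text. Only the details path comes from the FLIP.

```typescript
// Hypothetical view of the latest rescale UUIDs taken from the overview response.
// The field names are assumptions; a missing field means no such rescale exists yet.
interface LatestRescaleIds {
  completed?: string;
  ignored?: string;
  failed?: string;
}

// Builds one details URL per latest rescale that actually exists.
function latestDetailsUrls(jobId: string, latest: LatestRescaleIds): string[] {
  const uuids = [latest.completed, latest.ignored, latest.failed];
  return uuids
    .filter((uuid): uuid is string => typeof uuid === 'string' && uuid.length > 0)
    .map(uuid => `/jobs/${encodeURIComponent(jobId)}/rescales/details/${encodeURIComponent(uuid)}`);
}
```

A job that has never had a failed or ignored rescale then triggers only a single details request, for the latest completed one.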
To simplify presentation and layout, UUID-typed string values can be shown as only their first eight characters instead of the complete string. This is similar to the abbreviated display of Git commit IDs.
- Similarly, vertex names in the vertices table can be shown with at most 32 characters instead of the complete name. As far as we know, task names can be relatively long in SQL jobs.
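The two abbreviation rules above can be sketched as plain string helpers. The function and constant names are hypothetical. The limits of 8 and 32 characters come from the text, and the sketch assumes the full value stays available elsewhere, for example in a tooltip.

```typescript
const UUID_DISPLAY_LENGTH = 8;     // first eight characters, as for short Git commit IDs
const VERTEX_NAME_MAX_LENGTH = 32; // at most 32 characters of the vertex name

// Shortens a UUID-like string to its first eight characters.
function abbreviateUuid(uuid: string): string {
  return uuid.slice(0, UUID_DISPLAY_LENGTH);
}

// Cuts a vertex name to at most 32 UTF-16 code units; shorter names are returned unchanged.
function shortenVertexName(name: string): string {
  return name.slice(0, VERTEX_NAME_MAX_LENGTH);
}
```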
In the front-end implementation, tooltip explanations need to be added to the header fields of the table below. A prompt message will pop up when the mouse hovers over a corresponding header field, and the tooltip will be dismissed when the mouse moves away.
- The header attributes of Rescale information:
  - Rescale UUID: The unique ID of the rescale, consisting of 32 hexadecimal characters
  - Attempt ID: The numeric ID of the rescale attempt among the attempts that occurred under the same resource requirements
  - Requirements ID: The unique ID of the resource requirements, consisting of 32 hexadecimal characters
  - Trigger Cause: The reason that triggered the rescale
  - Terminal State: The end state of the rescale
  - Terminated Reason: The reason for the completion or termination of the rescale
  - Start Time: The start time of the rescale
  - Duration: The time from the start of the rescale to its completion or until now
  - End Time: The end time of the rescale
- The header attributes of Vertices:
  - ID: The unique ID of the JobVertex, consisting of 32 hexadecimal characters
  - Name: The short name of the vertex
  - Slot Sharing Group ID: The unique ID of the slot sharing group, consisting of 32 hexadecimal characters
  - Previous Parallelism: The parallelism of the vertex before the current rescale
  - Acquired Parallelism: The parallelism acquired by the vertex after the current rescale
  - Sufficient Parallelism: The minimal parallelism the vertex needs to run
  - Desired Parallelism: The desired parallelism of the vertex
- The header attributes of Slots:
  - Slot Sharing Group ID: The ID of the slot sharing group to which the slot belongs, consisting of 32 hexadecimal characters
  - Slot Sharing Group Name: The name of the slot sharing group to which the slot belongs
  - Previous Slot: The number of slots before the rescale
  - Acquired Slot: The number of slots acquired after the rescale
  - Desired Slot: The desired number of slots for the rescale
  - Sufficient Slot: The minimal number of slots needed to deploy tasks in the rescale
  - Required Profile: The required resource profile of the slot sharing group in the rescale
  - Acquired Profile: The acquired resource profile of the slot sharing group in the rescale
- The header attributes of Scheduler State History:
  - State: The name of the scheduler state
  - Enter Time: The time at which the state was entered
  - Leave Time: The time at which the state was left
  - Duration: The time between entering and leaving the state
  - Exception: The exception information of the current rescale during the state
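The Duration header of the rescale information is described as running from the start of the rescale to its completion or until now. A minimal sketch of that rule is shown here, assuming timestamps in epoch milliseconds and an end time that is absent while the rescale is still in progress. The function name and parameters are hypothetical.

```typescript
// Computes the rescale duration in milliseconds.
// endTime is undefined while the rescale has not terminated; in that case "now" is used.
function rescaleDurationMs(startTime: number, endTime: number | undefined, now: number): number {
  const end = endTime !== undefined ? endTime : now;
  // Clamp at zero so clock skew between server and browser never yields a negative value.
  return Math.max(0, end - startTime);
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the helper deterministic and easy to test.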
Rescale History
Introduce the rescales history REST API
- URL: "/jobs/:jobid/rescales/history"
- METHOD: GET
- Parameter: N.A
- The response schema:
Rescale History UI
When accessing the History subpage, the interface will call the API /jobs/:jobid/rescales/history and display a summary of historical rescale events.
To simplify presentation and layout, UUID-typed string values can be shown as only their first eight characters instead of the complete string. This is similar to the abbreviated display of Git commit IDs.
In the front-end implementation, tooltip explanations need to be added to the header fields of the table below. A prompt message will pop up when the mouse hovers over a corresponding header field, and the tooltip will be dismissed when the mouse moves away.
- The header attributes of Rescale information:
  - Rescale UUID: The unique ID of the rescale, consisting of 32 hexadecimal characters
  - Attempt ID: The numeric ID of the rescale attempt among the attempts that occurred under the same resource requirements
  - Requirements ID: The unique ID of the resource requirements, consisting of 32 hexadecimal characters
  - Trigger Cause: The reason that triggered the rescale
  - Terminal State: The end state of the rescale
  - Terminated Reason: The reason for the completion or termination of the rescale
  - Start Time: The start time of the rescale
  - Duration: The time from the start of the rescale to its completion or until now
  - End Time: The end time of the rescale
Introduce the rescale details REST API
- URL: "/jobs/:jobid/rescales/details/:rescaleuuid"
- METHOD: GET
- The response schema:
Rescale Details UI
When a user clicks on a specific Rescale to view its details, the interface will call the corresponding API /jobs/:jobid/rescales/details/:rescaleuuid and display the details of the selected rescale event.
To simplify presentation and layout, UUID-typed string values can be shown as only their first eight characters instead of the complete string. This is similar to the abbreviated display of Git commit IDs.
- Similarly, vertex names in the vertices table can be shown with at most 32 characters instead of the complete name. As far as we know, task names can be relatively long in SQL jobs.
In the front-end implementation, tooltip explanations need to be added to the header fields of the table below. A prompt message will pop up when the mouse hovers over a corresponding header field, and the tooltip will be dismissed when the mouse moves away.
- The header attributes of Rescale information:
  - Rescale UUID: The unique ID of the rescale, consisting of 32 hexadecimal characters
  - Attempt ID: The numeric ID of the rescale attempt among the attempts that occurred under the same resource requirements
  - Requirements ID: The unique ID of the resource requirements, consisting of 32 hexadecimal characters
  - Trigger Cause: The reason that triggered the rescale
  - Terminal State: The end state of the rescale
  - Terminated Reason: The reason for the completion or termination of the rescale
  - Start Time: The start time of the rescale
  - Duration: The time from the start of the rescale to its completion or until now
  - End Time: The end time of the rescale
- The header attributes of Vertices:
  - ID: The unique ID of the JobVertex, consisting of 32 hexadecimal characters
  - Name: The short name of the vertex
  - Slot Sharing Group ID: The unique ID of the slot sharing group, consisting of 32 hexadecimal characters
  - Previous Parallelism: The parallelism of the vertex before the current rescale
  - Acquired Parallelism: The parallelism acquired by the vertex after the current rescale
  - Sufficient Parallelism: The minimal parallelism the vertex needs to run
  - Desired Parallelism: The desired parallelism of the vertex
- The header attributes of Slots:
  - Slot Sharing Group ID: The ID of the slot sharing group to which the slot belongs, consisting of 32 hexadecimal characters
  - Slot Sharing Group Name: The name of the slot sharing group to which the slot belongs
  - Previous Slot: The number of slots before the rescale
  - Acquired Slot: The number of slots acquired after the rescale
  - Desired Slot: The desired number of slots for the rescale
  - Sufficient Slot: The minimal number of slots needed to deploy tasks in the rescale
  - Required Profile: The required resource profile of the slot sharing group in the rescale
  - Acquired Profile: The acquired resource profile of the slot sharing group in the rescale
- The header attributes of Scheduler State History:
  - State: The name of the scheduler state
  - Enter Time: The time at which the state was entered
  - Leave Time: The time at which the state was left
  - Duration: The time between entering and leaving the state
  - Exception: The exception information of the current rescale during the state
Rescale Summary
Introduce the rescales summary REST API
- URL: "/jobs/:jobid/rescales/summary"
- METHOD: GET
- Parameter: N.A
- The response schema:
Rescale Summary UI
When accessing the summary subpage, the interface /jobs/:jobid/rescales/summary will be called, and the corresponding statistics list will be displayed.
When the user clicks the Rescale Duration Percentile dropdown button, the page displays additional statistical information.
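This text does not specify how the duration percentiles are computed or whether they arrive precomputed in the summary response. Purely to illustrate what a rescale duration percentile means, the sketch below uses the nearest-rank method over a list of durations. The function name, the millisecond unit, and the choice of nearest-rank are all assumptions.

```typescript
// Nearest-rank percentile over rescale durations in milliseconds.
// Returns undefined for an empty list; p is expected in the range [0, 100].
function durationPercentile(durationsMs: number[], p: number): number | undefined {
  if (durationsMs.length === 0) {
    return undefined;
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = durationsMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  // Clamp so that p = 0 maps to the smallest value and p = 100 to the largest.
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}
```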
Rescale configuration
Rescale configuration REST API
Introduce the REST API for the Adaptive Scheduler configuration
- URL: "/jobs/:jobid/rescales/config"
- METHOD: GET
- Parameter: N.A
- The response body schema:
Rescale configuration UI
When accessing the configuration subpage, the interface /jobs/:jobid/rescales/config will be called, and the corresponding configuration information will be displayed.
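The five endpoints introduced in this section can be collected in one place on the front end. The sketch below only builds the paths given in the text. The `rescaleApi` object and its member names are hypothetical, and the HTTP client that would issue the GET requests is left out.

```typescript
// Path builders for the rescale endpoints of this FLIP; all of them are GET endpoints.
// The paths come from the text, the object and member names are hypothetical.
const rescaleApi = {
  overview: (jobId: string): string => `/jobs/${jobId}/rescales/overview`,
  history: (jobId: string): string => `/jobs/${jobId}/rescales/history`,
  details: (jobId: string, rescaleUuid: string): string =>
    `/jobs/${jobId}/rescales/details/${rescaleUuid}`,
  summary: (jobId: string): string => `/jobs/${jobId}/rescales/summary`,
  config: (jobId: string): string => `/jobs/${jobId}/rescales/config`
};
```

Each subpage maps to exactly one builder, which mirrors the one-handler-per-page split chosen in the main design.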
Compatibility, Deprecation, and Migration Plan
This is a new feature, so no existing behavior needs to be migrated or deprecated.
Test Plan
The REST endpoints part:
We plan to test the REST endpoints through the RestHandler framework, similar to the approach used in classes such as org.apache.flink.runtime.rest.handler.job.checkpoints.AbstractCheckpointStatsHandlerTest.
The UI part:
The UI will be tested visually through manual testing.
Rejected Alternatives
In the originally considered alternative, the Rescale Overview, Rescale History, and Rescale Summary parts share a single REST interface '/jobs/:jobid/rescales' to fetch data.
The goal of that REST endpoint design is to align the rescale overview with the checkpoint overview on the REST interface side.
This alternative reduces the number of handlers needed during implementation.
However, the drawback is that using only a single REST API interface to fulfill these responsibilities would make the interface’s role bloated and less clear.
Rescale Overview
Introduce the rescales REST API
- URL: "/jobs/:jobid/rescales"
- METHOD: GET
- Parameter: N.A
- The response schema:
Rescale Overview UI
The page will have the rescale overview aligned with the checkpoint overview as mentioned in the main design.
The page uses only the 'summary' and 'latest' parts of the response schema.
Rescale History
Rescale History UI
The design details are the same as in the main design.
The page uses only the 'history' part of the response schema.
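To make the trade-off of this rejected alternative concrete, the sketch below shows how each page would pick its own part out of the single combined response. The three part names come from the text. The interface and function names are hypothetical, and the contents of each part are deliberately left opaque because the schema is not reproduced here.

```typescript
// Hypothetical shape of the single '/jobs/:jobid/rescales' response in the rejected design.
interface CombinedRescalesResponse {
  summary: unknown;
  latest: unknown;
  history: unknown;
}

// Overview page: uses only the 'summary' and 'latest' parts.
function overviewPageData(response: CombinedRescalesResponse): { summary: unknown; latest: unknown } {
  return { summary: response.summary, latest: response.latest };
}

// History page: uses only the 'history' part.
function historyPageData(response: CombinedRescalesResponse): unknown {
  return response.history;
}
```

Every page has to fetch the whole payload and discard most of it, which is the bloated and unclear interface role that led to this design being rejected.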
Introduce the rescale details REST API
The design details are the same as in the main design.
Rescale Summary
Rescale Summary UI
When accessing the summary subpage, the interface /jobs/:jobid/rescales will be called, and the corresponding statistics list will be displayed.
The UI design details are the same as in the main design.






