Status

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Align the JobManager's overview page with Flink's new memory model (FLIP-116). Additionally, we want to align the JobManager's details page, which contains memory-related information, with the solution proposed in FLIP-102.

Proposed Changes

The proposal makes the JobManager's memory-related metrics available in the UI. Additionally, the effective configuration parameters should be exposed in the same way as in the TaskManager's overview (see FLIP-102: Add More Metrics to TaskManager).

JVM Metrics

These JVM metrics are exposed and can be used through the JobManager's metrics REST API.

| JVM | Metric | Used key | Total key |
|---|---|---|---|
| Heap | Status.JVM.Memory.Heap | Used | Max |
| Direct | Status.JVM.Memory.Direct | Used | Max |
| Mapped | Status.JVM.Memory.Mapped | MemoryUsed | TotalCapacity |
| NonHeap | Status.JVM.Memory.NonHeap | MemoryUsed | TotalCapacity |
| Metaspace | Status.JVM.Memory.Metaspace (FLINK-19617) | Used | Max |
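As a minimal sketch of how a consumer such as the UI might combine a "used" value with its "total" value from the table above, the following helper computes the used fraction. The class and method names are hypothetical and not part of Flink. It assumes both values are byte counts and treats a non-positive total as undefined, since the JVM reports -1 when a maximum is not defined.

```java
import java.util.OptionalDouble;

// Hypothetical helper, not part of Flink: combines a "used" and a "total" metric value.
public final class MemoryGauge {
    private MemoryGauge() {}

    /**
     * Returns used/total as a fraction, or an empty result when the total is
     * undefined (the JVM reports -1 in that case) or the used value is negative.
     */
    public static OptionalDouble usedFraction(long usedBytes, long totalBytes) {
        if (totalBytes <= 0 || usedBytes < 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((double) usedBytes / totalBytes);
    }
}
```

Returning an empty result instead of a number keeps the caller from rendering a misleading percentage when no maximum is configured.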

Memory Configuration

Flink's memory model (as described in org.apache.flink.runtime.jobmanager.JobManagerProcessSpec) can be mapped to the following Flink configuration parameters. Only some of them have a corresponding Flink metric.

| Flink Memory Model | Flink configuration (1) | Effective Configuration REST API (2) | Metric (3) | Used key | Total key |
|---|---|---|---|---|---|
| Heap | jobmanager.memory.heap.size | jobmanager.memory.heap.size | Status.JVM.Memory.Heap | Used | Max |
| Off-Heap | jobmanager.memory.off-heap.size | jobmanager.memory.off-heap.size | - | - | - |
| JVM Metaspace | jobmanager.memory.jvm-metaspace.size | jobmanager.memory.jvm-metaspace.size | Status.JVM.Memory.Metaspace (FLINK-19617) | Used | Max |
| JVM Overhead | jobmanager.memory.jvm-overhead.min, jobmanager.memory.jvm-overhead.max | jobmanager.memory.jvm-overhead.min / jobmanager.memory.jvm-overhead.max (4) | - | - | - |

1 These are the configuration parameters used in the Flink configuration.
2 These are the configuration parameters exposed through the cluster config REST API. Their names match the actual Flink configuration keys because the effective configuration is generated from the passed configuration (FLINK-19662).
3 These are the metrics exposed through the JobManager's metrics REST API.
4 min and max have the same value.
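To illustrate notes 2 and 4, here is a small sketch that picks the JobManager memory options out of an effective configuration and checks that the JVM overhead bounds coincide. It assumes the configuration has already been read into a plain key/value map. The class and method names are hypothetical, and the values in the usage example are made up.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink: works on an already-parsed key/value view.
public final class JobManagerMemoryConfig {
    static final String PREFIX = "jobmanager.memory.";
    static final String OVERHEAD_MIN = "jobmanager.memory.jvm-overhead.min";
    static final String OVERHEAD_MAX = "jobmanager.memory.jvm-overhead.max";

    private JobManagerMemoryConfig() {}

    /** Keeps only the entries whose key starts with "jobmanager.memory.", sorted by key. */
    public static Map<String, String> memoryOptions(Map<String, String> effectiveConfig) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : effectiveConfig.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    /** True when both overhead bounds are present and carry the same string value (note 4). */
    public static boolean overheadIsPinned(Map<String, String> effectiveConfig) {
        String min = effectiveConfig.get(OVERHEAD_MIN);
        return min != null && Objects.equals(min, effectiveConfig.get(OVERHEAD_MAX));
    }
}
```

For example, a map containing `jobmanager.memory.heap.size`, both overhead keys with the same value, and `rest.port` would yield three memory options and a pinned overhead.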

Frontend Design

Add new metrics page

REST API proposal:

Metrics

The API can be used to retrieve the metrics for the JobManager: http://localhost:8081/jobmanager/metrics?get=Status.JVM.Memory.Heap.Max,Status.JVM.Memory.Heap.Used,Status.JVM.Memory.NonHeap.Max,Status.JVM.Memory.NonHeap.Used,Status.JVM.Memory.Metaspace.Max,Status.JVM.Memory.Metaspace.Used

The Metaspace metrics still need to be implemented. This is handled in FLINK-19617.
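The metrics request shown above can be sketched in Java as follows. The class and method names are hypothetical. The base address `http://localhost:8081` is taken from the example URL, and the code assumes Java 11 or later for `java.net.http`. The response body is returned as raw text; we would expect it to be a JSON array of objects with `id` and `value` fields, but parsing is left out of this sketch.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical client sketch, not part of Flink.
public final class JobManagerMetricsQuery {
    private JobManagerMetricsQuery() {}

    /** Builds <restBase>/jobmanager/metrics?get=<comma-separated metric ids>. */
    public static URI metricsUri(String restBase, List<String> metricIds) {
        return URI.create(restBase + "/jobmanager/metrics?get=" + String.join(",", metricIds));
    }

    /** Sends a GET request and returns the raw response body. */
    public static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

A call such as `metricsUri("http://localhost:8081", List.of("Status.JVM.Memory.Heap.Max", "Status.JVM.Memory.Heap.Used"))` produces the heap part of the URL above; `fetch` requires a running JobManager.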

Memory Configuration

Tracked in FLINK-19662.

We want to expose the effective configuration through a new REST endpoint. We have to consider that the memory configuration depends on the type of cluster (legacy standalone vs containerized memory configuration).

Test Plan

Covered by unit tests.