
Discussion thread: https://lists.apache.org/thread/hdrh4nx0zhjzb5q5gkp5w1cqkzb4v6po
Vote thread: here (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)
JIRA: FLINK-38353
Release: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

FLIP-384 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces) added trace/span reporting capability to Flink, which is used in a few places, such as reporting checkpointing and recovery.

With the current flat structure of spans (spans cannot have children), it is difficult to report checkpointing or recovery accurately. The single top-level span for a checkpoint or a recovery currently aggregates some metrics, such as the maximum and the sum of the state download/upload durations. However, this hides details such as how long each task and/or subtask spent downloading the state.

Proposed Changes

In this FLIP we want to introduce a general mechanism for reporting child spans. Reporting distributed child spans won’t be supported, for the same reasons as described and discussed in https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces (the arguments are copied to the Rejected Alternatives section at the bottom of this document). Child spans can only be reported together with their parent span: the whole tree of spans has to be reported at once.

Child spans will be used to report checkpoint statistics from tasks and subtasks, using a structure similar to the one currently displayed in Flink’s Web UI.

Support for child spans will be added to all existing TraceReporters (Log4J and OpenTelemetry).
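Because a reporter receives the whole tree at once, it can flatten it with a simple recursive walk. The following standalone sketch shows one way a log-style reporter might do that. `SketchSpan` and `IndentingLogReporter` are hypothetical stand-ins, not Flink's actual `Span` or `TraceReporter` types, and the indented one-line-per-span output format is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a span: a name, a duration and nested children. */
final class SketchSpan {
    final String name;
    final long durationMillis;
    final List<SketchSpan> children;

    SketchSpan(String name, long durationMillis, List<SketchSpan> children) {
        this.name = name;
        this.durationMillis = durationMillis;
        this.children = children;
    }
}

/** Hypothetical reporter that is handed only the root span and walks the whole tree. */
final class IndentingLogReporter {

    /** Returns one line per span, indented by the span's depth in the tree. */
    static List<String> report(SketchSpan root) {
        List<String> lines = new ArrayList<>();
        visit(root, 0, lines);
        return lines;
    }

    private static void visit(SketchSpan span, int depth, List<String> lines) {
        // Pre-order traversal: a parent is emitted before any of its children.
        lines.add("  ".repeat(depth) + span.name + " " + span.durationMillis + "ms");
        for (SketchSpan child : span.children) {
            visit(child, depth + 1, lines);
        }
    }
}
```

Reporting a checkpoint root with two task children, one of which has a single subtask child, yields four lines in pre-order, with the subtask indented two levels. A reporter that instead has to attach each child to its parent's context would presumably use the same recursion, creating the parent before descending.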

Public Interfaces

Child spans

The following methods will be added:

@Experimental
public interface Span {

    // (...) existing methods omitted

    /** Returns the child spans (= nested). */
    List<Span> getChildren();
}

@Experimental
public class SpanBuilder {

    // (...) existing fields and methods omitted

    /** Builders of the nested child spans. */
    private final List<SpanBuilder> children = new ArrayList<>();

    /** Adds child spans (= nested). */
    public SpanBuilder addChildren(List<SpanBuilder> children) {
        this.children.addAll(children);
        return this;
    }

    /** Adds child span (= nested). */
    public SpanBuilder addChild(SpanBuilder child) {
        this.children.add(child);
        return this;
    }
}
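Since `addChild`/`addChildren` accept builders rather than already-built spans, one plausible reading is that building the parent also builds its whole subtree, so the tree is complete before it reaches any reporter. The standalone sketch below assumes that behaviour. `TreeSpan` and `TreeSpanBuilder` are hypothetical stand-ins mirroring the proposed methods, and the recursive `build()` is an assumption, not the confirmed implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable span mirroring the proposed Span#getChildren(). */
final class TreeSpan {
    private final String name;
    private final List<TreeSpan> children;

    TreeSpan(String name, List<TreeSpan> children) {
        this.name = name;
        this.children = Collections.unmodifiableList(children);
    }

    String getName() {
        return name;
    }

    /** Returns the child spans (= nested). */
    List<TreeSpan> getChildren() {
        return children;
    }
}

/** Hypothetical builder mirroring the proposed SpanBuilder#addChild/addChildren. */
final class TreeSpanBuilder {
    private final String name;
    private final List<TreeSpanBuilder> children = new ArrayList<>();

    TreeSpanBuilder(String name) {
        this.name = name;
    }

    TreeSpanBuilder addChildren(List<TreeSpanBuilder> children) {
        this.children.addAll(children);
        return this;
    }

    TreeSpanBuilder addChild(TreeSpanBuilder child) {
        this.children.add(child);
        return this;
    }

    /** Builds this span and, recursively, every descendant in a single call. */
    TreeSpan build() {
        List<TreeSpan> built = new ArrayList<>();
        for (TreeSpanBuilder child : children) {
            built.add(child.build());
        }
        return new TreeSpan(name, built);
    }
}
```

For example, a "Checkpoint" builder with one "Task map" child, which in turn has two subtask children added via `addChildren`, builds into a three-level tree in one `build()` call. Holding builders also means child spans can still be adjusted until the root is built.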

Checkpoint child spans

The verbosity of checkpoint spans will be configurable through the following config option:

    /** The detail level for reporting checkpoint spans. */
    public static final ConfigOption<TraceOptions.CheckpointSpanDetailLevel>
            CHECKPOINT_SPAN_DETAIL_LEVEL =
                    key("traces.checkpoint.span-detail-level")
                            .enumType(TraceOptions.CheckpointSpanDetailLevel.class)
                            .defaultValue(CheckpointSpanDetailLevel.SPAN_PER_CHECKPOINT)
                            .withDescription(
                                    "Detail level for reporting checkpoint spans. Possible values:\n"
                                            + "- SPAN_PER_CHECKPOINT (default): Single span per checkpoint, with sum/max of submetrics aggregated over all tasks and subtasks.\n"
                                            + "- SPAN_PER_CHECKPOINT_WITH_TASKS: Single span per checkpoint. Same as SPAN_PER_CHECKPOINT, plus arrays of aggregated values per task.\n"
                                            + "- SPANS_PER_TASK: Same as SPAN_PER_CHECKPOINT, plus a child span for each task. Each task span carries sum/max submetrics aggregated over its subtasks.\n"
                                            + "- SPANS_PER_SUBTASK: Same as SPANS_PER_TASK, plus a grandchild span for each subtask, nested under its task span.");
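To illustrate how the four detail levels differ in shape, here is a standalone sketch that turns the per-subtask durations of one checkpoint into a span tree. `DetailLevel`, `Node`, `CheckpointSpanShaper`, the attribute names (`sum`, `max`, `perTaskSum`, `perTaskMax`, `duration`) and the `task#index` naming are all hypothetical. For brevity it tracks a single submetric (a duration in milliseconds) and encodes the per-task arrays of SPAN_PER_CHECKPOINT_WITH_TASKS as strings; the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mirror of the proposed CheckpointSpanDetailLevel values. */
enum DetailLevel {
    SPAN_PER_CHECKPOINT,
    SPAN_PER_CHECKPOINT_WITH_TASKS,
    SPANS_PER_TASK,
    SPANS_PER_SUBTASK
}

/** Hypothetical span node: a name, string attributes and child nodes. */
final class Node {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

final class CheckpointSpanShaper {

    /** Shapes one checkpoint's span tree from task name -> per-subtask durations (ms). */
    static Node shape(Map<String, long[]> durationsByTask, DetailLevel level) {
        Node root = new Node("Checkpoint");
        // Every level keeps the checkpoint-wide sum/max on the root span.
        long[] all = durationsByTask.values().stream().flatMapToLong(Arrays::stream).toArray();
        addSumMax(root, all);

        if (level == DetailLevel.SPAN_PER_CHECKPOINT_WITH_TASKS) {
            // Still a single span, plus one array entry per task.
            List<Long> sums = new ArrayList<>();
            List<Long> maxes = new ArrayList<>();
            for (long[] d : durationsByTask.values()) {
                sums.add(Arrays.stream(d).sum());
                maxes.add(Arrays.stream(d).max().orElse(0));
            }
            root.attributes.put("perTaskSum", sums.toString());
            root.attributes.put("perTaskMax", maxes.toString());
        } else if (level == DetailLevel.SPANS_PER_TASK || level == DetailLevel.SPANS_PER_SUBTASK) {
            for (Map.Entry<String, long[]> e : durationsByTask.entrySet()) {
                // One child span per task, aggregated over its subtasks.
                Node task = new Node(e.getKey());
                addSumMax(task, e.getValue());
                if (level == DetailLevel.SPANS_PER_SUBTASK) {
                    // One grandchild span per subtask, holding its own value.
                    for (int i = 0; i < e.getValue().length; i++) {
                        Node subtask = new Node(e.getKey() + "#" + i);
                        subtask.attributes.put("duration", Long.toString(e.getValue()[i]));
                        task.children.add(subtask);
                    }
                }
                root.children.add(task);
            }
        }
        return root;
    }

    private static void addSumMax(Node node, long[] durations) {
        node.attributes.put("sum", Long.toString(Arrays.stream(durations).sum()));
        node.attributes.put("max", Long.toString(Arrays.stream(durations).max().orElse(0)));
    }
}
```

With durations `source -> {10, 30}` and `sink -> {5}`, every level reports `sum=45` and `max=30` on the root; only the deeper levels add task and subtask spans, which is where the per-task detail hidden by the default level becomes visible.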

Compatibility, Deprecation, and Migration Plan

There will be no impact on existing users. The default value, SPAN_PER_CHECKPOINT, preserves the existing behaviour, so no migration is needed.

Test Plan

This feature is already used and tested inside Confluent. Before committing, the final accepted version will be tested inside Confluent again.

Rejected Alternatives

It will only be possible to report the whole hierarchy of spans together, at the same time. Support for reporting child spans independently (potentially across different TaskManagers/JobManagers) will not be implemented as part of this FLIP, because it would require a more complicated API for linking related spans, one that is serialisable and survives (or can be recreated) across Flink restarts. For more details on why this is a larger effort, see the following thread:

https://lists.apache.org/thread/cznt6rbncx1ydqcn13m52859qrggq1xg