Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).


Motivation

As Flink moves toward version 2.0, we want to provide users with a better experience with the existing configuration. In this FLIP, we outline several general improvements to the current configuration.

Public Interfaces

This FLIP proposes the following general improvements. Details of each change can be found in the "Proposed Changes" section:

  • Ensure all the ConfigOptions are properly annotated

  • Ensure all user-facing configurations are included in the documentation generation process

  • Make the existing ConfigOptions use the proper type

  • Mark all internally used ConfigOptions with the @Internal annotation

Proposed Changes

In this section, we describe in detail all the configurations that need updating.

Ensure all the ConfigOptions are properly annotated

Many user-facing ConfigOptions are currently not annotated at all. We will make sure that they are properly annotated. 


Mark the following classes as PublicEvolving:

  • CEPCacheOptions

  • AlgorithmOptions

  • HighAvailabilityOptions

  • RestOptions

  • SecurityOptions

  • SlowTaskDetectorOptions

  • InfluxdbReporterOptions

  • PrometheusPushGatewayReporterOptions

  • JobResultStoreOptions

  • ShuffleServiceOptions

  • SqlClientOptions

  • YarnConfigOptions


We will also update the ConfigOptionsDocGenerator to verify that the ConfigOptions are properly annotated.
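
The kind of check described above could look roughly like the following. This is a minimal sketch, not the actual ConfigOptionsDocGenerator logic: the annotations are local stand-ins declared so that the example compiles without Flink on the classpath, and `AnnotationCheck`, `AnnotatedOptions`, and `UnannotatedOptions` are hypothetical names used only for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Flink's stability annotations, declared locally
// so that the sketch compiles without Flink on the classpath.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PublicEvolving {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Internal {}

// Example options classes: one annotated, one not.
@PublicEvolving
class AnnotatedOptions {}

class UnannotatedOptions {}

class AnnotationCheck {
    /** Returns the simple names of the classes that carry no stability annotation. */
    static List<String> findUnannotated(Class<?>... optionClasses) {
        List<String> missing = new ArrayList<>();
        for (Class<?> clazz : optionClasses) {
            boolean annotated =
                    clazz.isAnnotationPresent(PublicEvolving.class)
                            || clazz.isAnnotationPresent(Internal.class);
            if (!annotated) {
                missing.add(clazz.getSimpleName());
            }
        }
        return missing;
    }
}
```

A documentation build could fail whenever `findUnannotated` returns a non-empty list, which would keep newly added options classes from slipping through without an annotation.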

Ensure all user-facing configurations are included in the documentation generation process

The following ConfigOptions are defined in classes that are not included in the documentation generation process. We will relocate these ConfigOptions to a class that is included in the documentation generation.


  • GPUDriver#DISCOVERY_SCRIPT_PATH

  • GPUDriver#DISCOVERY_SCRIPT_ARG


The ConfigOptions above will be moved to a new class, GPUDriverOptions, in the package of org.apache.flink.externalresource.gpu.GPUDriver. Their documentation will be generated with a dynamic prefix, similar to MetricOptions, as follows:

| Key | Default | Type | Description |
| --- | --- | --- | --- |
| external-resource.<resource_name>.param.discovery-script.path | (none) | String | The path of the discovery script. It can either be an absolute path, or a relative path to FLINK_HOME when defined or the current directory otherwise. If not explicitly configured, the default script will be used. |
| external-resource.<resource_name>.param.discovery-script.args | (none) | String | The arguments passed to the discovery script. For the default discovery script, see Default Script for the available parameters. |


Note that the above ConfigOptions are currently not visible to users, so we can directly introduce a PublicEvolving class that contains them without a deprecation process.
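
To illustrate how the dynamic prefix resolves to a concrete key, here is a minimal sketch. The class `GpuDriverKeys` and its `keyFor` helper are hypothetical and not part of Flink; only the key fragments are taken from the documented keys above, and the resource name `gpu` is an assumed example.

```java
class GpuDriverKeys {
    // Key fragments taken from the documented keys above.
    static final String PREFIX = "external-resource.";
    static final String PARAM_INFIX = ".param.";
    static final String DISCOVERY_SCRIPT_PATH_SUFFIX = "discovery-script.path";
    static final String DISCOVERY_SCRIPT_ARGS_SUFFIX = "discovery-script.args";

    /** Builds the full key for one option of the named external resource. */
    static String keyFor(String resourceName, String suffix) {
        if (resourceName == null || resourceName.isEmpty()) {
            throw new IllegalArgumentException("resourceName must not be empty");
        }
        return PREFIX + resourceName + PARAM_INFIX + suffix;
    }
}
```

For a resource named `gpu`, `GpuDriverKeys.keyFor("gpu", GpuDriverKeys.DISCOVERY_SCRIPT_PATH_SUFFIX)` yields `external-resource.gpu.param.discovery-script.path`, which is the key a user would set in the configuration file.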


Make the existing ConfigOptions use the proper type

Some ConfigOptions do not specify the type properly. We will update the type of the ConfigOptions in a backward-compatible way.


The following ConfigOptions will be changed to the Duration type:

  • RpcOptions#TCP_TIMEOUT

  • RpcOptions#STARTUP_TIMEOUT

  • ClusterOptions#INITIAL_REGISTRATION_TIMEOUT

  • ClusterOptions#MAX_REGISTRATION_TIMEOUT

  • ClusterOptions#ERROR_REGISTRATION_DELAY

  • ClusterOptions#REFUSED_REGISTRATION_DELAY

  • ClusterOptions#CLUSTER_SERVICES_SHUTDOWN_TIMEOUT

  • HighAvailabilityOptions#ZOOKEEPER_SESSION_TIMEOUT

  • HighAvailabilityOptions#ZOOKEEPER_CONNECTION_TIMEOUT

  • HighAvailabilityOptions#ZOOKEEPER_RETRY_WAIT

  • ResourceManagerOptions#JOB_TIMEOUT

  • ResourceManagerOptions#STANDALONE_CLUSTER_STARTUP_PERIOD_TIME

  • ResourceManagerOptions#TASK_MANAGER_TIMEOUT

  • RestOptions#AWAIT_LEADER_TIMEOUT

  • RestOptions#RETRY_DELAY

  • RestOptions#CONNECTION_TIMEOUT

  • RestOptions#IDLENESS_TIMEOUT

  • InfluxdbReporterOptions#CONNECT_TIMEOUT

  • InfluxdbReporterOptions#WRITE_TIMEOUT

  • YarnConfigOptions#CONTAINER_REQUEST_HEARTBEAT_INTERVAL_MILLISECONDS


Note:

  • When a value without a time unit is set for a Duration-typed ConfigOption, it is interpreted as milliseconds. The change is therefore backward compatible as long as the original ConfigOption used milliseconds as its time unit, which is the case for all the ConfigOptions listed above.

  • RpcOptions#TCP_TIMEOUT, RpcOptions#STARTUP_TIMEOUT, and ResourceManagerOptions#JOB_TIMEOUT are currently String-typed, but they are all parsed by the method `org.apache.flink.util.TimeUtils#parseDuration`, which is also used to parse Duration-typed ConfigOptions. Therefore, the changes are backward compatible.
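
The backward-compatibility argument rests on a bare number and an explicit millisecond value parsing to the same Duration. The sketch below illustrates that rule with a deliberately simplified parser. It is a hypothetical stand-in, not the code of `TimeUtils#parseDuration`, and it recognises only the `ms`, `s`, and `min` unit labels.

```java
import java.time.Duration;
import java.util.Locale;

class DurationSketch {
    /**
     * Simplified stand-in for duration parsing: a bare number is read as
     * milliseconds; only the "ms", "s", and "min" unit labels are recognised.
     */
    static Duration parse(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        int pos = 0;
        while (pos < trimmed.length() && Character.isDigit(trimmed.charAt(pos))) {
            pos++;
        }
        if (pos == 0) {
            throw new IllegalArgumentException("No number found in '" + text + "'");
        }
        long value = Long.parseLong(trimmed.substring(0, pos));
        String unit = trimmed.substring(pos).trim();
        switch (unit) {
            case "": // no unit given: fall back to milliseconds
            case "ms":
                return Duration.ofMillis(value);
            case "s":
                return Duration.ofSeconds(value);
            case "min":
                return Duration.ofMinutes(value);
            default:
                throw new IllegalArgumentException("Unknown time unit '" + unit + "'");
        }
    }
}
```

Under this rule, an existing setting such as `30000` keeps its meaning after the type change, because `DurationSketch.parse("30000")` and `DurationSketch.parse("30 s")` describe the same Duration.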


The following ConfigOptions will be changed to the Enum type:

  • NettyShuffleEnvironmentOptions#SHUFFLE_COMPRESSION_CODEC

  • OptimizerConfigOptions#TABLE_OPTIMIZER_AGG_PHASE_STRATEGY


Note:

  • The two configurations above currently throw an exception when the configured value is unknown. An Enum-typed ConfigOption behaves the same way, so the changes are backward compatible.
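
The following sketch shows why an enum-typed option rejects unknown values in the same way. The enum constants and the case-insensitive matching are assumptions made for illustration, and `EnumSketch` is a hypothetical class rather than Flink's conversion code.

```java
import java.util.Locale;

class EnumSketch {
    // Illustrative enum; the constant names are assumptions for this sketch.
    enum CompressionCodec { LZ4, LZO, ZSTD }

    /** Converts a raw string to an enum constant, failing on unknown values. */
    static <E extends Enum<E>> E parse(Class<E> enumClass, String raw) {
        // Enum.valueOf throws IllegalArgumentException when no constant matches.
        return Enum.valueOf(enumClass, raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

A value that names a known constant is accepted, while any other string raises an IllegalArgumentException at parse time, so users with a valid configuration would see no change in behavior.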


The following ConfigOption will be changed to the Int type:

  • YarnConfigOptions#APPLICATION_ATTEMPTS


Mark all internally used ConfigOptions with the @Internal annotation


After checking with the committers familiar with each module, we confirmed that the ConfigOptions listed below are used only internally. However, they have not yet been marked with the @Internal annotation:


  • PythonDynamicTableOptions

  • Dispatcher#CLIENT_ALIVENESS_CHECK_DURATION

  • ClusterEntrypoint#INTERNAL_CLUSTER_EXECUTION_MODE

  • FileJobGraphRetriever#JOB_GRAPH_FILE_PATH


Compatibility, Deprecation, and Migration Plan

The changes made in this FLIP are backward compatible. No deprecation or migration plan is needed.

Test Plan

Existing unit and integration tests ensure compatibility with the old options. New tests will cover the new options.