Document the state by adding a label to the FLIP page with one of: "discussion", "accepted", "released", or "rejected".

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

[This FLIP proposal is joint work by Yuxin Tan and Xuannan Su.]


Motivation

As Flink moves toward 2.0, we have revisited all runtime configurations and identified several improvements to enhance user-friendliness and maintainability. In this FLIP, we aim to refine the runtime configuration.

Public Interfaces

  • We propose to deprecate the hash-based blocking shuffle in 1.20 and remove it in 2.0. The following options will be deprecated and removed accordingly:

    • taskmanager.network.sort-shuffle.min-parallelism: The default value of this option is 1, which means that the sort-based blocking shuffle is already used for all batch jobs by default.

    • taskmanager.network.blocking-shuffle.type: This option applies only to the hash-based blocking shuffle.

  • We propose to deprecate and remove the legacy hybrid shuffle mode. The following options will be deprecated and removed accordingly:

    • taskmanager.network.hybrid-shuffle.enable-new-mode: This option is currently true by default, so the new mode is already in use. It will be removed together with the legacy hybrid shuffle mode.

    • taskmanager.network.hybrid-shuffle.spill-index-region-group-size: This option applies only to the legacy mode.

    • taskmanager.network.hybrid-shuffle.num-retained-in-memory-regions-max: This option applies only to the legacy mode.

  • Shuffle Memory-Related Options

    • The following options are already deprecated and will be removed in 2.0:

      • taskmanager.network.numberOfBuffers

      • taskmanager.network.memory.fraction

      • taskmanager.network.memory.min

      • taskmanager.network.memory.max

    • The following options will be deprecated in 1.20 and removed in 2.0. Based on our experience, streaming jobs seldom configure these options. Furthermore, FLIP-266 simplifies the configuration so that users no longer need to concern themselves with the floating and exclusive buffer settings.

      • taskmanager.network.memory.buffers-per-channel

      • taskmanager.network.memory.floating-buffers-per-gate

      • taskmanager.network.memory.max-buffers-per-channel

    • taskmanager.network.memory.max-overdraft-buffers-per-gate will be deprecated in 1.20 and removed in 2.0: After consulting with a developer familiar with this option, we reached a consensus that the option is too complex for users to understand and impractical for them to configure correctly. As a result, it will be removed and its value hard-coded to 20, for the following reasons:

      • Once the option is removed, users can no longer change it, so it is safer to use a larger value than the current default of 5.

      • Most tasks do not use overdraft buffers at all, so increasing the limit does not introduce additional risk.

    • taskmanager.network.memory.exclusive-buffers-request-timeout-ms will be renamed to taskmanager.network.memory.buffers-request-timeout-ms: We no longer want to expose the distinction between exclusive and floating buffers to users.

  • Shuffle Compression Related Options

    • Introduce a "none" value for taskmanager.network.compression.codec and remove the option's @Experimental annotation.

    • Deprecate and remove the option taskmanager.network.batch-shuffle.compression.enabled. Compression will instead be disabled by setting taskmanager.network.compression.codec to "none".

  • The following Netty-related options are too complex for end users to use effectively. We therefore propose to deprecate and remove them unless concerns are raised:

    • taskmanager.network.netty.num-arenas

    • taskmanager.network.netty.server.numThreads

    • taskmanager.network.netty.client.numThreads

    • taskmanager.network.netty.server.backlog

    • taskmanager.network.netty.sendReceiveBufferSize

    • taskmanager.network.netty.transport

  • Deprecate and remove fine-grained.shuffle-mode.all-blocking: This option is only used when fine-grained resource management is enabled for batch jobs. Its default value is currently false, yet users must explicitly set it to true, otherwise the job fails with an exception. Therefore, we will remove it in Flink 2.0, and batch jobs will always behave as if it were set to true.

  • Misc

    • Mark the StreamPipelineOptions class as deprecated in 1.20 and remove it in 2.0, since it only contains deprecated options.

    • Deprecate and remove taskmanager.network.max-num-tcp-connections: After checking with a developer familiar with the module, we concluded that the option can be removed and the value fixed at its default of 1.
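The replacement of taskmanager.network.batch-shuffle.compression.enabled by the new "none" codec value can be illustrated with a small migration sketch. The helper below is hypothetical and not part of Flink: it works on a plain map of option keys to string values rather than Flink's Configuration class, and it assumes that the codec option keeps its existing default (LZ4) when the legacy flag was enabled or absent.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical migration helper for illustration; not part of Flink. */
public final class CompressionOptionMigration {

    static final String LEGACY_FLAG_KEY = "taskmanager.network.batch-shuffle.compression.enabled";
    static final String CODEC_KEY = "taskmanager.network.compression.codec";

    private CompressionOptionMigration() {}

    /**
     * Returns a copy of the options with the deprecated boolean flag removed. If the flag was
     * "false", compression was disabled, so the codec is set to "none". Otherwise the codec
     * is left untouched and its own default (assumed to be LZ4) keeps applying.
     */
    static Map<String, String> migrate(Map<String, String> options) {
        Map<String, String> migrated = new HashMap<>(options);
        String legacyValue = migrated.remove(LEGACY_FLAG_KEY);
        if (legacyValue != null && !Boolean.parseBoolean(legacyValue.trim())) {
            migrated.put(CODEC_KEY, "none");
        }
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put(LEGACY_FLAG_KEY, "false");
        // The codec had no effect here, because compression was disabled by the legacy flag.
        legacy.put(CODEC_KEY, "ZSTD");
        System.out.println(migrate(legacy)); // {taskmanager.network.compression.codec=none}
    }
}
```

Mapping an explicit "false" to "none" preserves the old behavior exactly, while leaving the codec untouched in every other case avoids changing jobs that never disabled compression.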

Compatibility, Deprecation, and Migration Plan

For options to be renamed:

  • In Flink 1.20, the new ConfigOptions will be introduced with their deprecated keys set to the old keys, and the old ConfigOptions will be marked as deprecated.

  • In Flink 2.0, the deprecated ConfigOptions will be removed.
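The deprecated-key mechanism for renamed options can be modeled in a few lines: the new key is looked up first, and only if it is absent are the deprecated keys consulted, in declaration order. This mirrors, in simplified form, how a Flink ConfigOption declared with withDeprecatedKeys(...) falls back to its old keys, but the RenamedOption class below is hypothetical and omits type conversion rules and deprecation warnings. The 30000 ms default is an assumption used only for illustration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a renamed option with deprecated keys. */
public final class RenamedOption {

    private final String key;
    private final long defaultValue;
    private final List<String> deprecatedKeys;

    RenamedOption(String key, long defaultValue, String... deprecatedKeys) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.deprecatedKeys = List.of(deprecatedKeys);
    }

    /** Resolves the value: new key first, then deprecated keys in order, then the default. */
    long resolve(Map<String, String> config) {
        if (config.containsKey(key)) {
            return Long.parseLong(config.get(key));
        }
        for (String oldKey : deprecatedKeys) {
            if (config.containsKey(oldKey)) {
                // A real implementation would typically also log a deprecation warning here.
                return Long.parseLong(config.get(oldKey));
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        RenamedOption timeout = new RenamedOption(
                "taskmanager.network.memory.buffers-request-timeout-ms",
                30_000L, // assumed default, for illustration only
                "taskmanager.network.memory.exclusive-buffers-request-timeout-ms");

        // A 1.20 user who still sets only the old key keeps the configured value.
        System.out.println(timeout.resolve(Map.of(
                "taskmanager.network.memory.exclusive-buffers-request-timeout-ms", "60000")));

        // When both keys are set, the new key wins.
        System.out.println(timeout.resolve(Map.of(
                "taskmanager.network.memory.buffers-request-timeout-ms", "10000",
                "taskmanager.network.memory.exclusive-buffers-request-timeout-ms", "60000")));
    }
}
```

Giving the new key precedence means a user can add the new key during 1.20 without first deleting the old one, and removing the deprecated ConfigOption in 2.0 then only affects configurations that never adopted the new key.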


For options to be removed:

  • In Flink 1.20, the options will be marked as deprecated.

  • In Flink 2.0, the deprecated ConfigOptions will be removed.
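Before upgrading to 2.0, users could scan an existing configuration for keys that this FLIP proposes to remove. The checker below is a hypothetical sketch, not a Flink tool: its key list is copied from the Public Interfaces section above (the Netty options are included as proposed, pending the discussion), it excludes the renamed timeout option, which keeps working through its deprecated key, and it takes a plain map instead of parsing a real configuration file.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-upgrade check; the key list is copied from this FLIP's proposal. */
public final class RemovedOptionsCheck {

    static final Set<String> REMOVED_IN_2_0 = Set.of(
            "taskmanager.network.sort-shuffle.min-parallelism",
            "taskmanager.network.blocking-shuffle.type",
            "taskmanager.network.hybrid-shuffle.enable-new-mode",
            "taskmanager.network.hybrid-shuffle.spill-index-region-group-size",
            "taskmanager.network.hybrid-shuffle.num-retained-in-memory-regions-max",
            "taskmanager.network.numberOfBuffers",
            "taskmanager.network.memory.fraction",
            "taskmanager.network.memory.min",
            "taskmanager.network.memory.max",
            "taskmanager.network.memory.buffers-per-channel",
            "taskmanager.network.memory.floating-buffers-per-gate",
            "taskmanager.network.memory.max-buffers-per-channel",
            "taskmanager.network.memory.max-overdraft-buffers-per-gate",
            "taskmanager.network.batch-shuffle.compression.enabled",
            "taskmanager.network.netty.num-arenas",
            "taskmanager.network.netty.server.numThreads",
            "taskmanager.network.netty.client.numThreads",
            "taskmanager.network.netty.server.backlog",
            "taskmanager.network.netty.sendReceiveBufferSize",
            "taskmanager.network.netty.transport",
            "fine-grained.shuffle-mode.all-blocking",
            "taskmanager.network.max-num-tcp-connections");

    private RemovedOptionsCheck() {}

    /** Returns the configured keys that are scheduled for removal, in sorted order. */
    static List<String> findRemovedKeys(Map<String, String> config) {
        Set<String> hits = new TreeSet<>(config.keySet());
        hits.retainAll(REMOVED_IN_2_0);
        return List.copyOf(hits);
    }

    public static void main(String[] args) {
        Map<String, String> userConfig = Map.of(
                "taskmanager.network.netty.server.numThreads", "4",
                "taskmanager.numberOfTaskSlots", "2");
        for (String key : findRemovedKeys(userConfig)) {
            System.out.println("Scheduled for removal in 2.0: " + key);
        }
    }
}
```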


Some of the configuration changes involve changing or removing a feature or behavior. Those changes may be detailed in other FLIPs or Jira tickets; here we only list the related changes:

  • Deprecate the hash-based blocking shuffle in 1.19 and remove it in 2.0
  • Deprecate the legacy hybrid shuffle mode in 1.19 and remove it in 2.0
  • Hard-code the maximum number of overdraft buffers per gate to 20 in 2.0
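Removing the hash-based blocking shuffle should not change behavior for users who keep the default of taskmanager.network.sort-shuffle.min-parallelism. The option switches to the sort-based shuffle whenever a batch job's parallelism is at least the threshold, and the hash-based shuffle is used only below it. The class below is a hypothetical, simplified model of that rule, not Flink's actual selection code; the default of 1 is taken from this FLIP.

```java
/** Hypothetical, simplified model of the blocking-shuffle selection rule. */
public final class BlockingShuffleSelection {

    enum ShuffleImpl { HASH, SORT }

    /** Default of taskmanager.network.sort-shuffle.min-parallelism, as stated in this FLIP. */
    static final int DEFAULT_SORT_SHUFFLE_MIN_PARALLELISM = 1;

    private BlockingShuffleSelection() {}

    /** Sort-based shuffle for parallelism >= threshold; hash-based shuffle only below it. */
    static ShuffleImpl select(int parallelism, int sortShuffleMinParallelism) {
        return parallelism >= sortShuffleMinParallelism ? ShuffleImpl.SORT : ShuffleImpl.HASH;
    }

    public static void main(String[] args) {
        // Every valid parallelism is >= 1, so the default threshold never selects HASH.
        for (int parallelism : new int[] {1, 8, 1024}) {
            System.out.println(parallelism + " -> "
                    + select(parallelism, DEFAULT_SORT_SHUFFLE_MIN_PARALLELISM));
        }
        // Only a user-raised threshold can still route small jobs to the hash-based shuffle.
        System.out.println("4 -> " + select(4, 100));
    }
}
```

In this model, only users who explicitly raised the threshold are affected by the removal, which is why deprecating the option in advance gives them a migration window.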

Test Plan

Existing unit and integration tests (UT/IT) ensure compatibility with the old options. New tests will be added to cover the new options.