Document the state by adding a label to the FLIP page with one of "discussion", "accepted", "released", "rejected".

Discussion thread

https://lists.apache.org/thread/xxx

Vote thread
JIRA


Release

1.20

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

...

  1. The default inference mode will change to dynamic inference.
  2. With dynamic inference mode enabled, the upper bound for parallelism inference is determined differently. In static inference mode, the Hive source uses the value of `table.exec.hive.infer-source-parallelism.max` as the upper bound; if the option is not set, its default value of 1024 is used. In dynamic inference mode, if the user explicitly configures `table.exec.hive.infer-source-parallelism.max`, that value is still used as the upper bound. Otherwise, `execution.batch.adaptive.auto-parallelism.default-source-parallelism` is used. If neither option is set, the upper bound follows the existing adaptive batch scheduler logic and falls back to `execution.batch.adaptive.auto-parallelism.max-parallelism` or `parallelism.default`. As a result, the Hive source parallelism of an existing user job may change when the job runs in dynamic inference mode.
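The resolution order described in item 2 can be sketched roughly as follows. This is only an illustration, not the proposed implementation: the class `HiveSourceParallelismUpperBound`, its methods, and the use of a plain `Map` holding only the explicitly configured options are hypothetical. The relative order of the last two fallbacks (`max-parallelism` before `parallelism.default`) is also an assumption, since the text above does not spell it out.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Illustrative sketch only; all names here are hypothetical, not Flink APIs. */
public class HiveSourceParallelismUpperBound {

    static final String HIVE_INFER_MAX =
            "table.exec.hive.infer-source-parallelism.max";
    static final String DEFAULT_SOURCE_PARALLELISM =
            "execution.batch.adaptive.auto-parallelism.default-source-parallelism";
    static final String ADAPTIVE_MAX_PARALLELISM =
            "execution.batch.adaptive.auto-parallelism.max-parallelism";
    static final String PARALLELISM_DEFAULT = "parallelism.default";

    /** Default of table.exec.hive.infer-source-parallelism.max, as stated in the text. */
    static final int HIVE_INFER_MAX_DEFAULT = 1024;

    /** Static mode: the Hive option, or its default of 1024 when unset. */
    static int staticUpperBound(Map<String, String> explicitlySet) {
        String value = explicitlySet.get(HIVE_INFER_MAX);
        return value != null ? Integer.parseInt(value) : HIVE_INFER_MAX_DEFAULT;
    }

    /**
     * Dynamic mode: the first explicitly configured option wins.
     * The order of the last two keys is an assumption of this sketch.
     * An empty result means the scheduler's own defaults apply.
     */
    static OptionalInt dynamicUpperBound(Map<String, String> explicitlySet) {
        String[] lookupOrder = {
            HIVE_INFER_MAX,
            DEFAULT_SOURCE_PARALLELISM,
            ADAPTIVE_MAX_PARALLELISM,
            PARALLELISM_DEFAULT
        };
        for (String key : lookupOrder) {
            String value = explicitlySet.get(key);
            if (value != null) {
                return OptionalInt.of(Integer.parseInt(value));
            }
        }
        return OptionalInt.empty();
    }
}
```

For example, a job that sets only `execution.batch.adaptive.auto-parallelism.default-source-parallelism` would get an upper bound of 1024 in static mode under this sketch, but the configured default source parallelism in dynamic mode, which is the behavior change called out above.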

Proposed Changes

General Idea

...