Status
Current state: accepted
Discussion thread: https://lists.apache.org/thread/qfhcm7h8r5xkv38rtxwkghkrcxg0q7k5
JIRA: FLINK-33297
Released: <Flink Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The current "flink-conf.yaml" parser in FLINK is not a standard YAML parser, which has some shortcomings. Firstly, it does not support nested structure configuration items and only supports key-value pairs, resulting in poor readability. Secondly, if the value is a collection type, such as a List or Map, users are required to write the value in a FLINK-specific pattern, which is inconvenient to use. Additionally, the parser of FLINK has some differences in syntax compared to the standard YAML parser, such as the syntax for parsing comments and null values. These inconsistencies can cause confusion for users, as seen in FLINK-15358 and FLINK-32740.
By supporting standard YAML, these issues can be resolved, and users can create a Flink configuration file using third-party tools and leverage some advanced YAML features. Therefore, this FLIP aims to introduce a standard YAML parser for parsing the FLINK configuration file.
Public Interfaces
Introduce a new configuration file named "config.yaml" as the next generation of Flink configuration files, distinct from the old configuration file named "flink-conf.yaml". When "config.yaml" exists in the Flink conf directory, Flink will use the standard YAML parser to parse it as the Flink configuration.
Modify the Flink packaging process so that when generating flink-dist, the "flink-conf.yaml" file is no longer generated in the conf directory. Instead, a "config.yaml" file that conforms to the standard YAML syntax will be generated.
For compatibility reasons, in Flink 1.x, if the old configuration file "flink-conf.yaml" exists in the Flink conf directory, Flink will ignore the new configuration file and use the old parser to parse "flink-conf.yaml" as the Flink configuration. In Flink 2.x, Flink will no longer support parsing the old configuration file flink-conf.yaml.
To lower the migration barrier for users and improve compatibility, we plan to extend the standard YAML parser so that it can also parse List or Map type ConfigOptions written in the FLINK-specific pattern. In Flink 2.x, Flink will no longer support parsing the FLINK-specific List or Map pattern.
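The file-selection rule described above can be sketched as follows. This is a minimal illustration, not Flink's actual loading code: the function name `select_config_file`, the `flink_major_version` parameter, and the returned parser labels are hypothetical names chosen for this example.

```python
import os

LEGACY_FILE = "flink-conf.yaml"   # parsed by the old FLINK-specific parser
STANDARD_FILE = "config.yaml"     # parsed by a standard YAML parser


def select_config_file(conf_dir, flink_major_version=1):
    """Return (file_name, parser_kind) following the precedence in this FLIP.

    Hypothetical helper: in Flink 1.x the legacy file wins if it exists;
    in Flink 2.x the legacy file is ignored entirely.
    """
    legacy_path = os.path.join(conf_dir, LEGACY_FILE)
    standard_path = os.path.join(conf_dir, STANDARD_FILE)

    if flink_major_version < 2 and os.path.isfile(legacy_path):
        return LEGACY_FILE, "legacy"
    if os.path.isfile(standard_path):
        return STANDARD_FILE, "standard-yaml"
    raise FileNotFoundError("no usable configuration file in " + conf_dir)
```

Under these assumptions, a conf directory that contains both files resolves to "flink-conf.yaml" in Flink 1.x and to "config.yaml" in Flink 2.x.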
Compatibility, Deprecation, and Migration Plan
Compatibility:
There is no compatibility issue because the old and new parsers read differently named configuration files. The default configuration file will be changed to "config.yaml" and parsed by the standard YAML parser. If users prefer to use the old parser, they can create a "flink-conf.yaml" file in the conf directory.
Note that the standard YAML parser behaves differently from the current FLINK parser in the following respects. For each item, the behavior of the standard YAML parser is given first, followed by that of the FLINK parser:
ConfigOption key
Standard YAML parser: The key of a ConfigOption cannot be a prefix of another option's key.
FLINK parser: No requirements.
Comment
Standard YAML parser: Comments must be separated from other tokens by white space characters.
FLINK parser: Anything after the '#' symbol is considered a comment.
Null value
Standard YAML parser: Parses null or blank values as a null value.
FLINK parser: Anything after the first ':' symbol is treated as a string type value, except for blank values.
Special characters
Standard YAML parser: The indicator characters in standard YAML are listed below; for more details, see https://yaml.org/spec/1.2.2/.
To use any of the following special characters as part of a string value, enclose the value in quotation marks.
- -
- ?
- :
- ,
- [
- ]
- {
- }
- #
- &
- *
- !
- |
- >
- '
- "
- %
- @
- `
FLINK parser: Currently, the FLINK parser treats the following character sequences specially:
- ": " (a colon followed by a whitespace)
  - The first ": " in a key-value pair is the delimiter between the key and the value; any remaining ": " is treated as part of the value.
- '#'
  - Anything after the first '#' is considered a comment, so the hash sign (#) cannot be part of a key-value pair.
- ';'
  - For List type config options, the semicolon (;) is the delimiter between List elements. To include ';' as part of an element value, escape it using quotation marks.
- ','
  - For Map type config options, the comma (,) is the delimiter between map entries. To include ',' as part of a value, escape it using quotation marks.
- ':'
  - For Map type config options, the colon (:) is the delimiter between the key and the value of a map entry. To include ':' as part of a value, escape it using quotation marks.
Duplicate key
Standard YAML parser: Standard YAML does not allow duplicate keys in a map. It states that "The content of a mapping node is an unordered set of key/value node pairs, with the restriction that each of the keys is unique." For more details, see https://yaml.org/spec/1.2.2/#nodes.
FLINK parser: The FLINK parser allows duplicate keys; a key-value pair that appears later in the file overrides an earlier one with the same key.
Sequences style
Standard YAML parser: Standard YAML has two styles for sequences:
- Flow style:
  Flow sequence content is denoted by surrounding "[" and "]" characters. Sequence entries are separated by a "," character. For more details, see https://yaml.org/spec/1.2.2/#741-flow-sequences.
  Flow style example:
    [A, B, C]
- Block style:
  A block sequence is simply a series of nodes, each denoted by a leading "-" indicator. The "-" indicator must be separated from the node by white space. For more details, see https://yaml.org/spec/1.2.2/#821-block-sequences.
  Block style example:
    - A
    - B
    - C
FLINK parser: Sequence elements are separated by ";" (semicolon).
For example: A;B;C
Mapping style
Standard YAML parser: Standard YAML has two styles for mappings:
- Flow style:
  Flow mappings are denoted by surrounding "{" and "}" characters. Mapping entries are separated by a "," character. For more details, see https://yaml.org/spec/1.2.2/#742-flow-mappings.
  Flow style example:
    {k1: v1, k2: v2, k3: v3}
- Block style:
  A block mapping is a series of entries, each presenting a key/value pair. For more details, see https://yaml.org/spec/1.2.2/#822-block-mappings.
  Block style example:
    k1: v1
    k2: v2
    k3: v3
FLINK parser: Each key is separated from its value by ":" (colon), and key-value pairs are separated from each other by "," (comma).
For example: k1:v1, k2:v2, k3:v3
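To illustrate the "ConfigOption key" restriction in the comparison above, the sketch below finds option keys that would collide under a standard YAML parser. It assumes that dotted keys such as "a.b.c" correspond to nested YAML mappings, so a key cannot hold both a scalar value and a nested mapping. That interpretation, and the helper name `find_prefix_conflicts`, are assumptions made for this example rather than something the FLIP specifies.

```python
def find_prefix_conflicts(keys):
    """Return sorted (prefix, longer_key) pairs where one dotted key
    is a whole-segment prefix of another key in the same set."""
    key_set = set(keys)
    conflicts = []
    for key in key_set:
        parts = key.split(".")
        # Check every proper dotted prefix of this key: "a", "a.b", ...
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in key_set:
                conflicts.append((prefix, key))
    return sorted(conflicts)
```

For instance, "a.b" and "a.b.c" conflict under this assumption, while "a.b" and "a.bc" do not, because only whole dotted segments are compared.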
Deprecation:
The old configuration file "flink-conf.yaml" will be deprecated, and the deprecation will be announced in the release notes and user documentation.
Migration Plan:
In Flink 2.0, the old parser will no longer be supported, and the "flink-conf.yaml" file will no longer be used as a configuration file. In addition, the standard YAML parser will no longer support parsing the FLINK-specific List and Map patterns.
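Because the FLINK-specific List and Map patterns go away, existing values need to be rewritten in standard YAML. The following sketch converts the simple cases shown in the comparison above ("A;B;C" and "k1:v1, k2:v2, k3:v3") to YAML flow style. It is not a tool provided by Flink: the function names are hypothetical, and it deliberately does not handle quoted elements that contain delimiters or values that contain YAML indicator characters, which would need quoting.

```python
def legacy_list_to_flow(value):
    """Convert a FLINK-specific list value such as 'A;B;C' to '[A, B, C]'."""
    items = [item.strip() for item in value.split(";") if item.strip()]
    return "[" + ", ".join(items) + "]"


def legacy_map_to_flow(value):
    """Convert a FLINK-specific map value such as 'k1:v1, k2:v2'
    to '{k1: v1, k2: v2}'."""
    entries = []
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first ':' only, so the rest stays in the value.
        key, sep, val = pair.partition(":")
        if not sep:
            raise ValueError("map entry without ':' delimiter: " + pair)
        entries.append(key.strip() + ": " + val.strip())
    return "{" + ", ".join(entries) + "}"
```

The space after each colon in the output matters: in standard YAML, "k1:v1" without the space would be read as a single plain scalar rather than a key-value pair.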
Test Plan
The change will be tested via unit tests (UT) and end-to-end (e2e) tests.
Rejected Alternatives
Reject using a standard YAML parser to parse “flink-conf.yaml”
To avoid unexpected behavior changes for FLINK users, it is not reliable to use a standard YAML parser to parse the existing "flink-conf.yaml" file. This is because using different parsers to parse the same configuration file can result in potential changes in the parsing results, which can be difficult for users to notice. The following are examples that would cause a breaking change:
1. Comment
The FLINK parser considers anything after the '#' symbol to be a comment, while the standard YAML parser requires that "Comments must be separated from other tokens by white space characters". If the user configures "key1: value1#comment1" in "flink-conf.yaml", the old parser will regard "value1" as the value and "#comment1" as a comment. However, the standard YAML parser will regard "value1#comment1" as the value.
2. Null value
The FLINK parser treats anything after the first ':' symbol as a string type value, so "key1: null" yields the string "null". However, the standard YAML parser parses null or a blank value as a null value, unless the value is enclosed in double quotes, such as "null".
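The two differences above can be made concrete with a small sketch. The functions below only approximate each rule for single-line plain values. They are not the real FLINK parser or a real YAML parser, their names are hypothetical, and the YAML-like versions ignore most quoting and multi-line constructs. The set of null tokens follows the YAML 1.2 core schema.

```python
YAML_NULL_TOKENS = {"", "~", "null", "Null", "NULL"}


def strip_comment_legacy(line):
    """Old-parser rule from the text: everything after the first '#' is a comment."""
    return line.split("#", 1)[0].rstrip()


def strip_comment_yaml_like(line):
    """Approximate YAML rule: '#' starts a comment only at the start of the
    line or after white space. Quoted strings are not considered here."""
    for i, ch in enumerate(line):
        if ch == "#" and (i == 0 or line[i - 1] in " \t"):
            return line[:i].rstrip()
    return line.rstrip()


def interpret_value_legacy(raw):
    """Old-parser rule: a non-blank value is always kept as a string."""
    return raw.strip()


def interpret_value_yaml_like(raw):
    """Approximate YAML rule: blank or null tokens become None, and a
    double-quoted value stays a string (without its quotes)."""
    text = raw.strip()
    if text in YAML_NULL_TOKENS:
        return None
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text
```

With these approximations, the line "key1: value1#comment1" keeps "#comment1" only under the YAML-like rule, and the value "null" becomes a null value only under the YAML-like rule.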