...

Proposed Changes

Nested notation

Dotted notation tends to be the most intuitive way to describe paths to nested fields in record structures (e.g. jq already uses it[1]), and will cover most scenarios.

However, field names in JSON (i.e. schemaless) records can themselves include dots (e.g. {'nested.field': {'value': 42}}).

Therefore, the nested notation must support escaping dots that are part of valid field names.

Instead of escaping dots with backslashes, which leads to unfriendly JSON configurations, it's proposed to follow an approach similar to JSONata[2], where backticks are used to define field names with dots, e.g. `nested.field`
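To make the trade-off concrete, the sketch below (Python, standard json module only) serializes the same path in both styles. The backslash-escaped form is a hypothetical alternative syntax used here only for comparison, and the `field` key is just an illustrative configuration name.

```python
import json

# Hypothetical backslash-escaped path (the alternative this proposal avoids).
backslash_style = r"nested\.field.value"
# Backtick-wrapped path, in the style proposed here.
backtick_style = "`nested.field`.value"

# JSON requires each backslash inside a string to be escaped again,
# so the serialized configuration ends up with doubled backslashes.
print(json.dumps({"field": backslash_style}))
# Backticks have no special meaning in JSON and are written as-is.
print(json.dumps({"field": backtick_style}))
```

The first line prints the path with a doubled backslash, while the second prints the backtick form unchanged, which is the readability argument made above.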


[1] https://stedolan.github.io/jq/manual/#Basicfilters

...

  > Field references containing whitespace or reserved tokens can be enclosed in backticks

Rules

  • 1. If field names do not contain dots (.), then only use dots to represent nested field paths.
  • 2. If field names contain dots, then:
    • wrap the field name with a backtick pair (`...`) by
      • adding an opening backtick at the beginning of the field name (beginning of path, or after a dot)
      • adding a closing backtick at the end of the field name (end of the path, or before the next dot)
    • if a field is wrapped and doesn't contain dots, it is processed the same way: the field name within the wrapping backticks is used
  • 3. If a field name includes backticks, then:
    • if the backticks are in a wrapping position (opening or closing a field name), then they need to be escaped with a backslash
      • Backslashes (\) do not need to be escaped. If a backslash happens to be part of the field name and precedes a backtick that must be escaped, then add another backslash.
    • else, backticks do not require escaping
  • 4. If wrapping backtick pairs are incomplete, the Connect configuration must fail fast to avoid deploying ambiguous paths.
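The rules above can be sketched as a small parser. This is a minimal illustration in Python, not the Connect implementation: the function names are hypothetical, and it makes two simplifying assumptions. First, inside a wrapped name only the closing position is treated as a wrapping position. Second, the double-backslash case from rule 3 is not handled.

```python
from __future__ import annotations


def _at_name_end(path: str, j: int) -> bool:
    """True when index j is the last character before a dot or the end of the path."""
    return j + 1 == len(path) or path[j + 1] == "."


def parse_path(path: str) -> list[str]:
    """Split a path into field names, honoring backtick wrapping (hypothetical helper)."""
    steps: list[str] = []
    i, n = 0, len(path)
    while i < n:
        buf: list[str] = []
        wrapped = path[i] == "`"
        if wrapped:
            i += 1                      # consume the opening backtick (rule 2)
        elif path.startswith("\\`", i):
            buf.append("`")             # escaped backtick in opening position (rule 3)
            i += 2
        closed = False
        while i < n:
            ch = path[i]
            if ch == "\\" and path.startswith("`", i + 1) and _at_name_end(path, i + 1):
                buf.append("`")         # escaped backtick in closing position (rule 3)
                i += 2
            elif wrapped and ch == "`" and _at_name_end(path, i):
                closed = True
                i += 1                  # consume the closing backtick
                break
            elif not wrapped and ch == ".":
                break                   # an unwrapped dot separates field names (rule 1)
            else:
                buf.append(ch)          # any other character, including inner backticks
                i += 1
        if wrapped and not closed:
            # Rule 4: fail fast instead of guessing what the path means.
            raise ValueError(f"incomplete backtick pair in path: {path!r}")
        steps.append("".join(buf))
        i += 1                          # skip the dot that separates two names
    return steps


print(parse_path("parent.child.k2"))
print(parse_path("`parent.child`.k2"))
```

Under these assumptions, `parent.child.k2` yields three field names, `` `parent.child`.k2 `` yields two, and a path such as `` `parent.child.k2 `` raises an error because the wrapping pair is never closed.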

...

scenario | input | smt | output
1. Nested field.


Code Block
languagejs
{
  "k1": 123,
  "parent": {
    "child": {
      "k2": 1556204536000
    }
  }
}



Code Block
languagejs
{
  "transforms": "smt1",
  "transforms.smt1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
  "transforms.smt1.field.syntax.version": "v2",
  "transforms.smt1.field": "parent.child.k2",
  "transforms.smt1.format": "yyyy-MM-dd",
  "transforms.smt1.target.type": "string"
}



Code Block
languagejs
{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "2019-04-25"
    }
  }
}


2. Nested field, when field names include dots


Code Block
languagejs
{
  "k1": 123,
  "parent.child": {
    "k2": 1556204536000
  }
}



Code Block
languagejs
{
  "transforms": "smt1",
  "transforms.smt1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
  "transforms.smt1.field.syntax.version": "v2",
  "transforms.smt1.field": "`parent.child`.k2",
  "transforms.smt1.format": "yyyy-MM-dd",
  "transforms.smt1.target.type": "string"
}




Code Block
languagejs
{
  "k1": 123,
  "parent.child": {
    "k2": "2019-04-25"
  }
}
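As a rough illustration of what scenario 2 does to a schemaless record, the sketch below walks an already-parsed path and converts the leaf value. It is written in Python with hypothetical helper names (`update_nested`, `to_date_string`) and assumes the timestamp is formatted in UTC; it is not the TimestampConverter implementation. The epoch-millisecond value 1556204536000 corresponds to 2019-04-25 in UTC.

```python
from datetime import datetime, timezone


def update_nested(record: dict, steps: list, convert) -> dict:
    """Return a copy of record with the value at the given path replaced by convert(value)."""
    if not steps or steps[0] not in record:
        return record                   # nothing to do when the path does not match
    head, rest = steps[0], steps[1:]
    updated = dict(record)              # shallow copy so the input stays untouched
    if rest:
        updated[head] = update_nested(record[head], rest, convert)
    else:
        updated[head] = convert(record[head])
    return updated


def to_date_string(epoch_millis: int) -> str:
    """Format epoch milliseconds as yyyy-MM-dd, assuming UTC."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


record = {"k1": 123, "parent.child": {"k2": 1556204536000}}
# "`parent.child`.k2" resolves to two field names: "parent.child" and "k2".
result = update_nested(record, ["parent.child", "k2"], to_date_string)
print(result)
```

Because the wrapped name is treated as a single key, the lookup goes straight to `"parent.child"` instead of looking for a `"parent"` object with a `"child"` member.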


...

Existing SMT configurations will not be affected by these changes, as the default field.style is plain, which represents the current behavior; users will need to opt in to the new notation.

Rejected Alternatives

Keep ExtractField as it is and use it multiple times until reaching nested fields

...