Current state: Discarded
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Kafka Connect predicates today enable checks on record metadata. However, this can be limiting, because many built-in and custom transformations are key/value-centric.
Some use-cases this can solve:
- Data type conversion of certain pre-identified fields, for records coming from multiple datasets, only if those fields exist. (Ex: TimestampConverter can be run only if the specified date field exists, irrespective of the record metadata.)
- Skipping a transform if a given field does or does not exist. Many built-in transforms raise exceptions (Ex: the InsertField transform, if the field already exists), which breaks the task. Giving users this control lets them configure such cases deliberately.
- Even where a built-in transform explicitly handles these cases, running it on every record is still an unnecessary pass-through iteration.
- Since a single connector often deals with many datasets (even hundreds for a database CDC connector), metadata-centric predicates are limiting when the goal is to target pre-identified fields that are common to records across those datasets.
This KIP proposes a new HasField predicate class that works for records with or without a schema.
- The predicate takes a field path as a configuration string (Ex: abc.xyz), so nested fields can be checked.
- It returns true or false depending on whether the path exists in the record.
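The path lookup the predicate performs can be sketched as follows. This is a minimal, stdlib-only illustration, assuming a schemaless value represented as nested java.util.Map instances and a dot-separated path such as abc.xyz; a real predicate would also need to walk org.apache.kafka.connect.data.Struct for records with a schema. The class and method names (HasFieldSketch, hasField) are hypothetical. The sketch follows the semantics listed in the test plan: a null key/value yields false, and a field that is present but set to null yields true.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of HasField path resolution for schemaless records
// (nested Maps). Not the proposed implementation; Struct handling omitted.
public class HasFieldSketch {

    // Returns true if every segment of the dot-separated path exists.
    // A field that exists but holds null still counts as present.
    public static boolean hasField(Object value, String fieldPath) {
        if (value == null) {
            return false; // null key/value: predicate evaluates to false
        }
        Object current = value;
        for (String segment : fieldPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return false; // cannot descend into a non-map value
            }
            Map<?, ?> map = (Map<?, ?>) current;
            if (!map.containsKey(segment)) {
                return false; // this segment of the path is missing
            }
            current = map.get(segment); // may be null if it is the last segment
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("xyz", 42);
        inner.put("empty", null);
        Map<String, Object> record = new HashMap<>();
        record.put("abc", inner);

        System.out.println(hasField(record, "abc.xyz"));   // true
        System.out.println(hasField(record, "abc.empty")); // true (present, null)
        System.out.println(hasField(record, "abc.nope"));  // false
        System.out.println(hasField(null, "abc.xyz"));     // false
    }
}
```

Using containsKey rather than a null check on get is what distinguishes "field absent" from "field present with a null value".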
Predicate Settings in Kafka Connect Config:
- Converting a timestamp only if the field exists
- Inserting a field only if the field does not already exist
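The two cases above might be configured roughly as below. Because HasField was proposed but never merged, the predicate class name org.apache.kafka.connect.transforms.predicates.HasField and the field names event_date and source_db are hypothetical; field.path is the option named in this KIP. The transforms.*.predicate, transforms.*.negate and predicates.* keys are the existing Kafka Connect predicate settings, and TimestampConverter$Value and InsertField$Value are existing built-in transforms. The properties are built as a Java map only so they can be printed in .properties form and checked.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: connector properties wiring a hypothetical HasField predicate
// into two built-in SMTs. Printed in .properties form.
public class HasFieldConfigSketch {

    // Hypothetical class name: HasField was never merged into Kafka.
    static final String HAS_FIELD =
            "org.apache.kafka.connect.transforms.predicates.HasField";

    public static Map<String, String> connectorConfig() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("transforms", "convertDate,addSource");

        // Convert event_date to a Timestamp only if the record has that field.
        props.put("transforms.convertDate.type",
                "org.apache.kafka.connect.transforms.TimestampConverter$Value");
        props.put("transforms.convertDate.field", "event_date");
        props.put("transforms.convertDate.target.type", "Timestamp");
        props.put("transforms.convertDate.format", "yyyy-MM-dd");
        props.put("transforms.convertDate.predicate", "hasEventDate");

        // Insert source_db only if the record does NOT already have it.
        props.put("transforms.addSource.type",
                "org.apache.kafka.connect.transforms.InsertField$Value");
        props.put("transforms.addSource.static.field", "source_db");
        props.put("transforms.addSource.static.value", "inventory");
        props.put("transforms.addSource.predicate", "hasSourceDb");
        props.put("transforms.addSource.negate", "true");

        // Predicate definitions (hypothetical type, field.path from this KIP).
        props.put("predicates", "hasEventDate,hasSourceDb");
        props.put("predicates.hasEventDate.type", HAS_FIELD);
        props.put("predicates.hasEventDate.field.path", "event_date");
        props.put("predicates.hasSourceDb.type", HAS_FIELD);
        props.put("predicates.hasSourceDb.field.path", "source_db");
        return props;
    }

    public static void main(String[] args) {
        connectorConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With negate=true, addSource runs only for records on which the predicate evaluates to false, that is, records where source_db is absent, which avoids the InsertField failure case described in the motivation.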
Compatibility, Deprecation, and Migration Plan
The proposal fully respects the existing predicate interfaces, so no deprecation or migration is needed.
Test Plan / Cases
- The field.path setting is mandatory when the predicate is enabled.
- The predicate returns false for records with null keys/values.
- The predicate returns true if the field exists in the record, even if the field is set to null.
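The "field.path is mandatory" case from the test plan could be enforced at configuration time. In a real Connect plugin this would naturally be declared through ConfigDef (for example with ConfigDef.NO_DEFAULT_VALUE and the ConfigDef.NonEmptyString validator); the stdlib-only sketch below merely mimics that check so it runs on its own. The class and method names (FieldPathConfigSketch, requireFieldPath) are hypothetical, and treating a blank value as missing is an assumption.

```java
import java.util.Map;

// Stdlib-only sketch of the "field.path is mandatory" rule.
// Hypothetical names; a real plugin would declare this via ConfigDef.
public class FieldPathConfigSketch {

    static final String FIELD_PATH = "field.path";

    // Returns the configured path, rejecting a missing or blank value
    // the way a required, non-empty config option would.
    public static String requireFieldPath(Map<String, ?> configs) {
        Object raw = configs.get(FIELD_PATH);
        if (raw == null || raw.toString().trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "Missing required configuration \"" + FIELD_PATH + "\"");
        }
        return raw.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(requireFieldPath(Map.of(FIELD_PATH, "abc.xyz"))); // abc.xyz
        try {
            requireFieldPath(Map.of()); // no field.path supplied
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```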
Rejected Alternatives
- Creating multiple custom transforms to handle the field checks.