Versions Compared

Comment: Change management script to have subcommands, clarify details and deprecation strategy.

...

This includes plugins built and published by the Kafka project itself, which will have ServiceLoader manifest files added as part of implementing this KIP.

...

The default value for this configuration when used in the EmbeddedConnectCluster test utility will be HYBRID_FAIL.

...

Plugin Path Management Script

In addition, a new script bin/connect-scan-plugin-path.sh will be developed to manage the worker plugin path. For the purposes of this migration, this script will execute plugin path scanning and generate shim JARs which include ServiceLoader manifests. This can be run ahead of time during CI, and will allow a Connect instance to use non-updated plugins with SERVICE_LOAD.

The script would take the following arguments, with the following meanings:

  • A positional argument sub-command which takes exactly one of the following values:
    • add-manifests
      • For each concrete plugin implementation which is missing a ServiceLoader manifest, add a manifest file and/or entry.
        • For a single jar or zip file, a new resource file may be added to the existing archive.
        • For a directory with an arbitrary hierarchy of jar or zip files, additional directories and/or files may be added to the existing directory.
        • For a directory with an arbitrary hierarchy of class files, additional directories and/or files may be added to the existing directory.
      • The path and all subdirectories and files specified in other options must be writable.
    • remove-manifests 
      • For each plugin ServiceLoader manifest, if the referenced class cannot be found within the plugin, remove the manifest entry and/or file.
      • The path and all subdirectories and files specified in other options must be writable.
    • list
      • Print a human-readable summary of a plugin path and the plugins contained within.
      • For each plugin, include:
        • The fully qualified class name
        • Plugin aliases (if available)
        • The version (if available)
        • Whether the class is discoverable via scanning
        • Whether the class is discoverable via ServiceLoader
      • The path and all subdirectories and files specified in other options must be readable.
  • --plugin-location <single-plugin-jar-zip-dir> 
    • The value of this argument will be a single plugin, which can be any of the following:
      • a single jar  or zip  file
      • a directory with an arbitrary hierarchy of jar  or zip  files
      • a directory with an arbitrary hierarchy of class  files
    • This can be specified zero or more times.
    • The path and all subdirectories and files specified must be writable.
  • --plugin-path <list-of-paths> 
    • The value of this argument will follow the same semantics of the worker properties plugin.path  configuration.
    • This will be equivalent to specifying multiple --plugin-location  arguments, one for each top-level archive, and for each immediate sub-folder of each top-level directory.
    • This can be specified zero or more times.
    • The paths and all subdirectories and files specified must be writable.
  • --worker-config <worker-properties-file> 
    • From this worker properties file, the plugin.path contents on-disk will be mutated.
    • This will be equivalent to extracting the plugin.path  configuration from the worker properties and specifying --plugin-path .
    • This can be specified zero or more times.
    • The paths and all subdirectories and files specified in the plugin.path configuration must be writable.
  • --dry-run 
    • Can only be specified in combination with add-manifests  or remove-manifests.
    • If specified, execute all of the steps needed for normal script execution except the final step of writing the changes to disk.
    • If not specified, the prescribed changes to the plugin path are written to disk.
    • If a condition which would prevent the script from completing normally is detected, the exit code of the script will be non-zero, and details will be printed via stderr in a human-readable format.
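
As an illustration of what add-manifests and --dry-run imply for a single jar, here is a minimal Python sketch. It is not the script's implementation: the function names, the `classes` argument (standing in for the result of plugin scanning), and the rewrite-the-whole-archive strategy are all assumptions made for the example. The only fixed detail is the standard ServiceLoader layout, where each manifest lives at META-INF/services/<interface name> and lists one provider class per line.

```python
import zipfile

SERVICES_DIR = "META-INF/services/"


def parse_manifest(text):
    """Return provider class names from a ServiceLoader manifest, skipping comments."""
    names = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def add_manifest_entries(jar_path, interface, classes, dry_run=False):
    """Add missing ServiceLoader entries for `classes` to a jar (hypothetical helper).

    Returns the class names that were (or, with dry_run, would be) added.
    """
    entry = SERVICES_DIR + interface
    with zipfile.ZipFile(jar_path, "r") as jar:
        existing = []
        if entry in jar.namelist():
            existing = parse_manifest(jar.read(entry).decode("utf-8"))
        missing = [c for c in classes if c not in existing]
        if not missing or dry_run:
            # Nothing to do, or dry run: report without touching the disk.
            return missing
        # Keep every other archive member so it can be written back unchanged.
        contents = {n: jar.read(n) for n in jar.namelist() if n != entry}

    # Assumption: rewrite the archive in place. This is not atomic, so a
    # failure here can leave the jar in an indeterminate state.
    merged = "\n".join(existing + missing) + "\n"
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for name, data in contents.items():
            jar.writestr(name, data)
        jar.writestr(entry, merged)
    return missing
```

Calling `add_manifest_entries(path, "org.apache.kafka.connect.sink.SinkConnector", ["com.example.MySink"], dry_run=True)` reports the entry that would be added and leaves the jar untouched. `com.example.MySink` is a made-up plugin class.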

The list  command is meant for inspecting the plugin.path before and after a migration takes place, and should expose the information that the add-manifests  and remove-manifests  commands are using to perform the migration.

This script would migrate the specified paths in-place, and require the input files to be writable. The arguments which do not require a worker config are intended to provide smaller subunits of the migration for callers which want to divide the migration for error handling, modular CI builds, or reusable tooling.
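
The relationship between --plugin-path and --plugin-location described above can be sketched as a small expansion step. This Python sketch is an illustration only; the function names are hypothetical, and it assumes that a path element which is itself a jar or zip counts as one location, and that archives sitting directly inside a top-level directory count as locations alongside its immediate sub-folders.

```python
from pathlib import Path

ARCHIVE_SUFFIXES = {".jar", ".zip"}


def is_archive(path):
    """True for an existing jar or zip file."""
    return path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES


def plugin_locations(plugin_path):
    """Expand a comma-separated plugin path into individual plugin locations.

    Each returned path corresponds to one --plugin-location argument.
    """
    locations = []
    for element in plugin_path.split(","):
        element = element.strip()
        if not element:
            continue
        top = Path(element)
        if is_archive(top):
            # Assumption: a top-level archive is a single plugin location.
            locations.append(top)
        elif top.is_dir():
            # Each immediate sub-folder or archive is a separate location;
            # other files in the top-level directory are ignored.
            for child in sorted(top.iterdir()):
                if child.is_dir() or is_archive(child):
                    locations.append(child)
    return locations
```

A caller that wants to divide the migration, as the paragraph above suggests, could iterate over these locations and handle a failure in one plugin without re-running the others.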

If the script fails at any point with add-manifests  or remove-manifests but without --dry-run  specified, the plugin path may be left in an indeterminate state and should not be relied upon for correctness. After a script failure, it is recommended to clear the disk contents and restore it to a known-good state. If used in CI, a script failure can be made to cause the build to fail and be retried from the beginning.

If the script succeeds, subsequent runs on the same script directory should be idempotent. Subsequent runs on a partially changed directory should be idempotent for the unchanged parts, and should re-migrate the changed parts. If a plugin class is removed from the path, the corresponding shim manifest should also be removed.
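
The cleanup and idempotency behavior described above can be illustrated for the simplest plugin layout, a directory of class files. This is a hedged Python sketch, not the script's actual logic: `remove_stale_entries` is a hypothetical name, and it assumes a manifest entry is stale exactly when the matching .class file is absent under the plugin directory.

```python
from pathlib import Path


def remove_stale_entries(plugin_dir, dry_run=False):
    """Remove manifest entries whose class file is missing (hypothetical helper).

    Returns (manifest file name, class name) pairs that were, or with
    dry_run would be, removed.
    """
    root = Path(plugin_dir)
    services = root / "META-INF" / "services"
    removed = []
    if not services.is_dir():
        return removed
    for manifest in sorted(services.iterdir()):
        if not manifest.is_file():
            continue
        kept, stale = [], []
        for line in manifest.read_text(encoding="utf-8").splitlines():
            name = line.split("#", 1)[0].strip()
            if not name:
                continue
            # com.example.Foo is expected at com/example/Foo.class
            class_file = root / (name.replace(".", "/") + ".class")
            (kept if class_file.is_file() else stale).append(name)
        removed.extend((manifest.name, name) for name in stale)
        if stale and not dry_run:
            if kept:
                # Note: comments in the manifest are dropped on rewrite.
                manifest.write_text("\n".join(kept) + "\n", encoding="utf-8")
            else:
                # No valid entries remain, so remove the file itself.
                manifest.unlink()
    return removed
```

Running the function a second time on the same directory finds nothing to remove and writes nothing, which is the idempotency property the text asks for.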

Compatibility, Deprecation, and Migration Plan

...

Once a Connect operator updates their environment to a version with this feature, they will receive log warnings. If they notice these warnings, they will be able to upgrade plugins to versions which alleviate the warning, or contact their vendors/plugin developers to encourage them to update their plugins. They will be able to see the progress of this update in the startup logs, and via bin/connect-plugin-path.sh list.

While waiting for plugins to update, they can use bin/connect-scan-plugin-path.sh add-manifests to migrate plugins at the point of use in their CI, and use SERVICE_LOAD mode in their environment configuration.

After a connect operator has updated all of the plugins, they can remove bin/connect-scan-plugin-path.sh add-manifests  from their CI, and change the CI test configuration to HYBRID_FAIL  to catch any regressions.

...

The new interfaces will ideally be released in a 3.x version of Kafka. All modes other than SERVICE_LOAD should be marked deprecated. The shim script should be marked deprecated.

In a follow-up KIP as early as 4.0, we should propose changing the configuration default to SERVICE_LOAD, given the ease of applying the workarounds.

In a second follow-up KIP as early as 5.0, we should schedule the removal of the scanning behavior. This would mean that connector plugins built for Kafka <3.x will not work for Kafka 5.0, or whatever version the removal takes place in. Plugins with manifests will work for versions of Kafka <4.0 without issue. The bin/connect-scan-plugin-path.sh script may be kept for longer than the runtime scanning behavior to aid migration, without imposing a significant burden on future Connect feature development. In this same KIP, we may choose to deprecate the add-manifests and remove-manifests script sub-commands.

Test Plan

Existing system tests will be configured to start workers with SERVICE_LOAD immediately for performance reasons, and as this is now the recommended running mode.

...

New non-migrated, system-test-only plugins will be added to the system test build to verify that a non-migrated plugin will have the intended effect in each mode. These plugins will be used to test the bin/connect-scan-plugin-path.sh migration script. As part of this, existing system-test-only plugins will be refactored out of the publicly distributed build, and special cases for them removed from production code paths.

...

  • Using OSGi. In addition to the reasons noted in KIP-146, OSGi represents a much more invasive change to the Connect framework than this KIP is targeting, and with much less clear benefit. There are also existing plugins using the ServiceLoader paradigm which would require extra migrations.
  • Have the migration script copy-on-write and not mutate the on-disk worker config or plugin.path. This adds a lot of complexity to the script, and complexity in specifying the output locations. This is easily avoided by the user copying their plugins to a writable scratch space before running the migration script.
  • Discarding ClassLoaders to enable garbage collection of scanned classes after scanning is complete. This solves the ongoing memory overhead of the scanned classes, but does not remove the initial CPU overhead of the scanning operation itself.
  • Adding module-info.java files for Apache plugins, in addition to, or in lieu of, ServiceLoader manifests. Because Kafka supports Java 8, we cannot rely on Java 9+ features. Adding modules for Kafka is also outside the scope of this improvement, and deserves attention in a separate KIP.