Discussion thread: https://lists.apache.org/thread/5j7dmxybkzp7k695qpb7t4ojp39n48gy
Vote thread: TBD
JIRA: FLINK-38958
Release: -

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Flink’s YARN Application Mode recommends using yarn.provided.lib.dirs with a pre-uploaded Flink distribution on HDFS to make job submission lightweight and reuse cached artifacts across applications.

However, in real deployments this often leads to one (or both) of the following issues:

  1. Operational packaging constraints: Some organizations store shared dependencies as compressed archives (e.g., .tgz / .tar.gz) rather than as a directory containing many individual .jar files, to reduce small-file overhead and simplify distribution.

  2. Classpath length / “argument too long” failures: When many jars are added explicitly to the container classpath (especially combined with multiple yarn.provided.lib.dirs entries), the generated launch command / environment can exceed OS limits, causing “argument too long” errors.

We want a small, targeted improvement that:

  • Allows users to point Flink at pre-uploaded, world-readable archive files in HDFS.

  • Adds the jars to the classpath without enumerating each one as a separate classpath entry, which keeps the launch command short.

  • Preserves the existing behavior for users who do not opt in.

Public Interfaces

New configuration option

Add:

  • yarn.provided.lib.archives (List<String>, semicolon-separated)

    • Each entry is a remote archive path (e.g., HDFS) pointing to a pre-uploaded and world-readable archive.

    • Supported archive types: .tar.gz, .tgz

    • Semantics mirror yarn.provided.lib.dirs (shared, reusable across applications), but input is archive files instead of directories.
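As an illustration of the option format, the following is a minimal sketch of how a semicolon-separated value could be split and checked against the supported archive types. The class `ProvidedLibArchivesParser` and its `parse` method are hypothetical names used only for this example and are not part of Flink; the sketch also assumes that extension matching is case-sensitive and that empty segments are ignored, neither of which the FLIP specifies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink. */
final class ProvidedLibArchivesParser {

    /**
     * Splits a semicolon-separated option value into archive paths and
     * rejects entries that do not end in .tar.gz or .tgz.
     */
    static List<String> parse(String rawValue) {
        List<String> archives = new ArrayList<>();
        for (String entry : rawValue.split(";")) {
            String path = entry.trim();
            if (path.isEmpty()) {
                continue; // assumption: tolerate trailing or doubled separators
            }
            // Assumption: case-sensitive match on the two suffixes the FLIP lists.
            if (!path.endsWith(".tar.gz") && !path.endsWith(".tgz")) {
                throw new IllegalArgumentException("Unsupported archive type: " + path);
            }
            archives.add(path);
        }
        return archives;
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        String value = "hdfs:///flink/libs/flink-libs.tgz;hdfs:///flink/libs/connectors.tar.gz";
        System.out.println(parse(value));
    }
}
```

Checks that need cluster access, such as whether a path is remote and world-readable, are left out of this sketch.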

Behavioral changes (only when configured)

When yarn.provided.lib.archives is set, Flink will:

  • Register each archive as a YARN LocalResource of type ARCHIVE, which is automatically unarchived by the NodeManager.

  • Add jars from the localized archive to the classpath using a wildcard form (see Proposed Changes) to avoid excessively long classpath strings.

No changes for users who do not use the new option.

Proposed Changes

High-level behavior

  1. Read and validate config

    • Parse yarn.provided.lib.archives similarly to yarn.provided.lib.dirs.

    • Validate that each path is a remote path accessible from all nodes (same constraint as yarn.provided.lib.dirs).

  2. Register provided archives as shared local resources

    • In the YARN upload/localization step, register each archive as:

      • LocalResourceVisibility.PUBLIC (so it can be reused/cached by NodeManagers across applications),

      • LocalResourceType.ARCHIVE (so it will be unarchived automatically).

  3. Classpath construction using wildcards

    • Instead of enumerating every jar contained in the archive as a separate classpath entry, add a single wildcard entry:

      • <archive-file-name>/*

    • Example (conceptual): if flink-libs.tgz is localized, add flink-libs.tgz/* to the classpath.

    • This design directly targets the “argument too long” failure mode by keeping classpath entries bounded even when archives contain many jars.

  4. Important limitation (initial scope)

    • Only jars located directly at the top level of the unarchived archive are added via the wildcard, because a Java classpath wildcard is not recursive.

    • Jars nested deeper (e.g., flink-libs.tgz/foo/bar/baz.jar) are not picked up by flink-libs.tgz/*.

    • Users must package archives accordingly (jars at top-level inside the archive).
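The classpath construction and its top-level limitation can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: the class `ArchiveClasspathSketch` and its methods are hypothetical, the localized directory name is assumed to equal the archive file name (as in the flink-libs.tgz/* example above), and paths inside the archive are assumed to be relative without a leading "./".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink. */
final class ArchiveClasspathSketch {

    /** Returns the last path segment, e.g. "flink-libs.tgz" for "hdfs:///libs/flink-libs.tgz". */
    static String fileName(String remotePath) {
        return remotePath.substring(remotePath.lastIndexOf('/') + 1);
    }

    /** Builds one wildcard classpath entry per archive: "<archive-file-name>/*". */
    static List<String> wildcardEntries(List<String> remoteArchives) {
        List<String> entries = new ArrayList<>();
        for (String archive : remoteArchives) {
            entries.add(fileName(archive) + "/*");
        }
        return entries;
    }

    /**
     * Returns true if a path inside the archive would be matched by "<archive>/*".
     * A Java classpath wildcard matches only .jar/.JAR files directly in the
     * directory, so anything in a subdirectory is not covered.
     */
    static boolean coveredByWildcard(String pathInArchive) {
        boolean topLevel = pathInArchive.indexOf('/') < 0;
        boolean isJar = pathInArchive.endsWith(".jar") || pathInArchive.endsWith(".JAR");
        return topLevel && isJar;
    }

    public static void main(String[] args) {
        // Example path is made up for the demonstration.
        System.out.println(wildcardEntries(Arrays.asList("hdfs:///libs/flink-libs.tgz")));
        System.out.println(coveredByWildcard("flink-dist.jar"));
        System.out.println(coveredByWildcard("foo/bar/baz.jar"));
    }
}
```

A check like `coveredByWildcard` could also be run over an archive listing before upload to spot jars that the wildcard would miss.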

Relationship to existing configs

  • yarn.provided.lib.dirs remains the recommended path for users who already publish directories of jars.

  • yarn.ship-archives already exists, but it ships archives with each application submission; it is not intended as a shared “provided libs” mechanism that replaces uploading the local Flink libs.

Compatibility, Deprecation, and Migration Plan

  • Backward compatible: No behavioral change unless yarn.provided.lib.archives is set.

  • Migration: Users who currently rely on provided libs but package them as archives can:

    • Upload .tgz / .tar.gz to HDFS, ensure world-readable permissions,

    • Set yarn.provided.lib.archives to point to those files,

    • Ensure the archive layout places jars at the archive’s top-level directory.

  • No deprecations in this FLIP.

Test Plan

  • Unit tests

    • Validate parsing and remote-path checks for yarn.provided.lib.archives.

    • Verify archive registration uses LocalResourceType.ARCHIVE.

    • Verify classpath entries include <archiveName>/* rather than enumerated jar paths.

  • Integration / e2e test (YARN)

    • Submit a YARN application-mode job with:

      • yarn.provided.lib.archives pointing to an HDFS archive containing multiple jars at top-level

    • Verify:

      • Job starts successfully,

      • Required classes are loadable from jars in the archive,

      • No “argument too long” error is triggered even with many jars.
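To make the “argument too long” check concrete, the sketch below compares the length of an enumerated classpath with a single wildcard entry. It is only an illustration: the class `ClasspathLengthSketch`, the jar naming scheme `lib-<i>.jar`, and the jar count are hypothetical, and the real limit depends on the operating system and on the rest of the launch command.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink. */
final class ClasspathLengthSketch {

    /** Length of the entries joined with ':' as on a Unix classpath. */
    static int joinedLength(List<String> entries) {
        return String.join(":", entries).length();
    }

    /** Enumerated form: one entry per jar, named "<dir>/lib-<i>.jar" (made-up names). */
    static List<String> enumerated(String dir, int jarCount) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < jarCount; i++) {
            entries.add(dir + "/lib-" + i + ".jar");
        }
        return entries;
    }

    public static void main(String[] args) {
        int enumeratedLength = joinedLength(enumerated("flink-libs", 1000));
        int wildcardLength = joinedLength(Collections.singletonList("flink-libs.tgz/*"));
        System.out.println("enumerated: " + enumeratedLength + " chars");
        System.out.println("wildcard:   " + wildcardLength + " chars");
    }
}
```

The enumerated length grows linearly with the number of jars, while the wildcard form stays constant per archive, which is the property the e2e test is meant to confirm.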

Rejected Alternatives

None. No other approach that achieves the same result was identified.
