Motivation

The current syntax for registering a user-defined function (UDF) that depends on a binary artifact requires the JAR keyword:

CREATE FUNCTION my_func AS 'com.myorg.MyUDF' USING JAR 'hdfs:///path/to/my.jar';


This syntax is limiting for a few key reasons:

  1. Inflexibility for Other Artifact Types: As Flink's ecosystem evolves, particularly with improved Python support (e.g., Python UDFs depending on wheel files or other archives), the JAR keyword becomes semantically incorrect and restrictive. We need a more generic way to specify a dependency artifact.
  2. Syntactic Inconsistency: To support other artifact types, we would need to introduce new keywords (e.g., USING MODEL, USING ZIP, USING ARCHIVE) or a different syntax entirely, such as a WITH clause. This would lead to an inconsistent DDL experience for users depending on the function's language.
  3. Verbosity: The JAR keyword is often redundant, as the URI itself ('path/to/my.jar') already indicates the artifact type.

This proposal aims to make the USING clause more generic and future-proof by adding ARTIFACT as a keyword in addition to JAR. This provides a cleaner, more consistent syntax for all types of function artifacts while maintaining full backward compatibility.

Table API Alignment and Consistency Motivation

This proposal to add the ARTIFACT keyword for SQL DDL is consistent with the more generic approach already adopted by Flink's programmatic Table API.

The underlying implementation in the Table API is already designed to be artifact-type agnostic, which aligns perfectly with the goal of future-proofing the SQL syntax.

  • Java Table API: The FunctionCatalog.createFunction methods in the Java Table API already use a generic List<ResourceUri> parameter for registering functions, rather than a JAR-specific type. The SQL change effectively extends this generic concept to the DDL layer, making the user experience consistent whether defining functions via SQL or the Java Table API.
  • Python Table API: The current create_java_function method relies on the Java classloader for resource loading, so extending it to non-JAR artifacts would be follow-up work.

This alignment ensures that users who work with both the SQL interface and the programmatic Table API will encounter a unified and predictable way of managing function dependencies.

Public Interfaces

The proposed change will affect the SQL DDL syntax for CREATE FUNCTION.

Current Syntax
CREATE [TEMPORARY|TEMPORARY SYSTEM] FUNCTION 
  [IF NOT EXISTS] [catalog_name.][db_name.]function_name 
  AS identifier [LANGUAGE JAVA|SCALA|PYTHON] 
  [USING JAR '<path_to_filename>.jar' [, JAR '<path_to_filename>.jar']* ]
  [WITH (key1=val1, key2=val2, ...)]


Proposed New Syntax
CREATE [TEMPORARY|TEMPORARY SYSTEM] FUNCTION 
  [IF NOT EXISTS] [catalog_name.][db_name.]function_name 
  AS identifier [LANGUAGE JAVA|SCALA|PYTHON] 
  [USING JAR|ARTIFACT '<path_to_filename>.jar' [, JAR|ARTIFACT '<path_to_filename>.jar']* ]
  [WITH (key1=val1, key2=val2, ...)]


The key change is that ARTIFACT is accepted as an alternative to the JAR keyword. The behavior of the statement will be identical whichever keyword is used. No other public APIs or interfaces will be changed. A statement that specifies multiple files may mix JAR and ARTIFACT.
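For example, a single statement registering multiple dependencies could mix the two keywords freely (the class name and paths below are placeholders):

CREATE FUNCTION my_func AS 'com.myorg.MyUDF'
USING JAR 'hdfs:///path/to/my.jar', ARTIFACT 'hdfs:///path/to/util.jar';

Both URIs are handled identically by the resource management layer; the keyword choice is purely syntactic.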

Proposed Changes

The implementation will focus on modifying the Flink SQL parser.

  1. SQL Parser Adjustment: The grammar for the CREATE FUNCTION statement will be updated to recognize USING ARTIFACT <string_literal> as a valid clause, in addition to the existing USING JAR <string_literal>.
  2. Backend Logic: No significant changes are anticipated in the backend resource management logic. When the parser encounters a USING clause with the ARTIFACT keyword, it will process the provided URI and register it as a resource dependency for the function, exactly as it does for JAR. If different file types require separate handling in the future, this can be based on the function's LANGUAGE or the resource identifier.

The change is confined to the SQL parsing layer, treating the URI from USING ARTIFACT '...' identically to the URI from USING JAR '...'.

Compatibility, Deprecation, and Migration Plan

Compatibility

This change is fully backward compatible. All existing SQL statements that use the CREATE FUNCTION ... USING JAR '...' syntax will continue to work without any modification.

Deprecation

There is no plan to deprecate the USING JAR '...' syntax. It can be retained as an optional, explicit clarifier for users who prefer it.

Migration Plan

No migration is necessary. Users can adopt the new keyword at their convenience.

Test Plan

The change will be validated by extending the existing test suites for Flink SQL DDL.

Positive Test Cases

  • Add a test case to create a Java/Scala function using the new syntax: CREATE FUNCTION ... USING ARTIFACT 'path/to/my.jar';. The test will then execute a query that invokes this function to verify it was registered and loaded correctly.
  • Ensure that creating a function with the old syntax (USING JAR '...') continues to pass all existing tests (regression testing).
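Sketched in SQL, the new positive case would mirror the existing JAR-based tests (the function name, class, path, and query below are placeholders):

CREATE FUNCTION my_func AS 'com.myorg.MyUDF' USING ARTIFACT 'hdfs:///path/to/my.jar';
SELECT my_func(col) FROM my_table;

A matching regression case would run the same pair of statements with USING JAR to confirm unchanged behavior.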

Negative Test Cases

  • Add tests for invalid syntax variations (e.g., USING JARS '...', USING ARTIFACTS '...', or two URIs with no separating comma) to ensure they are correctly rejected by the parser.

Rejected Alternatives

Using a WITH clause for artifacts

A WITH clause was considered to provide a generic key-value configuration mechanism, which could be useful for other future properties.

CREATE FUNCTION myUDF AS 'com.example.MyUDF'
WITH (
  'artifact.uri' = 'hdfs:///path/to/my.jar'
)

Reason for Rejection: This approach creates a significant syntactic departure from the existing USING JAR clause. It would force users to learn a new syntax and lead to inconsistency, where some functions are defined with USING JAR and others with WITH. The goal is to evolve the current syntax, not introduce a competing one for the same purpose.

Removing the JAR keyword by making it optional

Another alternative was to make the JAR keyword optional.

CREATE FUNCTION myUDF AS 'com.example.MyUDF' USING 'hdfs:///path/to/my.jar';

Reason for Rejection: Given that other resources, such as connections or models, are registered entities that could also use the USING syntax, an implicit form would be confusing. Providing a generic alternative to JAR solves the same problem in a more explicit way.

Follow-up Work

While the SQL parser change is contained and requires no immediate changes to the backend resource management logic or the existing Table APIs, we recognize that enabling full functionality for non-JAR artifacts will require follow-up work on the user-facing APIs.

  • Non-JAR Artifacts: The current Flink runtime primarily uses the Java classloader mechanism for resource loading, which inherently focuses on JAR files.
  • Future Extensions: Subsequent work could focus on extending the Table API and the Flink runtime to specifically handle and utilize different artifact types (e.g., Python wheels, models, archives) based on the function's LANGUAGE or the resource identifier, enabling full end-to-end support for non-JAR dependencies.
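As an illustrative sketch only (non-JAR artifacts are not yet supported end-to-end, and the module, function, and wheel path below are hypothetical), such an extension could eventually allow statements like:

CREATE FUNCTION my_py_func AS 'my_module.my_func' LANGUAGE PYTHON
USING ARTIFACT 'hdfs:///path/to/deps.whl';

The ARTIFACT keyword proposed here makes this reading natural, without requiring any further DDL changes when the runtime gains such support.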