Status

Current state: Under Discussion

Discussion thread: here (<- link to https://lists.apache.org/thread/0xd7mk4lv5xpo8cgdvqpbslxj4lljrc8)

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The primary goal of this FLIP is to make PyFlink easier to use for the Python developer community. The name is an ode to the Zen of Python (https://peps.python.org/pep-0020/). The current Python API for Flink can be challenging for users accustomed to the idiomatic Python libraries commonly used for data transformations. With Python downloads on PyPI reaching into the millions per week, the time is right to invest in making the Python API more Pythonic. This proposal aims to solve several key problems:

...

By addressing these issues, we aim to make PyFlink as intuitive and powerful for Python developers as other leading data processing frameworks, thereby improving adoption and developer productivity.

Public Interfaces

Below is a summary of the key changes to public interfaces.

New Interfaces:

  • Convenience Methods: Introduction of user-friendly methods on the Table object for data preview, such as .show() and .display(), similar to those in other data-frame libraries.
  • String Expression Class: A new string class will be added to expressions to provide familiar methods like .str.upper() instead of upper_case().
  • Type Hinting: Comprehensive type hints will be added across the public API (e.g., Table, TableEnvironment) to improve IDE support for autocompletion and static error checking.
  • Python-Native Types: Support using standard Python types (e.g., int, str) in function signatures for UDFs, which will be automatically converted to Flink's DataTypes.
  • Migrate from Builder Pattern: Where possible, move from builder patterns to dataclasses, constructors, factory functions and context/configuration patterns.

Changed Interfaces:

  • Method to Attribute Conversion: Getter methods will be converted to properties where appropriate to follow Python conventions (e.g., using table.schema instead of table.get_schema()).
  • Execution Consistency: The API for job submission will be unified to provide a consistent experience for both local execution (where .wait() will no longer be required) and remote execution.
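
As a rough illustration of the method-to-attribute conversion listed above, the sketch below exposes a property while keeping the existing getter as a thin wrapper. The Table and Schema classes here are toy stand-ins written for this example, not the PyFlink classes, and the retained getter is an assumption about how existing code could keep working.

```python
class Schema:
    """Toy stand-in for a table schema (hypothetical, not the PyFlink class)."""

    def __init__(self, field_names):
        self.field_names = list(field_names)


class Table:
    """Toy stand-in for a table (hypothetical, not the PyFlink class)."""

    def __init__(self, schema):
        self._schema = schema

    @property
    def schema(self):
        """Attribute-style access: table.schema."""
        return self._schema

    def get_schema(self):
        """Getter-style access kept so that existing callers do not break."""
        return self.schema


table = Table(Schema(["id", "name"]))
print(table.schema.field_names)
print(table.get_schema() is table.schema)
```

Both spellings return the same object, so the property can be introduced without removing the getter.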

Removed Interfaces:

No interfaces are currently expected to be removed. As the work evolves, some areas identified as non-Pythonic may require changes; in those cases a best effort will be made to mirror the existing interface or maintain an escape hatch.

Proposed Changes

...

Task Name/FLIP/Issue

Description
Reference table columns as attributes

...

Allow pandas-like table.<my-col> references in addition to col("<my-col>")
for all Table API arguments
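
One possible way to support attribute-style column references is Python's __getattr__ hook, which runs only when normal attribute lookup fails. The sketch below uses toy Column and Table classes and a toy col() helper defined only for this example; the __getitem__ fallback for column names that are not valid identifiers is an assumption, not part of the proposal.

```python
class Column:
    """Toy column expression that only records its name (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"col({self.name!r})"


def col(name):
    """Toy equivalent of the existing col("<my-col>") style."""
    return Column(name)


class Table:
    """Toy table that resolves unknown attributes to columns (hypothetical)."""

    def __init__(self, column_names):
        self._column_names = tuple(column_names)

    def __getattr__(self, name):
        # Called only when regular lookup fails, so real methods and
        # attributes always take precedence over column names.
        if name.startswith("_"):
            # Avoid infinite recursion if _column_names is not set yet.
            raise AttributeError(name)
        if name in self._column_names:
            return col(name)
        raise AttributeError(f"no attribute or column named {name!r}")

    def __getitem__(self, name):
        # Assumed fallback for names that are not valid Python identifiers.
        if name in self._column_names:
            return col(name)
        raise KeyError(name)


orders = Table(["id", "amount", "unit-price"])
print(orders.amount)
print(orders["unit-price"])
```

Because real attributes win over columns, a column whose name collides with a method would still need the col(...) or subscript form.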

...

Kwargs aliasing

...

Allow polars-like table.agg(a_sum=<expr>) in addition to
table.select(<expr>.alias("a_sum")) for providing named aliases via kwargs
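
A minimal sketch of how keyword arguments could be translated into aliases follows. The Expr class and the select function are hypothetical stand-ins for this example; the idea is only that each keyword name becomes the alias of its expression.

```python
class Expr:
    """Toy expression with an optional alias (hypothetical)."""

    def __init__(self, text, alias_name=None):
        self.text = text
        self.alias_name = alias_name

    def alias(self, name):
        # Return a new expression instead of mutating the original.
        return Expr(self.text, name)


def select(*exprs, **named_exprs):
    """Positional expressions pass through; keyword names become aliases."""
    aliased = [expr.alias(name) for name, expr in named_exprs.items()]
    return list(exprs) + aliased


projection = select(Expr("id"), a_sum=Expr("sum(a)"))
for expr in projection:
    print(expr.text, expr.alias_name)
```

Keyword arguments keep their call order in Python, so the aliased columns appear in the order they were written.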

...

Move getter methods to attributes where possible

...

Convert getter methods to properties where possible for more Pythonic
access.

...

Using Python types as well as or instead of DataTypes

...

Allow users to specify Python types in function signatures, which are
converted into Flink DataTypes.
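
The conversion could, for instance, read the annotations of a user function and look each one up in a mapping table. In the sketch below the mapping, the type names on its right-hand side, and the infer_udf_types helper are all assumptions made for illustration; the actual mapping would be defined by the implementation.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python types to type names; illustrative only.
PYTHON_TO_FLINK_TYPE_NAME = {
    int: "BIGINT",
    float: "DOUBLE",
    str: "STRING",
    bool: "BOOLEAN",
}


def infer_udf_types(func):
    """Derive (input type names, result type name) from annotations."""
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    unannotated = [name for name in params + ["return"] if name not in hints]
    if unannotated:
        raise TypeError(f"missing type annotations for: {unannotated}")
    try:
        input_types = [PYTHON_TO_FLINK_TYPE_NAME[hints[p]] for p in params]
        result_type = PYTHON_TO_FLINK_TYPE_NAME[hints["return"]]
    except KeyError as exc:
        raise TypeError(f"no mapping for annotation {exc.args[0]!r}") from None
    return input_types, result_type


def add_tax(amount: float, region: str) -> float:
    return amount * 1.2


print(infer_udf_types(add_tax))
```

Failing loudly on a missing or unmapped annotation keeps explicit DataTypes available as the fallback for anything the mapping does not cover.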

...

Move from Builder pattern to Python friendly patterns

...

Where possible, move from the builder pattern, which leaks through from
Java, to Python-native patterns such as dataclasses, constructors, factory
functions, and context/configuration patterns.
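
To show what a dataclass-based replacement for a builder might look like, the sketch below defines a hypothetical SourceConfig. The class, its fields, and its default values are invented for this example and do not correspond to an existing PyFlink class.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical connector configuration used only for illustration."""

    topic: str
    bootstrap_servers: str = "localhost:9092"
    options: dict = field(default_factory=dict)


# Keyword arguments replace a chain of builder setters and a final build().
config = SourceConfig(topic="orders", options={"format": "json"})

# dataclasses.replace derives a modified copy, leaving the original unchanged.
staging = replace(config, bootstrap_servers="staging:9092")

print(config)
print(staging)
```

The constructor gives required and optional settings a visible signature, which also lets type checkers and IDEs validate the call.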

...

String methods

...

Add

...

a string class to expressions with methods similar to Python and pandas.
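
A pandas-style accessor is one way such a string class could be attached to expressions. The sketch below uses a toy Expression class and a hypothetical StringMethods accessor; the accessor delegates to an upper_case() method so that both spellings produce the same result.

```python
class Expression:
    """Toy expression that renders itself as SQL-like text (hypothetical)."""

    def __init__(self, sql):
        self.sql = sql

    def upper_case(self):
        """Existing-style method name."""
        return Expression(f"UPPER({self.sql})")

    @property
    def str(self):
        """Accessor that groups string methods under familiar names."""
        return StringMethods(self)


class StringMethods:
    """Hypothetical accessor that delegates to the existing-style methods."""

    def __init__(self, expr):
        self._expr = expr

    def upper(self):
        return self._expr.upper_case()


name = Expression("name")
print(name.str.upper().sql)
print(name.upper_case().sql)
```

Delegating rather than reimplementing keeps the existing method as the single source of behavior.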

...

Unraveling/Truncating Tracebacks

...

Capture and simplify JVM stack traces, showing only relevant information to
the Python user.
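
As a very rough sketch of the simplification idea, the function below keeps the exception headline and any "Caused by" lines from a JVM stack trace and drops the individual frames. The summarize_jvm_trace name, the filtering rule, and the sample trace are assumptions made for this example; a real implementation would likely need to be more selective about which frames are relevant.

```python
def summarize_jvm_trace(trace):
    """Keep exception headlines, drop 'at ...' frames and '... N more' lines."""
    kept = []
    for line in trace.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        is_frame = stripped.startswith("at ")
        is_elision = stripped.startswith("...") and stripped.endswith("more")
        if is_frame or is_elision:
            continue
        kept.append(stripped)
    return "\n".join(kept)


# Invented sample trace; class and file names are placeholders.
SAMPLE_TRACE = "\n".join([
    "org.example.JobFailedException: Job failed",
    "\tat org.example.Runner.run(Runner.java:10)",
    "Caused by: java.lang.ArithmeticException: / by zero",
    "\tat org.example.Udf.eval(Udf.java:5)",
    "\t... 3 more",
])

print(summarize_jvm_trace(SAMPLE_TRACE))
```

The full trace could still be attached to the raised Python exception so that nothing is lost for debugging.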


Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
    • Improvement of existing API surface area.
    • All functionality/API in use will continue to exist to minimize friction of adoption.
  • If we are changing behavior how will we phase out the older behavior?
    • If there are issues preventing backwards compatibility we will make a migration guide available in the documentation.
  • If we need special migration tools, describe them here.
    • See above
  • When will we remove the existing behavior?
    • There are no plans to remove existing behavior, only additions.

Test Plan

All new surface area will be tested where possible.

Rejected Alternatives

N/A