Page properties | |||||
---|---|---|---|---|---|
|
Status
Current state: Accepted
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html
...
|
...
|
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
This requires the parser support of Calcite. Fortunately, Calcite v1.25.0 has already supports tumbling and hopping windowing TVF. We need to extend it to support one more window: cumulative window.
Windowing table-valued functions
We would like to propose 4 kinds of window TVF: tumbling, hopping, cumulative, session. The return value of window TVF is a relation that includes all columns of data as well as additional 3 columns named window_start, window_end, window_time
to indicate the assigned window. The window_time
field is a time attribute of the record after window TVF, it alwsays equal to support one more window: cumulative window.
Windowing table-valued functions
"window_end - 1"
.
Tumbling Windows
The table-valued function TUMBLE
assigns a window for each row of a relation based on a time attribute column. The return value of TUMBLE
is a relation that includes all columns of data as well as additional two columns named 3 columns named window_start
and , window_end
and , window_time
which to indicate the assigned window. The window_time
column is the time attribute of the record after window TVF, it alwsays equal to "window_end - 1"
. The The original row time attribute "timecol" will be a regular timestamp column after window TVF. All assigned windows have the same length, and that’s why tumbling sometimes is named as “fixed windowing”.
TUMBLE
that takes three required parameters and one optional parameter:
Code Block | ||
---|---|---|
| ||
TUMBLE(data, DESCRIPTOR(timecol), size [, offset ]) |
...
The table-valued function HOP
assigns windows that cover rows within the interval of size and shifting every slide based on a timestamp column. The return value of HOP
is is a relation that includes all columns of data as well as additional two columns named additional 3 columns named window_start
and window_end
and window_time
which indicate the assigned window. The window_time
column is the time attribute of the record after window TVF,
it alwsays equal to " window_end
- 1". The , window_time
to indicate the assigned window. The original row time attribute "timecol" will be a regular timestamp column after window TVF. Windows assigned could have overlapping so hopping sometime is named as “sliding windowing”.
...
The table-valued function CUMULATE
assigns windows that cover rows within an initial interval of step size, and expanding to one more step size (keep window start fixed) every step until to the max window size. The return value of CUMULATE
is a relation that includes all columns of data as well as additional two columns named 3 columns named window_start
and , window_end
and , window_
time which time
to indicate the assigned window. The window_time
column is the time attribute of the record after window TVF, it alwsays equal to "window_end - 1"
. The original row time attribute "timecol" will be a regular timestamp column after window TVF. Windows assigned could have overlapping.
...
The table-valued function SESSION
assigns windows that cover rows based on datetime. Within a session window, distances of rows are less than interval. Session window is applied per key. The returned return value of SESSION
SE
is a relation that includes all columns of data as well as additional three columns named additional 3 columns named window_start
and , window_end
and , window_
time which time
to indicate the assigned window. The window_time
column is the time attribute of the record after window TVF, it alwsays equal to " window_end - 1". The original row time attribute "timecol" will be a regular timestamp column after window TVF.
...
SELECT ... FROM L [LEFT|RIGHT|FULL OUTER] JOIN R -- L and R are relations applied windowing TVF ON L.window_start = R.window_start AND L.window_end = R.window_end AND ...
...
Explore more ability of Polymorphic Table Functions
It's very exciting to see the potential ability of PTF. As we can see, in the future, maybe we can support the following features in SQL in a standard PTF way.
- advanced operations supported in Table API (FLIP-29), e.g. drop_columns, user-defined-table-aggregate
- user defined join operator
- a shortcut TopN function
- re-assign watermarks?
- re-partition data, similar feature to Hive DISTRIBUTED BY syntax.
- ...