Note this would likely have an impact on how we do suppression today: today the aggregation and suppression are totally independent operators, but with this optimization we may consider the suppression effects while implementing the aggregation operators to reduce unnecessary reads on the sub-windows.

The general algorithm that based on balanced trees (see references) are quite general, whereas in practice we can assume the out-of-ordering data does not have a large distance to the latest window boundaries. Thus, we can actually simply the academic algorithm such that (the following is just a wild thought, open for discussion):

If invertibility is enabled, cache the latest final aggregates from sub-windows and invert the partial aggregate from the earliest sub-window to expire when necessary; i.e. it is a simple two-layered tree.
If invertibility is not enabled, cache the latest final aggregate minus each sub-window that it includes, i.e. if the window has N sub-windows, we keep one partial aggregate value over N-1 sub-windows and one partial aggregate value over each individual sub-window, forming a three-layered tree.

References:

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	FLINK-7001

...

Space shortcuts

Child pages

Versions Compared

Old Version 8

New Version Current

Key

References:

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 8

New Version Current

Key

References: