This enhancement proposes an improvement to the current behavior of Window Evictor, by providing more control on how the elements are to be evicted from the Window. Original Design Document of this proposal can be found here
Current state: Released
Released: Flink 1.2
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Right now, the ability of Window Evictor is limited
- The Evictor is called only before the WindowFunction. (There can be use cases where the elements have to be evicted after the WindowFunction is applied)
- Elements are evicted only from the beginning of the Window. (There can be cases where we need to allow eviction of elements from anywhere within in the Window as per the eviction logic that user wish to implement)
The current interface of Evictor is this:
The Evictor gets an Iterable for all the window elements and the number of elements in the window. Then it can return a number that specifies how many elements should be evicted, starting from the beginning of the window buffer.
- Changes to the Evictor interface : addition of two new methods evictBefore and evictAfter, removal of the existing evict method
- Corresponding changes to CountEvictor, DeltaEvictor
- New class TimestampedValue to store records with a timestamp. This class is exposed to the users in the evictBefore and evictAfter methods.
- New overloaded method CountEvictor.of() that takes an additional parameter doEvictAfter, which decides whether to do eviction after the WindowFunction.
- Similar overloaded methods in DeltaEvictor and TimeEvictor
We propose to change the interface of Evictor to this
The Evictor has two methods, evictBefore - called before the WindowFunction - and evictAfter - called after the WindowFunction. These methods receive an Iterable of all the elements in the pane and the number of elements in the pane. Evictor can choose to remove elements from the Iterable based on any condition that a user wishes to implement. The EvictorContext is an interface very similar to TriggerContext but for Evictors, I’m omitting it for brevity.
This model allows to express everything that was possible with the current model but also
Allows eviction of elements from the “middle” of the buffer.
Allows eviction to be done before and/or after the WindowFunction.
CountEvictor, DeltaEvictor, and TimeEvictor by default will evict elements before the window function.
DeltaEvictor will iterate through the entire elements and evict if the element has higher delta than the threshold. TimeEvictor will evict all the elements that have a timestamp <= the evictCutoff. CountEvictor will only iterate through the elements till the count of elements in the window is reduced to specified size. CountEvictor will do the iteration only till the count of elements in the window is reduced to specified size.
Overloaded method in DeltaEvictor to make the Evictor evict after the windowFunction:
Similar overloaded methods are added to CountEvictor and TimeEvictor.
Compatibility, Deprecation, and Migration Plan
- Users currently using DeltaEvictor will have to be aware that the new implementation will apply the delta function to all the elements in the window(not till it finds the first element in the pane)
- Users who have currently implemented a custom Evictor will have to adapt to the new interface of the Evictor.
- CountEvictor will behave the same even after these changes.
- By default, all Evictors will evict before the WindowFunction, which matches the existing behavior.