KIP-450: Sliding Window Aggregations in the DSL

...

Rather than adapt the existing TimeWindows interface (which provides semantics for tumbling and hopping windows), I propose to add a separate SlidingWindows class. This will resemble a stripped-down version of the TimeWindows class and have only the following public API (plus several methods it needs to override from the abstract base class Windows<W>):

Code Block
languagejava
public final class SlidingWindows extends Windows<TimeWindow> {

    public static SlidingWindows of(final Duration size);

    public SlidingWindows grace(final Duration afterWindowEnd);

    @Override
    public Map<Long, TimeWindow> windowsFor(final long timestamp);

    @Override
    public long gracePeriodMs() { return 0; }

    @Override
    public long size() { return sizeMs; }

    @Override
    public boolean equals(final Object o);

    @Override
    public int hashCode();

    @Override
    public String toString();
}

...

Compatibility, Deprecation, and Migration Plan

N/A

Rejected Alternatives

Operations & Semantics

In considering the semantics, we have some flexibility in how and when to output the results of an aggregation. For example, rather than outputting only the final result after the window has left the grace period, we could send a result as soon as the window ends and then send further updates as out-of-order data arrives. Realistically, however, out-of-order data occurs often enough that it makes sense not to output a result right away, and instead to wait some amount of time for potential updates. The grace period is the natural choice for this waiting time, so that downstream is not flooded with more updates than final results. Outputting only the final result, rather than a provisional result followed by possible updates, is likely to be more straightforward to deal with, even though no output is seen until the grace period has passed.
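To make the "final result only" behaviour concrete, here is a minimal, self-contained sketch. It is not part of the proposed API and does not use Kafka Streams: the class FinalResultEmitter and its process method are hypothetical names invented for illustration, the aggregation is a plain count, and window assignment is simplified to fixed, aligned windows because only the emission timing is of interest here. Timestamps are assumed to be non-negative.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Kafka Streams or this KIP's API. */
final class FinalResultEmitter {
    private final long sizeMs;
    private final long graceMs;
    // Running count per open window, keyed by window start time.
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    // Highest timestamp seen so far.
    private long streamTime = Long.MIN_VALUE;

    FinalResultEmitter(final long sizeMs, final long graceMs) {
        this.sizeMs = sizeMs;
        this.graceMs = graceMs;
    }

    /**
     * Adds one record and returns (window start -> final count) for every
     * window whose grace period has passed as a result of this record.
     */
    Map<Long, Long> process(final long timestamp) {
        streamTime = Math.max(streamTime, timestamp);

        // Simplifying assumption: aligned windows of length sizeMs.
        final long windowStart = timestamp - (timestamp % sizeMs);
        final long windowEnd = windowStart + sizeMs;

        // Accept the record only while its window is still within grace;
        // otherwise the final result has already gone out, so drop it.
        if (windowEnd + graceMs > streamTime) {
            counts.merge(windowStart, 1L, Long::sum);
        }

        // Emit each window exactly once, after end + grace has passed.
        final Map<Long, Long> closed = new LinkedHashMap<>();
        final Iterator<Map.Entry<Long, Long>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            final Map.Entry<Long, Long> entry = it.next();
            if (entry.getKey() + sizeMs + graceMs > streamTime) {
                break; // later windows end later, so they are still open too
            }
            closed.put(entry.getKey(), entry.getValue());
            it.remove();
        }
        return closed;
    }
}
```

With a 10 ms window and a 5 ms grace period, feeding timestamps 1, 12, 7 and 16 produces no output until 16 arrives. At that point window [0, 10) is emitted once with a count of 2, which includes the out-of-order record at 7. A later record at 3 is dropped because its window has already left the grace period.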

...

Really, here we should just choose whichever option is most discoverable for users.

Implementation