Window Operator Design

This page summarizes the design of the stateful window operator, related to SAMZA-552. The operator is primarily used for stream-stream joins, in which the streams are windowed (Samza does not support infinite windows).

 

Class APIs

As described in this document, two classes extend the WindowOperator interface: AggregatedWindowOperator and FullStateWindowOperator. The difference between them is that AggregatedWindowOperator does not expose its windowed messages, whereas FullStateWindowOperator does. Access to the windowed messages is necessary for operations such as joins.
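To make the distinction concrete, here is a minimal sketch of the two operator shapes. Only the three type names come from this page; every method name (process, getAggregate, getMessages), the generic parameter, and the choice of a running sum as the aggregate are assumptions made for illustration and may not match the actual Samza classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical shape only: the method signature is an assumption for illustration.
interface WindowOperator<M> {
    void process(M message);
}

// Keeps only a running aggregate (here, a sum); individual messages are not retained.
class AggregatedWindowOperator implements WindowOperator<Long> {
    private long sum = 0;

    @Override
    public void process(Long message) {
        sum += message;
    }

    long getAggregate() {
        return sum;
    }
}

// Retains every windowed message so that a downstream operator, such as a join, can scan them.
class FullStateWindowOperator<M> implements WindowOperator<M> {
    private final List<M> messages = new ArrayList<>();

    @Override
    public void process(M message) {
        messages.add(message);
    }

    // Read-only view of the messages currently held in the window.
    List<M> getMessages() {
        return Collections.unmodifiableList(messages);
    }
}
```

The trade-off this sketch is meant to show is state size: the aggregated variant holds one value per window, while the full-state variant holds every message and therefore needs eviction when the window closes.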

 

The classes are defined as follows:

 

 

Case Studies

Here is a list of example window operations and how they can be implemented.

Windowed Aggregation

The following aggregation on stream Orders:

 

Can be implemented as follows:
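One way to picture such an implementation is the generic sketch below, which sums a value per fixed-size (tumbling) window. It is not the listing that belongs to this page: the class name TumblingSumSketch, the "units" field, the window size parameter, and the choice of a sum are all assumptions, and the code does not use the Samza operator API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sums a per-order value ("units" is an assumed field) per tumbling window.
class TumblingSumSketch {
    private final long windowSizeMs;
    // Window start timestamp -> running sum for that window.
    private final Map<Long, Long> sums = new TreeMap<>();

    TumblingSumSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Assigns the message to the window containing its timestamp and updates that window's sum.
    void process(long timestampMs, long units) {
        long windowStart = timestampMs - Math.floorMod(timestampMs, windowSizeMs);
        sums.merge(windowStart, units, Long::sum);
    }

    Map<Long, Long> results() {
        return sums;
    }

    public static void main(String[] args) {
        TumblingSumSketch sketch = new TumblingSumSketch(1000L);
        sketch.process(100L, 2L);
        sketch.process(900L, 3L);
        sketch.process(1500L, 5L);
        System.out.println(sketch.results());
    }
}
```

Because only one number is kept per window, this corresponds to the aggregated style of operator rather than the full-state one.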

 

Stream-Stream Join

First, note that unbounded stream-stream joins are not supported; that is, join predicates must include timestamps from both streams. For example, the following stream-stream join will be rejected at parsing time.
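The reason a time bound matters can be sketched in plain Java: with a bound on the timestamp difference, each side only needs to buffer recent messages, so state stays finite. The sketch below is an illustration under assumptions, not the Samza implementation. The class TimeBoundedJoinSketch, the Event type, the equality join key, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a join whose predicate bounds the timestamp difference between streams.
class TimeBoundedJoinSketch {
    // Minimal event: a join key and an event timestamp in milliseconds.
    static final class Event {
        final String key;
        final long ts;

        Event(String key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    private final long boundMs;
    private final List<Event> left = new ArrayList<>();
    private final List<Event> right = new ArrayList<>();

    TimeBoundedJoinSketch(long boundMs) {
        this.boundMs = boundMs;
    }

    // Each call buffers the event and returns matches as "key:leftTs:rightTs".
    List<String> onLeft(Event e) {
        return probe(e, left, right, true);
    }

    List<String> onRight(Event e) {
        return probe(e, right, left, false);
    }

    private List<String> probe(Event e, List<Event> own, List<Event> other, boolean isLeft) {
        // Evict entries that can no longer match (assumes non-decreasing timestamps).
        other.removeIf(o -> o.ts < e.ts - boundMs);
        own.add(e);
        List<String> out = new ArrayList<>();
        for (Event o : other) {
            if (o.key.equals(e.key) && Math.abs(o.ts - e.ts) <= boundMs) {
                long leftTs = isLeft ? e.ts : o.ts;
                long rightTs = isLeft ? o.ts : e.ts;
                out.add(e.key + ":" + leftTs + ":" + rightTs);
            }
        }
        return out;
    }
}
```

Without the bound, the removeIf step would have no cutoff and both buffers would grow without limit, which is why a join lacking timestamps from both streams cannot be executed with bounded state.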

 

The following join on streams Orders and Shipments aligns the timestamp boundaries of the two streams:

 

To support this, Samza will first provide a StreamStreamJoinOperator implementation:

 

The user's code would then look something like this:

 

Discussion

Here are some more notes regarding the above APIs:

  1. We can put beforeSend / beforeProcess into a Callback class for better code reuse.
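As a rough sketch of what such a Callback class might look like: the hook names beforeProcess and beforeSend come from the note above, while the interface name OperatorCallback, the signatures, the null-to-drop convention, and the identity operator used to exercise the hooks are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: hook names are from the note above, signatures are assumptions.
interface OperatorCallback<M> {
    // Invoked before the operator processes a message; returning null drops the message.
    default M beforeProcess(M message) {
        return message;
    }

    // Invoked before the operator emits an output; returning null suppresses the output.
    default M beforeSend(M output) {
        return output;
    }
}

// Identity operator used only to show where the hooks would be invoked.
class CallbackOperatorSketch<M> {
    private final OperatorCallback<M> callback;
    private final List<M> sent = new ArrayList<>();

    CallbackOperatorSketch(OperatorCallback<M> callback) {
        this.callback = callback;
    }

    void process(M message) {
        M in = callback.beforeProcess(message);
        if (in == null) {
            return; // dropped by the callback
        }
        M out = callback.beforeSend(in); // a real operator would transform `in` first
        if (out != null) {
            sent.add(out);
        }
    }

    List<M> sent() {
        return sent;
    }
}
```

With default methods, one callback object can be shared across several operators, and each user overrides only the hook it needs, which is the reuse benefit the note refers to.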

 

1 Comment

  1. Two points to correct in the last example user code:

    1. We don't need to call init() on each operator. Calling join.init() is enough.
    2. We don't need to call orderWindows.process() in the process() method either. Calling join.process() takes care of all execution paths.