This page summarize the design of the stateful window operator, related to SAMZA-552. This operator is primarily used for stream-stream join, in which streams are windowed (we do not support infinite window in Samza).
As described in this document, there are two classes extending the WindowOperator interface: AggregatedWindowOperator and FullStateWindowOperator. The difference between these two operators is that for AggregatedWindowOperator you cannot access to its windowed messages, whereas for FullStateWindowOperator you can. This feature of FullStateWindowOperator is necessary for operations such as joins, etc.
The classes are define as the following:
Here are a list of example window operations and how they can be implemented.
The following aggregation on stream Orders:
Can be implemented as follows:
First note that unbounded stream-stream joins are not supported, i.e. join predicates must include timestamps from both streams. For example, the following stream-stream join will be rejected at parsing time.
The following join on stream Orders and Shipments aligns timestamp boundary from the two streams:
To support this, Samza first will have a StreamStreamJoinOperator implementation:
Then user's code would look like sth. like:
Here are some more notes regarding the above APIs:
- We can put the beforeSend / beforeProcess into a Callback class to get better code re-usage.