...

Local (non-partitioned) state

Local, or non-partitioned, state is the simplest form of operator state. It represents the current state of a specific operator instance in a parallel streaming operator. Local states stored at different operator instances do not interact with each other. For instance, a mapper with a parallelism of 10 has 10 parallel instances, each holding its own local state. An important thing to note is that the updates reflected in a particular local state depend only on the input of that specific operator instance. Because stream partitioning is the only control over which input each operator instance processes, the operator itself needs to deal with the non-deterministic nature of its inputs, which also limits expressivity.
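To make the dependence on input distribution concrete, here is a minimal sketch in plain Python rather than in any particular streaming API. The names `CountingMapper`, `run`, and `assign` are hypothetical and exist only for this illustration. Each instance keeps a private record counter, and the same input routed in two different ways leaves different local states behind.

```python
class CountingMapper:
    """Hypothetical parallel operator instance holding its own local state."""

    def __init__(self):
        self.local_count = 0  # local (non-partitioned) state of this instance

    def process(self, record):
        # The update depends only on records routed to this instance.
        self.local_count += 1
        return record


def run(records, parallelism, assign):
    """Route each record to the instance picked by `assign`; return local counts."""
    instances = [CountingMapper() for _ in range(parallelism)]
    for index, record in enumerate(records):
        instances[assign(index, record) % parallelism].process(record)
    return [instance.local_count for instance in instances]


records = list(range(10))

# Same input, two different routings (assumed here purely for illustration).
round_robin = run(records, 3, lambda index, record: index)
skewed = run(records, 3, lambda index, record: 0 if record < 8 else 1)

print(round_robin)  # [4, 3, 3]
print(skewed)       # [8, 2, 0]
```

The individual local states differ between the two runs, while their sum is 10 in both. This is why operators that rely on local state are usually written so that only a merged result is treated as meaningful.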

Typical usage of non-partitioned state includes source operators (storing offsets), or any global aggregation, summary, or analysis operators where the local results will be merged into a global result afterwards.
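The aggregation case can be sketched as follows, again in plain Python and under stated assumptions: the `Summary` class, its fields, and the `summarize` helper are hypothetical and not part of any existing API. Each parallel instance builds a local summary from whatever input it happened to receive, and the local summaries are merged into one global result afterwards.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Hypothetical local summary kept by one operator instance."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def add(self, value):
        # Update the local state with one input record.
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        # Combine two local summaries without mutating either one.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )


def summarize(partition):
    """Build the local summary for the records one instance received."""
    summary = Summary()
    for value in partition:
        summary.add(value)
    return summary


# Assumed input split across three parallel instances.
partitions = [[1.0, 5.0], [2.0], [9.0, 3.0, 4.0]]
local_summaries = [summarize(partition) for partition in partitions]
global_summary = reduce(Summary.merge, local_summaries, Summary())

print(global_summary)  # Summary(count=6, total=24.0, maximum=9.0)
```

Because `merge` is associative and commutative in this sketch, the global result does not depend on how the records were distributed across instances.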

Scaling out non-partitioned state can in most cases be done by simply starting a new operator instance with a blank state; alternatively, a user-supplied splitting function can be used to split the state of an existing instance. To reduce job parallelism, the user should generally provide a function that merges two local states so that results remain correct.
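As a rough illustration of both directions, the sketch below models the local state of a source instance as a dictionary from an input partition name to the last offset read. This representation, the function names `split_state` and `merge_states`, and the merge rule (keep the furthest offset per partition) are all assumptions made for the example, not a prescribed interface.

```python
def split_state(state):
    """Hypothetical user-supplied split: divide tracked offsets between two instances."""
    keys = sorted(state)
    middle = (len(keys) + 1) // 2
    left = {key: state[key] for key in keys[:middle]}
    right = {key: state[key] for key in keys[middle:]}
    return left, right


def merge_states(first, second):
    """Hypothetical user-supplied merge: keep the furthest offset per partition."""
    merged = dict(first)
    for key, offset in second.items():
        merged[key] = max(merged.get(key, offset), offset)
    return merged


state = {"p0": 42, "p1": 17, "p2": 8}

# Scale out, option 1: the new instance simply starts with a blank state.
blank = {}

# Scale out, option 2: split the state of the existing instance.
left, right = split_state(state)

# Scale in: merge two local states back into one.
restored = merge_states(left, right)

print(left, right)  # {'p0': 42, 'p1': 17} {'p2': 8}
print(restored)     # {'p0': 42, 'p1': 17, 'p2': 8}
```

Merging with a blank state returns the other state unchanged, so the blank-start and split-based approaches can both be undone by the same merge function in this sketch.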

...