...

More complex streaming pipelines generally need to keep some form of operator state to execute their application logic. Examples include keeping an aggregate or summary of the received elements, or more complex state such as a state machine for detecting patterns of fraudulent financial transactions, or a model for a machine learning application. While in all of these cases we keep some kind of summary of the input history, the concrete requirements vary greatly from one stateful application to another.
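As a minimal illustration of the simplest case mentioned above (an aggregate of the received elements), the following sketch shows an operator whose state is a running count and sum. The class and method names are hypothetical and are not part of any Flink API; the point is only that the result for each element depends on the history summarized in the fields.

```java
// Hypothetical stateful operator (not a Flink API): its state is a
// running count and sum, i.e. a summary of the input history.
final class RunningAverage {
    private long count = 0;   // operator state: number of elements seen so far
    private double sum = 0.0; // operator state: sum of elements seen so far

    // Processes one element, updates the state, and returns the new average.
    double process(double value) {
        count++;
        sum += value;
        return sum / count;
    }

    long count() {
        return count;
    }
}
```

Calling `process(2.0)` and then `process(4.0)` on a fresh instance yields the averages 2.0 and 3.0. If the two fields were lost in a failure, the operator would silently restart from an empty history, which is the problem fault-tolerant state handling has to solve.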

This document is intended to serve as a guide to fault-tolerant stateful stream processing in Flink and other streaming systems. It identifies common usage patterns and requirements for implementing stateful operators.

Fault-tolerance in Flink

While this is not the focus of this document, it is important to introduce the basic mechanism behind fault-tolerance in Flink streaming. The algorithm used by Flink is designed to support exactly-once guarantees for stateful streaming programs (regardless of the actual state representation). The mechanism ensures that even in the presence of failures, the program’s state will eventually reflect every record from the data stream exactly once.

To achieve this, Flink periodically draws consistent global snapshots of the execution topology in a highly efficient manner. You can read more about the checkpointing mechanism in the documentation.
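The idea behind the exactly-once guarantee can be sketched with a deliberately simplified, single-operator simulation. This is not Flink's distributed snapshot algorithm; it assumes a replayable source addressed by offset, and all names (`CheckpointedSum`, `Snapshot`, `runUntil`) are hypothetical. The key point is that the snapshot stores the source position together with the operator state, so that restoring both and replaying from that position counts every record exactly once.

```java
import java.util.List;

// Simplified simulation (an assumption, not Flink's actual mechanism):
// one operator reading from a replayable source addressed by offset.
final class CheckpointedSum {
    private long sum = 0;   // operator state
    private int offset = 0; // position of the next record to read

    // A snapshot pairs the source position with the state at that position.
    static final class Snapshot {
        final int offset;
        final long sum;

        Snapshot(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    Snapshot snapshot() {
        return new Snapshot(offset, sum);
    }

    // Rolls back both the state and the source position to the snapshot.
    void restore(Snapshot s) {
        this.offset = s.offset;
        this.sum = s.sum;
    }

    // Processes records from the current offset up to (excluding) end.
    void runUntil(List<Integer> source, int end) {
        while (offset < end) {
            sum += source.get(offset);
            offset++;
        }
    }

    long sum() {
        return sum;
    }
}
```

For the source `[1, 2, 3, 4, 5]`, taking a snapshot after two records, processing two more, then restoring the snapshot and running to the end gives a final sum of 15, the same as a failure-free run. Records 3 and 4 are processed twice physically, but the rolled-back state reflects each of them only once.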

State access patterns

To design a properly functioning, scalable state representation, we need to understand the different state access and handling patterns that streaming applications require. We can distinguish three main state types (access patterns): local state, partitioned state, and out-of-core state.
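To make the first two categories concrete, here is a rough sketch under two assumptions: that local state means one value per parallel operator instance, and that partitioned state means one value per key of the input. The class below is hypothetical and not a Flink API. Out-of-core state is not shown; under the same reading, it would correspond to backing the per-key map with storage outside the operator's memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical operator contrasting two access patterns (assumed meanings):
// local state is shared by all records of this instance, while
// partitioned state is looked up by the key of the current record.
final class CountingOperator {
    // Local state: a single counter for this operator instance.
    private long localCount = 0;

    // Partitioned state: one counter per key.
    private final Map<String, Long> perKeyCount = new HashMap<>();

    void process(String key) {
        localCount++;
        perKeyCount.merge(key, 1L, Long::sum);
    }

    long localCount() {
        return localCount;
    }

    // Returns 0 for keys that have never been seen.
    long countFor(String key) {
        return perKeyCount.getOrDefault(key, 0L);
    }
}
```

The distinction matters for scaling: per-key state can be redistributed along with the keys when the parallelism changes, whereas a per-instance counter has no natural way to be split.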

...