Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • There is no separate ZK dependency for the client and the server
  • Makes it easy to start ZK in "managed" mode (see below)Will we shade ZK away?, because we will ship ZK with Flink

JM/TM/Client configuration

  • Add HA mode for JMs (allowing non-HA and HA JM operation)
    • HA mode when a ZooKeeper quorum is configured
    • zookeeper.quorum: "host:port,[...,]host:port"
  • TM, clients need to:
    • connect directly to configured JM (non-HA)
    • connect to JM leader; ZK coordinates the JM address (HA)
  • Note: as of current stable ZK version (3.4.6) the ZK quorum needs to be statically configured (see ZK-107)

...

  • ZK client configuration in Flink only requires the address of the ZK quorum to connect to
  • Znode root and znode paths relative to this root, e.g. /flink (root), and /flink/leader

ZK server configuration

  1. ZK managed by Flink (default): Flink startup scripts provides script to start ZK servers for HA mode
    zookeeper.mode: "managed" (default)
    zookeeper.dir: "/flink" (default)
    zookeeper.quorum: "host:port,[...,]host:port"
    zookeeper.property.<zk-property>: ... (set single ZK properties)
    zookeeper.conf=path-to-zk-conf (similar to current Hadoop conf; has precedence over Flink conf)
    1. Configuration via zoo.cfg
  2. Dedicated ZK cluster (user sets up ZK)
    zookeeper.mode: "dedicated"
    zookeeper.quorum: "host:port,[...,]host:port"zookeeper.dir: "/flink" (default)

...

Startup scripts

...

  • Add an optional file `backup-masters` or `masters``bin/start-cluster-streaming.sh` takes extra argument for backup masters to start`masters` to use for HA mode (contains all masters)
  • No extra startup scripts, HA mode is activated if quorum is configured
    • scripts start JobManager on hosts configured in `masters`

Leader Election

  • TaskManager and Client retrieve leading JobManager using ZooKeeper

  • Every time they lose connection to the current JobManager, they try to reconnect to the current leader using ZooKeeper

...