...

  • There is no separate ZK dependency for the client and the server
  • Makes it easy to start ZK in "managed" mode (see below), because we will ship ZK with Flink
  • Will we shade ZK away?

JM/TM/Client configuration

  • Add HA mode for JMs (allowing non-HA and HA JM operation)
    • HA mode when a ZooKeeper quorum is configured
    • zookeeper.quorum: "host:port,[...,]host:port"
  • TM, clients need to:
    • connect directly to configured JM (non-HA)
    • connect to JM leader; ZK coordinates the JM address (HA)
  • Note: as of the current stable ZK version (3.4.6), the ZK quorum needs to be statically configured (see ZK-107)
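
The mode selection above can be sketched roughly as follows: HA mode is assumed whenever `zookeeper.quorum` is set, and the quorum string is split into host/port pairs. The class and method names (`QuorumConfig`, `isHighAvailability`, `parseQuorum`) are hypothetical and not part of Flink; this is a minimal sketch under those assumptions, not the actual implementation.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: interprets the proposed zookeeper.quorum key.
final class QuorumConfig {

    // HA mode is considered active only when a non-empty quorum string is configured.
    static boolean isHighAvailability(String quorum) {
        return quorum != null && !quorum.trim().isEmpty();
    }

    // Splits "host:port,...,host:port" into unresolved socket addresses.
    static List<InetSocketAddress> parseQuorum(String quorum) {
        List<InetSocketAddress> peers = new ArrayList<>();
        if (!isHighAvailability(quorum)) {
            return peers; // non-HA: TM and clients connect directly to the configured JM
        }
        for (String entry : quorum.split(",")) {
            String trimmed = entry.trim();
            int sep = trimmed.lastIndexOf(':');
            if (sep <= 0 || sep == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, sep);
            int port = Integer.parseInt(trimmed.substring(sep + 1));
            peers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return peers;
    }
}
```

An empty result means non-HA operation; a non-empty list is the static set of ZK servers to which TM and clients would connect in order to find the JM leader.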

...

  • ZK client configuration in Flink only requires the address of the ZK quorum to connect to
  • Znode root and znode paths relative to this root, e.g. /flink (root), and /flink/leader

ZK server configuration

  1. ZK managed by Flink (default): Flink's startup scripts provide a script to start ZK servers for HA mode
    zookeeper.mode: "managed" (default)
    zookeeper.dir: "/flink" (default)
    zookeeper.quorum: "host:port,[...,]host:port"
    zookeeper.property.<zk-property>: ... (set single ZK properties)
    zookeeper.conf=path-to-zk-conf (similar to current Hadoop conf; has precedence over Flink conf)
    1. Configuration via zoo.cfg
  2. Dedicated ZK cluster (user sets up ZK)
    zookeeper.mode: "dedicated"
    zookeeper.quorum: "host:port,[...,]host:port"
    zookeeper.dir: "/flink" (default)
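
For managed mode, the handling of `zookeeper.property.<zk-property>` and the precedence of `zookeeper.conf` could look roughly like this. The class `ManagedZooKeeperConfig` and its methods are hypothetical names for illustration, and the sketch assumes the Flink configuration is available as a plain string map.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Flink.
final class ManagedZooKeeperConfig {

    static final String PROPERTY_PREFIX = "zookeeper.property.";

    // Collects every "zookeeper.property.<zk-property>" entry and strips the prefix,
    // so the result can be passed to a managed ZK server like the contents of zoo.cfg.
    static Properties fromFlinkConfig(Map<String, String> flinkConfig) {
        Properties zk = new Properties();
        for (Map.Entry<String, String> entry : flinkConfig.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PROPERTY_PREFIX) && key.length() > PROPERTY_PREFIX.length()) {
                zk.setProperty(key.substring(PROPERTY_PREFIX.length()), entry.getValue());
            }
        }
        return zk;
    }

    // Entries loaded from the file named by zookeeper.conf take precedence
    // over entries derived from the Flink configuration.
    static Properties merge(Properties fromFlinkConfig, Properties fromZooCfg) {
        Properties merged = new Properties();
        merged.putAll(fromFlinkConfig);
        merged.putAll(fromZooCfg); // later putAll overwrites earlier values
        return merged;
    }
}
```

With this approach, a Flink entry such as `zookeeper.property.tickTime` would become the ZK property `tickTime`, unless the file given by `zookeeper.conf` sets the same property.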

Startup scripts

  • Startup scripts need to provide a unique ID for the zookeeper.dir znode, to be used by the JM and TM instances, in order to allow multiple Flink clusters to run with the same ZK installation, e.g. coordination happens in zookeeper.dir/ID/ for each started cluster
  • Add an optional file `backup-masters` or `masters` to use for HA mode (contains all masters)
  • `bin/start-cluster-streaming.sh` takes an extra argument for backup masters to start
  • No extra startup scripts; HA mode is activated if a quorum is configured
    • scripts start JobManager on hosts configured in `masters`
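
The per-cluster namespace zookeeper.dir/ID/ described above could be derived as in the following sketch. `ClusterNamespace` and `znodePath` are hypothetical names, and how the startup scripts generate the ID is left open here.

```java
// Hypothetical helper, not part of Flink: builds znode paths of the form zookeeper.dir/ID/<relative>.
final class ClusterNamespace {

    static String znodePath(String zookeeperDir, String clusterId, String relativePath) {
        if (clusterId == null || clusterId.isEmpty() || clusterId.contains("/")) {
            throw new IllegalArgumentException("Cluster ID must be a single non-empty path segment");
        }
        String root = stripSlashes(zookeeperDir);
        String child = stripSlashes(relativePath);
        StringBuilder path = new StringBuilder();
        if (!root.isEmpty()) {
            path.append('/').append(root);
        }
        path.append('/').append(clusterId);
        if (!child.isEmpty()) {
            path.append('/').append(child);
        }
        return path.toString();
    }

    // Removes leading and trailing slashes so that segments can be joined uniformly.
    private static String stripSlashes(String segment) {
        if (segment == null) {
            return "";
        }
        int start = 0;
        int end = segment.length();
        while (start < end && segment.charAt(start) == '/') {
            start++;
        }
        while (end > start && segment.charAt(end - 1) == '/') {
            end--;
        }
        return segment.substring(start, end);
    }
}
```

Two clusters started with different IDs against the same ZK installation then coordinate under disjoint subtrees, e.g. one under /flink/<ID-1>/ and the other under /flink/<ID-2>/.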

Leader Election

  • TaskManager and Client retrieve leading JobManager using ZooKeeper

  • Every time they lose connection to the current JobManager, they try to reconnect to the current leader using ZooKeeper
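
The retrieval behaviour described in the two points above can be illustrated with a small sketch. `LeaderConnector` is a hypothetical class, and the `Supplier` stands in for a read of the leader address from ZooKeeper; no real ZK client API is used, and a real client would wait or set a watch between attempts instead of retrying immediately.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch, not Flink code: the Supplier simulates reading the leader znode.
final class LeaderConnector {

    private final Supplier<Optional<String>> leaderLookup;
    private String currentJobManager;

    LeaderConnector(Supplier<Optional<String>> leaderLookup) {
        this.leaderLookup = leaderLookup;
    }

    // Called on start-up and every time the connection to the current JobManager is lost.
    String reconnect(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> leader = leaderLookup.get();
            if (leader.isPresent()) {
                currentJobManager = leader.get();
                return currentJobManager;
            }
        }
        throw new IllegalStateException(
                "No leading JobManager published after " + maxAttempts + " attempts");
    }

    // Address of the JobManager found by the last successful lookup, or null if none yet.
    String current() {
        return currentJobManager;
    }
}
```

A TaskManager or client would call `reconnect` once at start-up and again whenever it detects that the connection to `current()` has been lost.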

...