ID | IEP-15 |
Author | @Semen Boikov |
Sponsor | Yakov Zhdanov |
Created | |
Status | DRAFT |
The current implementation of Discovery SPI implies association of the cluster nodes in the topology of type "ring".
This decision, along with a host of advantages, has a fundamental drawback - the passage of messages is directly proportional to the number of nodes in the cluster.
Consequently, handling of events: join node, node leaving, detection of an occurrence of split-brain situations depends on cluster size. That is, this implementation is not optimal for large configurations.
Any implementation of Discovery SPI subsystem must provide:
a connection of a new node in the topology
node Disconnecting from the topology
of the order of grease retention, in which nodes are connected to the cluster and disconnect from it
perhaps through a subsystem send custom messages
Ability to save node attributes
Ability to establish authenticator to be validated connected nodes
Znode - Any node or ephemeral node.
Node - A node in a hierarchical structure that stores user data in the ZooKeeper cluster. A node exists until explicitly deleted.
Ephemeral node - Similarly, Node, but the node exists until explicitly deleted or until essentially created his session.
Watches - Customers sign up to receive notifications of changes to the Node.
ZooKeeper server - node is running Zookeeper
ZooKeeper client - host running the server node Apache Ignite
service / ZooKeeper cluster ZooKeeper- the union of server node ZooKeeper, providing service ZooKeeper
/discoveryEvents - znode storage discovery stage events in the cluster Apache Ignite
/alive - znode storage a list of all nodes in the topology
/joinData - znode storage Apache Ignite server node data at the time of connection
/customEvents - znode to store user messages
In order to eliminate these drawbacks is proposed implementation using ZooKeeper-a service. As a point of connection and synchronization for all server nodes of Apache Ignite cluster is a ZooKeeper .
Thus ZooKeeper cluster is to guarantee the storage of the current topology, attribute nodes, the next connection nodes, queues for storing user and service events.
ZooKeeper main functionality is to provide for the allocation process through a shared hierarchy, service coordination, and synchronization.
At the same time, it provided a guarantee that:
All the changes will apply from users while maintaining the order in which they were received by the cluster.
Actions client nodes zooKeeper performs transaction, ie, or all will succeed, or all will be lost.
All customers Zookeeper "see" the same state of the data in the service, regardless of to what exactly the server they are connected to ZooKeeper.
After receiving confirmation that the data have been obtained, they will be present in service until removal.
To provide these guarantees all ZooKeeper servers providing service, combined in a cluster and synchronize the current status of connected clients and their transactions using Zab protocol (https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+in+words). As long as the majority of ZooKeeper cluster servers "see each other" service is available (which is why in the cluster ZooKeeper must be an odd number of servers). ZooKeeper clients connect to an arbitrary server ZooKeeper on a TCP connection, send requests, receive responses, and notification information that the service is functioning correctly. If the connection is dropped, the client ZooKeeper node can reconnect to an arbitrary server ZooKeeper.
Another important functionality ZooKeeper-and technology is customer notifications on changes. ZooKeeper can store arbitrary, customer information directly to the service. After saving the recorded information can be available to all customers ZooKeeper. For this ZooKeeper customers can create any tree structure (similar to the structure of directories and files in the file system), each node which is ZooKeeper znode.Each znode allows you to store an arbitrary user information. Znode divided into node and ephemeral node.If the node was created as a node,it will exist until explicitly deleted. If the node was created as an ephemeral node it will exist until explicitly deleted or until long as there has created its client session.
Any client ZooKeeper may be asked to send a one-time or recurring notification when updates to certain znode.For example, when the changed data associated with this znode or znode ceased to exist, or at a given znode created new ( "daughter") znode.
Zookeeper guarantees the consistency in message handling, but for Apache Ignite is necessary to guarantee the preservation of the sequence processing of events, and at their own level. To do this in Apache Ignite cluster remains the coordinator, whose role performs the very first node (from active now) are connected to the service ZooKeeper.
The configuration of each server node Apache Ignite specified range of addresses and ports for connection to ZooKeeper cluster.
Each server Apache Ignite node is connected to an arbitrary server ZooKeeper.
Creates a new entry in the / joinData data connection.
Recorded in the ephemeral znode / alive - the list of all active nodes in the topology.
It checks whether it is a focal point.
Subscribes to notifications on changes in / discoveryEvents
subscribes to receive notification of a registered output of the previous node from /alive
coordinator, having received notification that a / alive, a new node
processes data from / joinData, checking that the node may be included in the topology.
Creates event nodeJoin and writes it to / discoveryEvents
nodes topology, received notice that / discoveryEvents, a new message is processed topology change.
Coordinator, received notice that from / alive missing node creates event NodeFailEvent and writes it to / discoveryEvents
nodes topology, received notice that / discoveryEvents, a new message is processed topology change.
Node, connected for the disabled, subscribe to receive notifications of withdrawal of the previous registered site / alive.
When disconnecting the coordinator, the node connected to the second cluster, is notified of the registered output of the previous node from / alive
becomes the coordinator, changing topology version
Processes unprocessed messages / discoveryEvents, to synchronize the global and local topology.
Processes the raw messages / customEvents
sign up to receive notifications of changes to the / alive
sign up to receive notifications of changes in / customEvents
Discovery the SPI interface allows the user to transmit messages through Discovery subsystem. In the current implementation for the transmission of user messages used znode / customEvent. The basic algorithm implementation is as follows.
Node initiating customEvent sending messages, publish it in znode / customEvent.
Coordinator receives notification of a new message in / customEvents
Subject to the processing order forms new message / discoveryEvents
nodes topology, received notice that / discoveryEvents there is a new message, a new message is treated.
It should be noted that the mechanism customEvent implemented informing nodes of the early formation of a snapshot. This implies a modification of the message sent by topology. In the current implementation of the communication are immutable, which will require improvements in the mechanism of formation of snapshots.
TBD
TBD
TBD