
Definitions

Node - a separate individual instance of Ignite, either a server or a client.

Client node - a thick client used to connect to an Ignite cluster. It can't store any data and acts as a gateway to the cluster.

Server node - a regular Ignite node used to store data and serve requests. "Node" and "server node" are used interchangeably throughout this document.

Node order - an internal property of a node (in case of TcpDiscoverySpi it is just a monotonically increasing number) used to imply a total order over the nodes in a cluster.

Coordinator - the server node with the minimal order, responsible for coordinating different processes in the cluster (e.g. verifying discovery messages or managing Partition Map Exchange).

Topology - an arrangement of nodes connected via a communication network.
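For illustration, the coordinator can be identified from user code as the alive server node with the minimal order. Below is a minimal sketch using the public cluster API (the class name is invented; ignite.cluster().forOldest() achieves the same in one call):

    import java.util.Comparator;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterNode;

    public class FindCoordinator {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                // The coordinator is the server node with the minimal order.
                ClusterNode crd = ignite.cluster().forServers().nodes().stream()
                    .min(Comparator.comparingLong(ClusterNode::order))
                    .orElseThrow(IllegalStateException::new);

                System.out.println("Coordinator: " + crd.id() + ", order: " + crd.order());
            }
        }
    }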

Introduction

The main goal of the discovery mechanism is to create a topology of Ignite nodes and to build and maintain a consistent in-memory view of it on every node. For example, this view may contain the number of nodes in the cluster and their orders.

Discovery is hidden behind the DiscoverySpi interface; its default implementation is TcpDiscoverySpi. Other implementations, such as ZookeeperDiscoverySpi, exist as well but are out of the scope of this article.

The exact topology structure is defined by the particular DiscoverySpi implementation. For example, TcpDiscoverySpi arranges all nodes - both servers and clients - into a ring-shaped topology.

Ring-Shaped Topology

Info

When describing the cluster topology, we are talking about a logical arrangement that exists only on the "discovery" level. For example, when querying data residing in caches, a cluster might use a different topology than the one described here.


TcpDiscoverySpi aligns all server nodes in the cluster into a ring-shaped structure where each node can only send discovery messages to a single node (called a "neighbor"). Client nodes stay outside of the ring and are always connected to a single server.

[Gliffy diagram: ring-shaped topology]

Most of the implementation logic lives in the ServerImpl and ClientImpl classes, for server and client nodes respectively.
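To make the ring concrete, here is a tiny toy model (invented names, not Ignite code): the servers are sorted by order, each node only knows its single neighbor, and the last node wraps around to the coordinator.

    import java.util.List;

    /** Toy model of the discovery ring; NOT the actual Ignite implementation. */
    class RingTopology {
        /** Server nodes sorted by their order; the first one is the coordinator. */
        private final List<String> nodes;

        RingTopology(List<String> nodes) {
            this.nodes = nodes;
        }

        /** Each node sends discovery messages only to the next node in the ring. */
        String neighborOf(String node) {
            int i = nodes.indexOf(node);

            return nodes.get((i + 1) % nodes.size()); // The last node wraps around to the coordinator.
        }
    }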

From this point on, "node" refers to "server node" for convenience.

Node Join Process

Overview

When a new node starts, it tries to find an existing cluster by probing all addresses provided by the TcpDiscoveryIpFinder. If all addresses are unavailable, the node considers itself the only node, forms a cluster from itself and becomes the coordinator of this cluster. Otherwise, it starts the join process.
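For reference, the probed addresses are supplied through the IP finder in the node configuration. A minimal sketch using the public API and the static TcpDiscoveryVmIpFinder (the addresses below are placeholders):

    import java.util.Arrays;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class StartNode {
        public static void main(String[] args) {
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

            // Addresses the starting node will probe; if none of them answers,
            // the node forms a new cluster and becomes its coordinator.
            ipFinder.setAddresses(Arrays.asList("192.168.0.1:47500..47509", "192.168.0.2:47500..47509"));

            TcpDiscoverySpi spi = new TcpDiscoverySpi();
            spi.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDiscoverySpi(spi);

            Ignite ignite = Ignition.start(cfg);

            System.out.println("Topology version: " + ignite.cluster().topologyVersion());
        }
    }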


Node join consists of several phases, each driven by its own discovery message: join request, NodeAdded and NodeAddFinished. When a node joins successfully, it is placed between the previously last node and the coordinator in the ring:

  1. The joining node sends a TcpDiscoveryJoinRequestMessage to a random node in the cluster, which gets forwarded to the coordinator.
  2. The coordinator places the new node in the ring between the last node and itself and propagates the topology change by sending a TcpDiscoveryNodeAddedMessage across the ring.
  3. After the TcpDiscoveryNodeAddedMessage has been received by all members of the cluster, a TcpDiscoveryNodeAddFinishedMessage is sent to finalize the changes.

[Gliffy diagram: join node overview]
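The placement rule from step 2 can be shown with a trivial sketch (invented data, not Ignite code): the ring is kept sorted by node order with the coordinator first, so appending the joining node after the previously last node automatically puts it between that node and the coordinator.

    import java.util.ArrayList;
    import java.util.List;

    /** Toy illustration of where a joining node lands in the ring; NOT Ignite code. */
    public class JoinPlacement {
        public static void main(String[] args) {
            // The ring sorted by node order; index 0 is the coordinator.
            List<String> ring = new ArrayList<>(List.of("A(order=1, crd)", "B(order=2)", "C(order=3)"));

            // The joining node gets the next order and is appended after the previously last node;
            // since the ring wraps around, it now sits between C and the coordinator A.
            ring.add("D(order=4)");

            System.out.println(ring); // [A(order=1, crd), B(order=2), C(order=3), D(order=4)]
        }
    }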


TcpDiscoveryJoinRequestMessage

A node begins the join process by calling ServerImpl#joinTopology (for server nodes) or ClientImpl#joinTopology (for client nodes), which in turn calls TcpDiscoverySpi#collectExchangeData to collect all necessary discovery data from every component (e.g. cache configurations from GridCacheProcessor; see the different GridComponent#collectJoiningNodeData implementations). This discovery data is packed into the join request and sent to the cluster.
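The collection step follows a simple fold-over-components pattern that can be modeled as below (a simplified sketch with invented types; the real counterparts are GridComponent, the discovery data bag and TcpDiscoveryJoinRequestMessage):

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Toy model of discovery data collection; NOT the actual Ignite classes. */
    interface Component {
        int id();

        Serializable collectJoiningNodeData(); // E.g. GridCacheProcessor would return cache configurations.
    }

    class JoinRequest {
        /** Discovery data keyed by component id, shipped inside the join request. */
        final Map<Integer, Serializable> discoData = new HashMap<>();

        static JoinRequest collect(List<Component> components) {
            JoinRequest req = new JoinRequest();

            for (Component c : components)
                req.discoData.put(c.id(), c.collectJoiningNodeData());

            return req;
        }
    }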

When the join request reaches the coordinator, it validates the message and, if validation passes, generates a TcpDiscoveryNodeAddedMessage (see ServerImpl.RingMessageWorker#processJoinRequestMessage). After that, the lifecycle of the join request is finished and only the NodeAdded message is used further. The coordinator adds info about the joining node (including its discovery data from the join request) to this message and sends it across the ring.


TcpDiscoveryNodeAddedMessage

On receiving the TcpDiscoveryNodeAddedMessage, every node in the cluster (including the coordinator) applies the joining node's discovery data to its components, collects its own local discovery data and adds it to the message (see ServerImpl.RingMessageWorker#processNodeAddedMessage for details). The message is then propagated further across the ring by calling ServerImpl.RingMessageWorker#sendMessageAcrossRing.

When the TcpDiscoveryNodeAddedMessage completes the full circle and reaches the coordinator again, it gets consumed and the coordinator emits the TcpDiscoveryNodeAddFinishedMessage.
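One plausible way to model the full-circle check is through the verifier id the coordinator stamps on messages it verifies (a hedged sketch with invented types; the actual logic lives in ServerImpl.RingMessageWorker and the verifier handling of TcpDiscoveryAbstractMessage):

    import java.util.UUID;

    /** Toy sketch of full-circle detection on the coordinator; NOT Ignite code. */
    public class FullCircle {
        record NodeAdded(UUID verifierNodeId, UUID joiningNodeId) {}

        public static void main(String[] args) {
            UUID localId = UUID.randomUUID(); // Pretend this is the coordinator's own id.

            // The coordinator stamps (verifies) the message before sending it across the ring;
            // when the message comes back carrying the coordinator's own stamp, the circle is complete.
            NodeAdded msg = new NodeAdded(localId, UUID.randomUUID());

            if (localId.equals(msg.verifierNodeId()))
                System.out.println("NodeAdded consumed; emitting NodeAddFinished for " + msg.joiningNodeId());
        }
    }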

Info

TcpDiscoveryNodeAddedMessage is delivered to the joining node as well; it receives the message at the very end, after all other nodes have already processed it.


TcpDiscoveryNodeAddFinishedMessage

TcpDiscoveryNodeAddFinishedMessage, as its name suggests, finishes the process of node join. On receiving this message, each node, both server and client, fires the NODE_JOINED event to notify the discovery manager about the newly joined node.
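On the public API level this surfaces as the EVT_NODE_JOINED event, which applications can subscribe to. A minimal sketch (note that discovery event types must be enabled explicitly in the configuration):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.events.DiscoveryEvent;
    import org.apache.ignite.events.EventType;

    public class JoinListener {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Discovery events are disabled by default and must be enabled explicitly.
            cfg.setIncludeEventTypes(EventType.EVT_NODE_JOINED);

            Ignite ignite = Ignition.start(cfg);

            ignite.events().localListen(evt -> {
                DiscoveryEvent discoEvt = (DiscoveryEvent)evt;

                System.out.println("Node joined: " + discoEvt.eventNode().id());

                return true; // Keep listening.
            }, EventType.EVT_NODE_JOINED);
        }
    }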

Info
NodeAddFinished and additional join requests

The joining node will send an additional join request if it doesn't receive the TcpDiscoveryNodeAddFinishedMessage in time. This time is defined by TcpDiscoverySpi#networkTimeout and has a default value of 5 seconds (TcpDiscoverySpi#DFLT_NETWORK_TIMEOUT).
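The timeout is tunable through the public setter (a small configuration sketch; the class name is invented):

    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

    public class TimeoutConfig {
        public static IgniteConfiguration config() {
            TcpDiscoverySpi spi = new TcpDiscoverySpi();

            // Give the joining node more time to wait for NodeAddFinished
            // before it sends an additional join request (default is 5000 ms).
            spi.setNetworkTimeout(10_000);

            return new IgniteConfiguration().setDiscoverySpi(spi);
        }
    }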

Detecting and Removing Failed Nodes from Topology

Detecting failed nodes is the responsibility of each server node in the cluster. However, detecting failed servers and failed clients works slightly differently, so the two cases are considered separately.

Server Nodes Failures

As servers are organized into a ring, each node can easily detect the failure of its neighbor when sending a discovery message to it fails.

[Gliffy diagram: fail node overview]

However, the process of removing a failed node from the ring (a special structure maintained on each server node that reflects that node's view of the topology, see the ServerImpl#ring field) is not that simple and has to be managed by the coordinator. It consists of two steps:


First step

During the first step, the server that detected that its neighbor has failed adds the broken node to a special in-memory structure called failedNodes (see TcpDiscoveryAbstractMessage#failedNodes). This structure is then used to filter out the failed nodes from the list of all nodes in the ring until the second step is complete.

After that, the server tries to establish a connection to the next node in the ring. If the connection is established, this node becomes the new neighbor; otherwise the process is repeated until a working node is found.
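A toy sketch of this repair loop (invented names, not Ignite code): keep marking unreachable nodes as failed and trying the node after them until somebody accepts the message.

    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    /** Toy sketch of the first step of failed-node handling; NOT Ignite code. */
    public class RingRepair {
        /** Local view of failed nodes, kept until the second step completes. */
        static final Set<String> failedNodes = new LinkedHashSet<>();

        /** Walks the rest of the ring until some alive node accepts the message. */
        static String sendToFirstAliveNode(List<String> restOfRing, String msg) {
            for (String next : restOfRing) {
                if (failedNodes.contains(next))
                    continue;               // Skip nodes already known to be failed.

                if (trySend(next, msg))
                    return next;            // Found the new neighbor.

                failedNodes.add(next);      // Mark as failed and try the node after it.
            }

            throw new IllegalStateException("No alive nodes left in the ring");
        }

        static boolean trySend(String node, String msg) {
            return !node.startsWith("dead"); // Stand-in for a real socket write.
        }

        public static void main(String[] args) {
            System.out.println(sendToFirstAliveNode(List.of("dead-B", "dead-C", "D"), "msg")); // D
            System.out.println(failedNodes); // [dead-B, dead-C]
        }
    }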

Info
TcpDiscoveryAbstractMessage and its responsibilities

All discovery messages carry information about failed nodes in the topology (see the TcpDiscoveryAbstractMessage#failedNodes collection and the logic around ServerImpl$RingMessageWorker#sendMessageAcrossRing), and all nodes that receive a discovery message start processing it by updating their local failedNodes maps with the info from that message (see ServerImpl$RingMessageWorker#processMessage).

When the ring is restored and the next available node is found, the current node sends a TcpDiscoveryNodeFailedMessage for every encountered failed node (e.g. if three nodes have failed, three TcpDiscoveryNodeFailedMessage messages are sent). These messages start the second step of removing the failed nodes.

Second step

The second step starts on the coordinator. When the coordinator receives a TcpDiscoveryNodeFailedMessage, it verifies it and sends it across the ring. All servers (including the coordinator) update their local topology versions and finally remove the failed node(s) from their internal structures - the ring and the failedNodes map (see ServerImpl$RingMessageWorker#processNodeFailedMessage).
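From an application's point of view the completion of the second step is observable as the EVT_NODE_FAILED discovery event (a minimal sketch, analogous to the NODE_JOINED example above; the event type must be enabled via IgniteConfiguration#setIncludeEventTypes):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.events.DiscoveryEvent;
    import org.apache.ignite.events.EventType;

    public class FailureListener {
        static void listen(Ignite ignite) {
            ignite.events().localListen(evt -> {
                DiscoveryEvent discoEvt = (DiscoveryEvent)evt;

                System.out.println("Node failed: " + discoEvt.eventNode().id()
                    + ", new topology version: " + discoEvt.topologyVersion());

                return true; // Keep listening.
            }, EventType.EVT_NODE_FAILED);
        }
    }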

Client Nodes Failures

Section patiently waits for someone to contribute the content

Resilience in Presence of Short Network Glitches

The algorithm of restoring the ring without failing nodes (the one involving CrossRingMessageSendState, special timeouts and other tricky things) should be covered here in great detail.

Corner Cases with Filtering of Unverified NodeFailed Messages

Section patiently waits for someone to contribute the content

Custom Discovery Messages and Events

Section patiently waits for someone to contribute the content

Message Delivery Guarantees

Section patiently waits for someone to contribute the content