
ID: IEP-4
Author: Alexey Goncharuk, Sergey Puchnin
Sponsor: Alexey Goncharuk
Created: 2017-10-16
Status: DRAFT


Motivation

When persistent storage was introduced in Apache Ignite, the cluster membership mechanism was not changed, so every node startup or shutdown is still treated as a change of cluster membership. This leads to several performance, usability and consistency issues:

  • The RebalanceDelay configuration property is not flexible enough to protect the cluster from unnecessary rebalancing
  • Cluster startup in inactive mode requires user intervention
  • Some cluster activation scenarios may result in data inconsistency
  • Lost partition recovery is possible only with user intervention

Other persistence-enabled clusters, such as Cassandra, require a manual acknowledgment when adding or removing nodes. If a node fails, it is considered temporarily offline, and no data movement starts until an administrator confirms that the node is offline permanently.

Description

To resolve the issues described above, we introduce the concept of an affinity baseline topology: a target set of nodes intended to store data for persistence-enabled caches. Each baseline topology will also carry a list of hashes, generated at branching points, that allows us to track the activation history and prevent data divergence in the cluster. A branching point is a cluster action that may affect the data integrity of the cluster; cluster activation is one example.
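
The branching history can be pictured as a hash chain. The Python sketch below is a language-neutral model, not Ignite code: the `BaselineTopology` class, the use of SHA-256, and the way each entry combines the previous hash, the sorted node IDs and an event name are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class BaselineTopology:
    """Hypothetical model: the target node set plus its branching history."""

    consistent_ids: frozenset[str]
    branching_history: list[str] = field(default_factory=list)

    def record_branching_point(self, event: str) -> str:
        """Append a hash that chains the previous entry, the node set and the event."""
        prev = self.branching_history[-1] if self.branching_history else ""
        payload = "|".join([prev, ",".join(sorted(self.consistent_ids)), event])
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.branching_history.append(digest)
        return digest
```

Because each entry depends on every earlier one, two nodes whose latest hashes match also share the whole history up to that point, while a single divergent activation changes every later hash.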

The baseline topology and branching history must be kept in reliable, fail-safe metadata storage that can be read and updated when a node joins. This is needed to detect branch divergence and prevent stale nodes from joining the cluster. Every baseline topology change and branching action is saved in the metadata storage.
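
One way to use the stored history during node join is sketched below. The proposal does not spell out the exact rule, so the prefix check here is an assumption, and `is_prefix` and `validate_join` are hypothetical names: a joining node is accepted only if it has not seen any branching point that the cluster has not seen.

```python
from typing import Sequence


def is_prefix(prefix: Sequence[str], history: Sequence[str]) -> bool:
    """True if `prefix` equals the first len(prefix) entries of `history`."""
    return len(prefix) <= len(history) and list(history[: len(prefix)]) == list(prefix)


def validate_join(cluster_history: Sequence[str], node_history: Sequence[str]) -> bool:
    """Hypothetical join check against the histories kept in metadata storage.

    A node that merely missed later branching points (its history is a
    prefix of the cluster's) is accepted. A node whose history forks from
    the cluster's, or runs ahead of it, is rejected as diverged.
    """
    return is_prefix(node_history, cluster_history)
```

Rejecting a node that is ahead of the cluster is a conservative choice in this sketch: it avoids silently merging data written under an activation the current cluster never saw.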

Affinity for persistence-enabled caches is calculated over the baseline topology, and offline nodes are then subtracted from the resulting mapping. When a new node joins the cluster or a node goes offline, the baseline topology does not change; only the affinity mapping is adjusted for offline nodes. Because affinity must also be calculated for offline nodes, the cluster must be able to create 'phantom' topology nodes and pass them to the affinity function. Since an affinity function may use arbitrary node attributes, we will introduce an interface that declares which node attributes are used in affinity calculation. The required node attributes will be stored in the metadata storage.
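
The mapping rule can be illustrated with a simple rendezvous-hashing stand-in. This is not Ignite's affinity function; `baseline_assignment`, `effective_assignment`, `phantom_attributes` and the SHA-256 weighting are hypothetical choices for the sketch. Offline baseline nodes take part in the calculation like any other node and are filtered out only afterwards.

```python
import hashlib
from typing import Dict, Iterable, List, Mapping, Set


def _weight(node_id: str, partition: int) -> int:
    """Deterministic pseudo-random weight for a (node, partition) pair."""
    digest = hashlib.sha256(f"{node_id}:{partition}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def baseline_assignment(baseline: Iterable[str], partitions: int, copies: int) -> Dict[int, List[str]]:
    """Rendezvous hashing over the whole baseline, offline (phantom) nodes included."""
    nodes = list(baseline)
    return {
        p: sorted(nodes, key=lambda n: _weight(n, p), reverse=True)[:copies]
        for p in range(partitions)
    }


def effective_assignment(ideal: Mapping[int, List[str]], online: Set[str]) -> Dict[int, List[str]]:
    """Subtract offline nodes from the mapping; the baseline itself is unchanged."""
    return {p: [n for n in owners if n in online] for p, owners in ideal.items()}


def phantom_attributes(stored: Mapping[str, str], used: Set[str]) -> Dict[str, str]:
    """Attributes for a phantom node, limited to those the affinity function declares."""
    return {name: stored[name] for name in used if name in stored}
```

In this model the ideal mapping depends only on the baseline, so a node going offline simply disappears from its own partitions and no other partition changes owners.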

The baseline topology for a newly created cluster (or an existing cluster whose persistence files predate baseline topology) is created during the first cluster activation. Subsequent baseline topology changes should be either confirmed manually or approved automatically through a user-defined pluggable interface. When a cluster with an existing baseline topology is started, it waits for all nodes in the baseline topology to come online. Once all of these nodes have started, the cluster can activate automatically.
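
The startup and approval rules can be modelled as a small state machine. Everything below is hypothetical: `ClusterState`, `ApprovalPolicy` and the `confirmed_by_admin` flag stand in for whatever activation API and pluggable interface the implementation ends up with.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set

# Hypothetical pluggable policy: (current baseline, proposed baseline) -> approve?
ApprovalPolicy = Callable[[FrozenSet[str], FrozenSet[str]], bool]


class ClusterState:
    """Toy model of baseline-driven activation; not an Ignite API."""

    def __init__(self, baseline: Optional[Iterable[str]] = None,
                 policy: Optional[ApprovalPolicy] = None) -> None:
        # None means no baseline yet, i.e. a newly created cluster.
        self.baseline: Optional[FrozenSet[str]] = (
            frozenset(baseline) if baseline is not None else None)
        self.online: Set[str] = set()
        self.active = False
        self.policy = policy  # None: every change needs manual confirmation

    def node_joined(self, node_id: str) -> None:
        self.online.add(node_id)
        # Auto-activate once every baseline node is online.
        if self.baseline is not None and not self.active and self.baseline <= self.online:
            self.active = True

    def activate(self) -> None:
        """Manual activation; the first one creates the baseline from online nodes."""
        if self.baseline is None:
            self.baseline = frozenset(self.online)
        self.active = True

    def propose_baseline(self, nodes: Iterable[str], confirmed_by_admin: bool = False) -> bool:
        """Apply a baseline change if an admin confirmed it or the policy approves it."""
        proposed = frozenset(nodes)
        approved = (self.policy is not None and self.baseline is not None
                    and self.policy(self.baseline, proposed))
        if confirmed_by_admin or approved:
            self.baseline = proposed
            return True
        return False
```

A policy such as `lambda cur, new: cur <= new` would auto-approve baselines that only add nodes, while leaving node removals for an administrator to confirm.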

Functionality that is not related to data affinity (compute grid, services) is not affected by baseline topology.

Usability considerations

The ability to manage the baseline topology should be added to the command-line utilities (visorcmd and the control.sh script), and adding it to the WebConsole should be considered as well.

Risks and Assumptions

Storing additional information in the metadata storage increases disk space consumption. Since we store additional information about caches, we introduce another point of configuration validation. Introducing the metadata storage may require additional changes to PageMemory, because the metadata storage may be needed during node join, whereas memory recovery currently happens after discovery starts.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html

http://apache-ignite-developers.2346864.n4.nabble.com/Design-proposal-automatic-creation-of-BaselineTopology-td20756.html

