The cluster activation feature is very handy when used together with the new persistence feature introduced in Apache Ignite 2.1.
However, it has its drawbacks:
These drawbacks can be addressed by introducing a new concept, BaselineTopology (working name; alternatives: RestartNodeSet, FixedNodeSet, MinimalNodeSet). The main idea is to have a fixed set of nodes that are expected to be in the cluster.
BaselineTopology (BLT) is created and changed by an explicit user command (or automatically in special cases; see the Use Cases section). Nodes joining or leaving the cluster do not affect the BLT.
For example, a user may create a cluster of ten nodes and activate it. On activation the BLT is established, and partitions are assigned to nodes based on the BLT rather than on the actual topology.
When nodes later join or leave the cluster, no rebalancing happens because the BLT stays the same.
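A minimal Java sketch of this idea, assuming a hypothetical `Cluster` class (not Ignite's API): the live topology and the baseline are tracked separately, activation copies the live nodes into the baseline, and partition ownership is computed from the baseline alone. Round-robin stands in for a real affinity function and backups are ignored.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model, not Ignite's API: the cluster keeps a live topology and
// a separate baseline that changes only on activation (a user command).
final class Cluster {
    private final Set<String> liveNodes = new TreeSet<>(); // consistent IDs, kept sorted
    private List<String> baseline = List.of();              // empty until activation

    void join(String nodeId) { liveNodes.add(nodeId); }     // does not touch the baseline

    void leave(String nodeId) { liveNodes.remove(nodeId); } // does not touch the baseline

    // Activation fixes the baseline to the nodes present at that moment.
    void activate() { baseline = List.copyOf(liveNodes); }

    // Partition -> owner, computed from the baseline only. Round-robin stands in
    // for a real affinity function; backups are ignored for brevity.
    Map<Integer, String> assignment(int partitions) {
        if (baseline.isEmpty()) {
            throw new IllegalStateException("cluster is not active");
        }
        Map<Integer, String> result = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            result.put(p, baseline.get(p % baseline.size()));
        }
        return result;
    }
}
```

With nodes A, B and C activated as the baseline, a later join of D and departure of B leave `assignment(...)` unchanged: D owns nothing, and B's partitions stay mapped to B until the baseline is changed by a user command.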
In the current activation implementation, a new node joining an active cluster automatically becomes active as well. With BLT we would end up with two kinds of "active" nodes in the cluster: active nodes in the BLT (which have partitions assigned to them and store actual data) and active nodes outside the BLT (which have no partitions).
To keep the "active" status consistent, I propose introducing a new status with the working name "active_empty": a node in this status has all components started and initialized, but it is not visible to the affinity function and thus does not store any data.
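The proposed statuses can be sketched as a small state function. The `NodeState` enum and `NodeStates` helper below are hypothetical names for illustration only; `ACTIVE_EMPTY` corresponds to the working name "active_empty".

```java
// Hypothetical node states mirroring the proposal; ACTIVE_EMPTY is the
// working name "active_empty" from the text.
enum NodeState { INACTIVE, ACTIVE, ACTIVE_EMPTY }

final class NodeStates {
    private NodeStates() {}

    // A node in an active cluster is ACTIVE only if it belongs to the baseline.
    static NodeState stateOf(boolean clusterActive, boolean inBaseline) {
        if (!clusterActive) {
            return NodeState.INACTIVE;
        }
        return inBaseline ? NodeState.ACTIVE : NodeState.ACTIVE_EMPTY;
    }

    // Only ACTIVE nodes are passed to the affinity function and store data;
    // ACTIVE_EMPTY nodes have their components started but own no partitions.
    static boolean visibleToAffinity(NodeState state) {
        return state == NodeState.ACTIVE;
    }
}
```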
The BLT needs to be versioned to address the issue of different histories of the same data (see the last example in the Preconditions section). When an old node with a BLT tries to join an existing cluster, the version of its BLT is compared with the version of the cluster's current BaselineTopology. If they do not match, the cluster refuses to let the node join, to avoid data corruption and undefined behavior.
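A minimal sketch of the version check, assuming a hypothetical `BaselineTopology` record and `JoinValidator` helper (not Ignite's API). Each user-initiated change bumps the version, and a joining node is accepted only if its persisted version equals the cluster's. Treating a node with no persisted BLT as allowed to join (outside the baseline) is an assumption of this sketch.

```java
import java.util.Set;

// Hypothetical versioned baseline, not Ignite's API. Every user-initiated
// change produces a new version number.
record BaselineTopology(long version, Set<String> nodeIds) {
    BaselineTopology {
        nodeIds = Set.copyOf(nodeIds); // defensive, immutable copy
    }

    BaselineTopology withNodes(Set<String> newNodeIds) {
        return new BaselineTopology(version + 1, newNodeIds);
    }
}

final class JoinValidator {
    private JoinValidator() {}

    // Compares the joining node's persisted baseline version with the cluster's.
    // A null baseline (assumed here to mean a fresh node with no persisted data)
    // is allowed; such a node would join outside the baseline.
    static boolean canJoin(BaselineTopology current, BaselineTopology joining) {
        if (joining == null) {
            return true;
        }
        return joining.version() == current.version();
    }
}
```

A node that went offline while the baseline was at version 1 and returns after a user changed it to version 2 is rejected, since its data may have a different history than the cluster's.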