ID | IEP-72 |
Author | |
Sponsor | |
Created | |
Status | DECLINED |
For Ignite 3.x the concept is formulated for the distributed table. The table is the base component that allows storing and updating data in the cluster. The table provides a basic guarantee of consistency for data writes/reads.
All distributed table structures require redundant data storage to avoid losing entries when one or more members of the structure go down. Moreover, the available data must stay consistent at all times, even when only a part of the data copies is available in the structure.
The atomic protocol should provide the ability to maintain the data redundancy level and to keep the data consistent until all copies of the data are lost.
Table creation requires the following parameters to be specified for the protocol purposes:
The familiar interface for atomic storage is available through the Key-Value view[1] of a table:
public interface KeyValueView<K, V>
Batch methods do not have atomicity guarantees; they are added to optimize network communication.
It is an analogue of the Ignite cache interface from Ignite 2.x.
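Below is a minimal sketch of what such a Key-Value view could look like, separating single-key operations from batch ones. The method names and signatures are illustrative assumptions for this document, not the final API.

import java.util.Collection;
import java.util.Map;

// Illustrative sketch only: method names and signatures are assumptions,
// not the final public API.
public interface KeyValueView<K, V> {
    // Single-key operations: each call is atomic for its key.
    V get(K key);
    void put(K key, V val);
    boolean remove(K key);

    // Batch operations: no atomicity guarantee across keys, they exist
    // only to reduce the number of network round trips.
    Map<K, V> getAll(Collection<K> keys);
    void putAll(Map<K, V> pairs);
}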
Every partition's replicas will be served by a single RAFT group (this is implemented in IEP-61[2]). All synchronization guarantees between replicas will be provided by the RAFT protocol.
Since RAFT elects a leader on its own, there is no difference between primary and backup replicas - all replicas are equal from the atomic protocol's point of view.
All partitions should be distributed across the cluster as evenly as possible to achieve a balanced load. For this purpose, the Rendezvous affinity function will be used (a similar one is used in Ignite 2.x).
The function is calculated once per cluster topology on one of the nodes, and its result (the partition distribution) is stored in the Distributed metastorage. All the other nodes receive the precalculated partition distribution and use a local copy of it (no recalculation required) before the table becomes available for operations.
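For illustration, here is a small sketch of rendezvous (highest-random-weight) assignment of a partition to nodes. The class name, the hash used, and the node identifiers are assumptions and do not reflect the actual Ignite affinity implementation.

import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrative rendezvous (highest-random-weight) assignment: for a given
// partition every node gets a pseudo-random score, and the top 'replicas'
// nodes own the partition. Adding or removing a node only moves the
// partitions whose top scores change, which keeps the load balanced.
public class RendezvousSketch {
    public static List<String> assign(int partition, List<String> nodes, int replicas) {
        return nodes.stream()
                .sorted(Comparator.comparingLong((String node) -> score(partition, node)).reversed())
                .limit(replicas)
                .collect(Collectors.toList());
    }

    // A simple mixed hash of (partition, node); the real affinity function
    // uses its own hashing, this one is only for demonstration.
    private static long score(int partition, String node) {
        long h = Objects.hash(partition, node);
        h ^= (h << 21);
        h ^= (h >>> 35);
        h ^= (h << 4);
        return h;
    }
}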
A table entry consists of:
Table creation starts through the public API. Meanwhile, the partition distribution is calculated and becomes available on every node by the time the table is returned to the client code.
Every invocation of the table API determines a set of data entries, which are mapped to partitions by the key part of the entries. The distribution determines a RAFT group for each specific partition; every partition update is transformed into a RAFT command and applied through the RAFT group API.
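A simplified sketch of this flow is shown below. The UpdateCommand and RaftGroupService abstractions, as well as the key-to-partition mapping, are assumptions used only to illustrate the steps described above.

import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative flow of an atomic put: map the key to a partition, pick the
// RAFT group that serves that partition, and replicate the update through it.
// All names here (UpdateCommand, RaftGroupService) are assumptions.
class AtomicPutSketch {
    private final int partitions;                         // table partition count
    private final Map<Integer, RaftGroupService> groups;  // partition -> RAFT group client

    AtomicPutSketch(int partitions, Map<Integer, RaftGroupService> groups) {
        this.partitions = partitions;
        this.groups = groups;
    }

    CompletableFuture<Void> put(byte[] key, byte[] value) {
        // The key part of the entry determines the partition.
        int partition = (Arrays.hashCode(key) & Integer.MAX_VALUE) % partitions;
        // The update is turned into a RAFT command and applied through the
        // RAFT group that owns the partition.
        return groups.get(partition).run(new UpdateCommand(key, value));
    }
}

// Hypothetical command and group-client abstractions used above.
record UpdateCommand(byte[] key, byte[] value) { }

interface RaftGroupService {
    CompletableFuture<Void> run(UpdateCommand command);
}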