| ID | IEP-140 |
|---|---|
| Author | |
| Sponsor | |
| Created | |
| Status | DRAFT |
Apache Ignite provides good horizontal scalability, enabling applications built on top of it to store more data and handle more user requests as they grow. At some point, deploying an application across more than one data center (DC) becomes necessary to ensure disaster resilience and recovery.
Right now Ignite 2.x does not support deployments spanning two or more DCs in an efficient way, though such deployments are not prohibited even in current versions. The reason is that internal components are not optimized to perform their functions in a multi-DC (MDC) environment. For instance, the SQL engine may request pieces of data from a remote DC even though the original request could be fulfilled from the local DC. This leads to lower throughput and higher latency of user operations, as well as decreased cluster stability (at least for the ring-shaped cluster topology used by TcpDiscoverySpi). This IEP describes the improvements needed to close this gap in functionality.
Support for MDC is not a single big feature but rather a set of improvements to existing components, allowing an Ignite cluster to reach the following set of goals:
The first improvement concerns the Discovery SPI implementation and aims to minimize cross-DC edges in the ring topology, thus increasing cluster stability. It enables the subsequent set of improvements.
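The effect of DC-aware ring ordering can be illustrated with a small self-contained sketch (the names below are illustrative, not actual TcpDiscoverySpi code): if nodes of the same DC are kept contiguous in the ring, the number of cross-DC edges drops to the number of DCs.

```java
import java.util.*;
import java.util.stream.*;

public class RingOrdering {
    /** Orders nodes so that all nodes of one DC are contiguous in the ring. */
    static List<String> dcAwareOrder(Map<String, String> nodeToDc) {
        return nodeToDc.entrySet().stream()
            .sorted(Comparator.comparing(Map.Entry<String, String>::getValue)
                              .thenComparing(Map.Entry::getKey))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    /** Counts ring edges that connect nodes from different DCs. */
    static int crossDcEdges(List<String> ring, Map<String, String> nodeToDc) {
        int cnt = 0;
        for (int i = 0; i < ring.size(); i++) {
            String a = nodeToDc.get(ring.get(i));
            String b = nodeToDc.get(ring.get((i + 1) % ring.size()));
            if (!a.equals(b))
                cnt++;
        }
        return cnt;
    }

    public static void main(String[] args) {
        // Five nodes spread over two DCs.
        Map<String, String> nodes = Map.of(
            "n1", "DC1", "n2", "DC2", "n3", "DC1", "n4", "DC2", "n5", "DC1");
        List<String> ring = dcAwareOrder(nodes);
        // With DC-aware ordering only 2 ring edges cross the DC boundary.
        System.out.println(crossDcEdges(ring, nodes));
    }
}
```

With a random ordering of the same five nodes, up to four of the five ring edges could cross the DC boundary; the DC-aware ordering bounds this by the number of DCs.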
Improvements to all other components, such as the KV API, SQL, and rebalancing, can be done in parallel in one or more phases after the Discovery SPI improvement is implemented. The necessary changes for each component are covered in subsequent sections of this document.
More effective mechanisms for detecting and mitigating split-brain situations can be built on top of the modified Discovery SPI.
Introduce new attributes to label Ignite nodes belonging to different DCs, and develop the components necessary for operation in a multi-DC environment (TopologyValidator, SegmentationResolver with a suitable segmentation policy, tooling for managing a stretched cluster).
Modification of TcpDiscoverySpi is needed to fulfill the following requirements:
At the same time, any improvements to TcpDiscoverySpi must not affect single-DC installations: no additional configuration complexity or performance penalty should be introduced.
The following list of components is not exhaustive; other components may be added later with a detailed plan for their optimization in MDC deployments.
At the moment Apache Ignite provides two interfaces to handle split-brain situations: TopologyValidator and ConflictResolver.
However, the proposed improvements may enable more efficient algorithms for detecting and handling split-brain, e.g. fast detection of a split-brain on the edges of DCs, or simpler and more efficient TopologyValidator implementations based on the notion of a "Leader/Follower DC". All of these ideas are still a matter of discussion and clarification.
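The "Leader/Follower DC" idea can be sketched as a pure function over the DC labels of the visible nodes; in a real cluster this logic would live inside an implementation of Ignite's TopologyValidator interface, and the class and method names below are hypothetical.

```java
import java.util.*;

public class DcQuorumCheck {
    /**
     * Sketch of a "leader DC" split-brain policy: the visible topology is
     * considered valid only if it still contains at least one node from
     * the leader DC. A real implementation would perform this check in
     * TopologyValidator.validate(Collection<ClusterNode>).
     */
    static boolean validate(Collection<String> visibleNodeDcs, String leaderDc) {
        return visibleNodeDcs.contains(leaderDc);
    }

    public static void main(String[] args) {
        // The segment that kept a leader-DC node stays operational...
        System.out.println(validate(List.of("DC1", "DC1", "DC2"), "DC1"));
        // ...while the segment that lost the leader DC rejects updates.
        System.out.println(validate(List.of("DC2", "DC2"), "DC1"));
    }
}
```

The point of such a policy is that at most one segment can contain the leader DC, so at most one segment keeps accepting writes after a split.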
The thin client has a partition-awareness feature: the client connects to all server nodes of the cluster, tries to determine the correct node for a given key, and sends the request to that node (see IEP-23: Partition Awareness for Thin Clients). To achieve this, the client requests the primary node map for a cache from the server. Requests that are not partition-aware (not related to caches, or without a key that can be mapped to a node) are distributed randomly across all nodes.
In the multi-DC case, the following improvements can be made to the partition-awareness mechanism:
To achieve these improvements, it is proposed to change the "cache partitions" messages (adding information about partitions in the current DC) and to introduce a new "data center nodes" message. Alternatively, instead of introducing a new message, the existing CLUSTER_GROUP_GET_NODE_IDS message could be reused; in that case, however, the client would have to know the name of the server attribute that stores the DC ID (which is internal server information), and everything would break if the server changed where the DC ID is stored.
A new operation for the "data center nodes" request is required:
| Name | Code |
|---|---|
| OP_CLUSTER_GET_DC_NODES | 5103 |
| Request | |
|---|---|
| String | Data center ID |
| Response | |
|---|---|
| int | Nodes count |
| (UUID) * count | Nodes IDs |
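The response layout above can be sketched with a simple encode/decode round trip. This is a simplified illustration using a raw ByteBuffer; the actual thin-client protocol encodes values with its own binary format, and the class and method names here are assumptions for the sketch.

```java
import java.nio.ByteBuffer;
import java.util.*;

public class DcNodesMessage {
    // Proposed op code from the table above.
    static final short OP_CLUSTER_GET_DC_NODES = 5103;

    /** Encodes the response payload: nodes count followed by node UUIDs. */
    static ByteBuffer encodeResponse(List<UUID> nodeIds) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 16 * nodeIds.size());
        buf.putInt(nodeIds.size());
        for (UUID id : nodeIds) {
            buf.putLong(id.getMostSignificantBits());
            buf.putLong(id.getLeastSignificantBits());
        }
        buf.flip();
        return buf;
    }

    /** Decodes the response payload back into node IDs. */
    static List<UUID> decodeResponse(ByteBuffer buf) {
        int cnt = buf.getInt();
        List<UUID> ids = new ArrayList<>(cnt);
        for (int i = 0; i < cnt; i++)
            ids.add(new UUID(buf.getLong(), buf.getLong()));
        return ids;
    }

    public static void main(String[] args) {
        List<UUID> ids = List.of(UUID.randomUUID(), UUID.randomUUID());
        List<UUID> roundTrip = decodeResponse(encodeResponse(ids));
        System.out.println(ids.equals(roundTrip));
    }
}
```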
| Request | |
|---|---|
| bool | Custom mapping |
| String | Data center ID - new field |
| int | Caches count |
| (int) * count | Cache IDs |
| Response | |
|---|---|
| long | Major topology version |
| int | Minor topology version |
| int | Partition mappings count |
| (Caches configuration and partition mapping) * count | Caches configuration and partition mappings |
| Caches configuration and partition mapping | |
|---|---|
| bool | Applicable. Flag that shows whether standard affinity is used for the caches. |
| int | Count of caches |
| (Cache ID + key configuration) * count | Cache IDs and cache key configurations |
| Partition mapping | Primary partition mapping (if Applicable is true) |
| Partition mapping | Current DC partition mapping (if Applicable is true) - new field |
| Partition mapping | |
|---|---|
| int | Nodes count |
| (Node partitions) * count | Node partitions information |
| Node partitions | |
|---|---|
| UUID | Node ID |
| int | Partitions count |
| (int) * count | Partitions |
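With both mappings available, the client can prefer a node in its own DC that owns the key's partition and fall back to the primary owner otherwise. The sketch below shows this selection over plain maps mirroring the "Partition mapping" structures above; the names are hypothetical, not actual thin-client code.

```java
import java.util.*;

public class DcAwareRouting {
    /**
     * Picks a target node for a partition: prefer an owner located in the
     * client's DC (from the current-DC partition mapping), otherwise fall
     * back to the primary owner from the primary partition mapping.
     */
    static String pickNode(int partition,
                           Map<String, Set<Integer>> primaryMap,
                           Map<String, Set<Integer>> localDcMap) {
        for (Map.Entry<String, Set<Integer>> e : localDcMap.entrySet())
            if (e.getValue().contains(partition))
                return e.getKey();
        for (Map.Entry<String, Set<Integer>> e : primaryMap.entrySet())
            if (e.getValue().contains(partition))
                return e.getKey();
        return null; // no mapping known; the client would pick a random node
    }

    public static void main(String[] args) {
        // Primary owner of partitions 0 and 1 sits in a remote DC;
        // a local-DC node owns a copy of partition 1 only.
        Map<String, Set<Integer>> primary = Map.of("remoteNode", Set.of(0, 1));
        Map<String, Set<Integer>> localDc = Map.of("localNode", Set.of(1));
        System.out.println(pickNode(1, primary, localDc)); // served locally
        System.out.println(pickNode(0, primary, localDc)); // only the primary has it
    }
}
```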
The current approach of this IEP mostly modifies the internal logic of components; no public API changes are needed and binary compatibility is not broken. The protocols stay the same as well, apart from some internal tweaks and possibly some refactoring.
[Will be added later]