Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Changed according feedback

...

Contents:

Table of Contents

 

(Partition Map) Exchange

PME - is process of exchange partition information between nodes. Process goal is to setup actual state of partitions for all cluster nodes.

...

Node 2 does not throw data and does not change partition state even it is not owning partition by affinity on new topology version.

 

Step 3. Node 4 will starts to issue demand requests for cache data to any node which have partition data.

The full-map exchange most likely will be completed before node 4 rebalances all the data it will own (with node 4 partition 3 state - Moving),


Step 4. When all data is loaded (last key is mapped) Node 4 locally changes state to Owning (other nodes think it is Moving).

Node schedules a background cluster notification.


Step 5. Node 4 after some delay (timer) sends single message to Coordinator (this message may This message will have absent/'null' version of exchange).

 

Step 6. Coordinator sends updated full map to other nodes.

...

All nodes eventually get renting state for P3 backup for Node 4.

 

Info

SingleMessage with exchId=null is sent when a node updates local partitions' state and schedules a background cluster notification.

In contrast, when a partition map exchange happends, it is completed with exchId != null.

 

If more topology updates occurred, this causes more cluster nodes will be responsible for same partition at particular moment.

Clear Clearing of cache partition data immediately is not possible because of SQL indexes. These indexes covers all node's partitions globally. Elements removed one by one. Later state will be Evicted. 

Exchange types

 

  • synchronous: Come from discovery events (real & custom): have to wait transactions to finish
  • asynchronous, does not slowdown operations

...

Synchronous exchange is required for primary partition rebalancing to new node.

If such Let's suppose for a second, such partition map exchange is not sync synchronous: There is possible case that 2 primaries exists in Owning state. New node changed state locally, old node does not received update yet. In that case lock requests may be sent to different nodes (see also Transaction section), it could brake ACID.

As result, primary migration requires global synchronous exchange. It should not be running in the same time with transactions.

For protection there is Topology RW ReadWrite Lock

  • transactions & atomic cache updates - acquire read lock - N locks supported in the same time
  • exchange acquire write lock - 1 lock excluding transactions is possible

Using ReadWrite lock RW gives semantic N transactions, or 1 exchange.

...