    • The whole rebalance process is idempotent in case it fails or the node crashes in the middle:
      • In metadata transaction a, since (1) the only persistent metadata write is the creation of the new node group, (2) we use a UUID to name the node group if the desired name already exists, and (3) step 3 drops leftover files for the rebalance target, the transaction is idempotent if a failure or crash happens at any point.
      • Metadata transactions b and c tolerate InterruptedException, so they are never interrupted midway as long as the node doesn't crash. As a result, a user cancellation request does not waste the bulk of the rebalance work, i.e., metadata transaction a.
      • In metadata transactions b and c, if the node or JVM crashes:
        • if the metadata entity was not switched (depending on what's there in the metadata node), the system uses the rebalance source as foo;
        • if the metadata entity was switched, the system uses the rebalance target as foo.
      • In the event of failures, there could be leaked source files (from metadata transaction c), which will be reclaimed in the next rebalance operation; leaked target files (from metadata transaction a), which will not be reclaimed; or a leaked node group name (from metadata transaction a), which doesn't prevent the success of the next rebalance operation. (ASTERIXDB-1948)
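
Two of the ideas above can be sketched in Java: falling back to a UUID-based node group name when the desired name is taken, and running a step so that an interrupt is deferred rather than aborting it. This is only an illustration under stated assumptions, not the actual AsterixDB code. The class RebalanceSketch, the interface InterruptibleStep, the methods chooseNodeGroupName and runUninterruptibly, and the "_" separator in the fallback name are all hypothetical. The retry loop also assumes that the step is itself idempotent.

```java
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; none of these names come from the AsterixDB code base.
final class RebalanceSketch {

    /** A step that may be hit by a cancellation request (hypothetical interface). */
    interface InterruptibleStep {
        void run() throws InterruptedException;
    }

    /**
     * Returns the desired node group name if it is free; otherwise derives a
     * fresh name from a random UUID so that a retry never collides with a
     * name leaked by an earlier failed attempt.
     */
    static String chooseNodeGroupName(String desired, Set<String> existing) {
        return existing.contains(desired) ? desired + "_" + UUID.randomUUID() : desired;
    }

    /**
     * Runs the step to completion even if the thread is interrupted.
     * An interrupt is remembered, the step is retried (it is assumed to be
     * idempotent), and the interrupt flag is restored once the step is done.
     */
    static void runUninterruptibly(InterruptibleStep step) {
        // Clear a pending interrupt so that it cannot abort the step, but remember it.
        boolean interrupted = Thread.interrupted();
        try {
            while (true) {
                try {
                    step.run();
                    return;
                } catch (InterruptedException e) {
                    interrupted = true; // defer the cancellation and retry
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // hand the request back to the caller
            }
        }
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("foo_group");
        System.out.println(chooseNodeGroupName("bar_group", existing)); // name is free
        System.out.println(chooseNodeGroupName("foo_group", existing)); // UUID-based fallback

        // Simulate a cancellation request that arrives before the step runs.
        Thread.currentThread().interrupt();
        runUninterruptibly(() -> System.out.println("metadata entity switched"));
        System.out.println("interrupt preserved: " + Thread.interrupted());
    }
}
```

Restoring the interrupt flag after the step completes lets the caller still observe the cancellation request, so the request is delayed rather than lost.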