...
Authors: Donal Evans
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
...
The proposed solution is to create a new RestoreRedundancyOperation
RestoreRedundancyBuilder
which will behave and be accessed in broadly the same way as the existing RebalanceOperation
. The operation RebalanceFactory
. When started, the builder will schedule PartitionedRegionRebalanceOp
operations for each appropriate region, but use a new RestoreRedundancyDirector
instead of the CompositeDirector
, which will only perform the removing over redundancy, restoring missing redundancy and (optionally) reassigning primary buckets steps.
Since the existing RebalanceResults
object does not capture the relevant information regarding actual redundancy, the RestoreRedundancyOperationRestoreRedundancyBuilder.getResultsstart()
method will return a new CompletableFuture
that will return a RestoreRedundancyResults
object containing the redundancy status for each region involved in the operation as well as information about primary bucket reassignments.
...
RestoreRedundancyBuilder createRestoreRedundancyBuilder()
Set<RestoreRedundancyOperation>Set<CompletableFuture<RestoreRedundancyResults>> getRestoreRedundancyOperations()
...
The RestoreRedundancyBuilder
will be responsible for setting the regions to be explicitly included or excluded in the RestoreRedundancyOperation
restore redundancy operation (default behaviour will be to include all regions), setting whether to reassign which members host primary buckets (default behaviour will be to reassign primaries) and starting the operation. A method will also be included for getting the current redundancy status:
public interface RestoreRedundancyBuilder {
RestoreRedundancyBuilder includeRegions(Set<String> regions);
RestoreRedundancyBuilder excludeRegions(Set<String> regions);
RestoreRedundancyBuilder doNotReassignPrimaries(boolean shouldNotReassign);
RestoreRedundancyOperation
CompletableFuture<RestoreRedundancyResults> start();
RestoreRedundancyResults redundancyStatus();
}
RestoreRedundancyOperation
The RestoreRedundancyOperation
interface will provide access to the current status of the operation, the ability to cancel it, and the ability to retrieve the results, with or without a timeout:
public interface RestoreRedundancyOperation {
boolean isCancelled();
boolean isDone();
boolean cancel();
RestoreRedundancyResults getResults();
RestoreRedundancyResults getResults(long timeout, TimeUnit unit);
}
RestoreRedundancyResults
The RestoreRedundancyResults
object will be a collection of individual results for each region and will contain methods for determining the overall success or failure of the operation, generating a detailed description of the state of the regions and for getting information about the work done to reassign primaries as part of the operation. The Status
returned by the RestoreRedundancyResults
will be FAILURE
if at least one bucket in one region has zero redundant copies (and that region is configured to have redundancy), ERROR
if the restore redundancy operation failed to start or encountered an exception and SUCCESS
otherwise:
public interface RestoreRedundancyResults {
enum Status {
SUCCESS,
FAILURE,
ERROR
}
void addRegionResults(Map<String, RestoreRedundancyRegionResult> resultsRestoreRedundancyResults results);
void addPrimaryReassignmentDetails(PartitionRebalanceInfo details);
void addRegionResult(RestoreRedundancyRegionResult regionResult);
Status getStatus();
String getMessage();
RestoreRedundancyRegionResult getRegionResult(String regionName);
Map<String, RestoreRedundancyRegionResult> getZeroRedundancyRegionResults();
Map<String, RestoreRedundancyRegionResult> getUnderRedundancyRegionResults();
Map<String, RestoreRedundancyRegionResult> getSatisfiedRedundancyRegionResults();
Map<String, RestoreRedundancyRegionResult> getRegionResults();
int getTotalPrimaryTransfersCompleted();
long getTotalPrimaryTransferTime();
}
...
The first command will execute a function on members hosting the specified partitioned regions and trigger the restore redundancy operation for those regions, then report the final redundancy status of those regions. If at
The command will return success status if:
- At least one redundant copy exists for every bucket in
...
- regions with redundancy configured that were included, either explicitly or implicitly.
- No partitioned regions were found and none were explicitly included.
The command will return error status if:
- At least one bucket in a region has zero redundant copies,
...
- and that region has redundancy configured.
- At least one of the explicitly included partitioned regions is not found.
- There is a member in the system with
...
- a version of Geode
...
- older than 1.13.0 (assuming that is the version in which this feature is implemented).
- The restore redundancy function encounters an exception
...
- .
The second command will determine the current redundancy status for the specified regions and report it to the user.
Both commands will take optional --include-region
and --exclude-region
arguments, similar to the existing rebalance command. If neither argument is specified, all regions will be included. Included regions will take precedence over excluded regions when both are specified. The restore redundancy command will also take an optional --dont-reassign-primaries
argument to determine if primaries should not be reassigned during the operation. The default behaviour will be to reassign primaries.
Both commands will output a list of regions with zero redundant copies first (unless they are configured to have zero copiesredundancy), then regions with less than their configured redundancy, then regions with full redundancy. The restore redundancy command will also output information about how many primaries were reassigned and how long that process took, similar to the existing rebalance command.
...
Since the proposed changes do not modify any existing behaviour, no performance impact is anticipated. Moreover, since restoring redundancy without performing a full rebalance is significantly less resource intensive, the addition of this API and these gfsh commands provides a more performant solution for cases in which only restoration of redundancy is wanted.
...
Any proposed solution to the problem that did not use the existing rebalance logic would have to reimplement large and complicated areas of code in order to correctly create redundant copies on members. One possible other solution that would use the existing rebalance logic would be to provide additional arguments to the existing rebalance operation to prevent moving buckets and prevent moving primaries. Given that the rebalance operation is already complicated, and that it could be confusing from a user perspective to use the name “rebalance” for an operation that is not actually balancing any data load, this solution was rejected in favour of creating a new, specific operation to restore redundancy.
Errata
...
RestoreRedundancyBuilder
should be renamed to RestoreRedundancyOperation
throughout, and the interface should now be:
public interface RestoreRedundancyOperation {
RestoreRedundancyOperation includeRegions(Set<String> regions);
RestoreRedundancyOperation excludeRegions(Set<String> regions);
RestoreRedundancyOperation shouldReassignPrimaries(boolean shouldReassign);
CompletableFuture<RestoreRedundancyResults> start();
RestoreRedundancyResults redundancyStatus();
}
The class name change results in more consistent class names and the method name change results in more easily understandable and consistent code.
...
References to RestoreRedundancyDirector
should be omitted.
This change reflects the fact that instead of introducing a new class, the existing CompositeDirector
was modified slightly to allow it to be used when restoring redundancy.
...
The new methods added to the ResourceManager
public interface should now be:
RestoreRedundancyOperation createRestoreRedundancyOperation()
Set<CompletableFuture<RestoreRedundancyResults>> getRestoreRedundancyFutures()
These changes reflect the new name of RestoreRedundancyOperation
and better describe what is returned by the "get" method.
...
The section describing the Status
enum should now read:
The Status
returned by the RestoreRedundancyResults
will be FAILURE
if at least one bucket in one region has fewer than the configured number of redundant copies, ERROR
if the restore redundancy operation failed to start or encountered an exception and SUCCESS
otherwise.
This change raises the threshold for what is considered a successful operation from one that results in any level of redundancy for all regions to one that results in fully satisfied redundancy for all regions.
...
The following methods should be omitted from the description of the RestoreRedundancyResults
interface:
void addRegionResults(RestoreRedundancyResults results);
void addPrimaryReassignmentDetails(PartitionRebalanceInfo details);
void addRegionResult(RestoreRedundancyRegionResult regionResult);
This change prevents leaking of internal classes through a public API and makes the RestoreRedundancyResults
interface read-only.
...
The getTotalPrimaryTransferTime()
method in the RestoreRedundancyResults
interface should return a java.time.Duration
object instead of a long
.
This change is intended to help provide more reliable handling of time-based values, as without an explicitly provided time unit there exists a possibility of confusion over the meaning of a long
time value.
...
RestoreRedundancyRegionResult
should be renamed RegionRedundancyStatus
throughout and the description of the class should now be:
Finally, the RegionRedundancyStatus
object will be a data structure containing a snapshot of the name, configured redundancy, actual redundancy and a status representing the state of redundancy for a given region:
public interface RegionRedundancyStatus{
enum RedundancyStatus {
SATISFIED,
NOT_SATISFIED,
NO_REDUNDANT_COPIES
}
String getRegionName();
int getConfiguredRedundancy();
int getActualRedundancy();
RedundancyStatus getStatus();
}
The name change prevents confusion between the RestoreRedundancyRegionResult
and RestoreRedundancyResults
classes and the extraction to an interface prevents leaking of internal classes through a public API. The method name change to getConfiguredRedundancy
is intended to provide more clarity about what the method returns.
...
The section describing the success/error status of the restore redundancy
gfsh command should now read:
The command will return success status if:
- Redundancy is fully satisfied for all regions that were included, either explicitly or implicitly.
- No partitioned regions were found and none were explicitly included.
The command will return error status if:
- At least one bucket in a region has zero redundant copies, and that region has redundancy configured.
- At least one bucket in a region has fewer than the configured number of redundant copies.
- At least one of the explicitly included partitioned regions is not found.
- There is a member in the system with a version of Geode older than 1.13.0 (assuming that is the version in which this feature is implemented).
- The restore redundancy function encounters an exception.
This change brings the gfsh command output in line with the Status
returned by the RestoreRedundancyResults
.
...
The --dont-reassign-primaries
argument should be renamed --reassign-primaries
throughout. The default value of the argument will be true, so the behaviour described in the RFC will be unchanged.
This change brings the gfsh arguments in line with the RestoreRedundancyOperation
interface.
...
The backwards compatibility and upgrade path section should now read:
Members running older versions of Geode will not be able to execute the redundancy command function, so if any such members are detected in the system, the gfsh commands will fail to start and return an error status.
This change takes into account the fact that it will be possible for individual members to successfully start a restore redundancy operation regardless of the other members in the system, but that attempting to send a function to an older member during executing of the gfsh command will result in an exception.
References
[1] https://issues.apache.org/jira/projects/GEODE/issues/GEODE-4250 Anchor ref1 ref1
...