To be Reviewed By: Geode Dev
Authors: Jinmei Liao
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
Each locator will have an instance of CMS running. Multiple invocations of CRUD operations may happen at the same time, possibly on different locators. We need to make sure that these calls are executed serially, not interleaved.
Anti-Goals
N/A
Solution
Each CMS needs to have a dlock service. When a CRUD operation starts, it needs to obtain a dedicated dlock before it can proceed.
At service initialization time, the CMS needs to create this dlock service:
    private static DistributedLockService getCMSLockService(InternalDistributedSystem ds) {
      DistributedLockService cmsLockService = DLockService.getServiceNamed(CMS_NAME);
      try {
        if (cmsLockService == null) {
          cmsLockService = DLockService.create(CMS_NAME, ds, true, true);
        }
      } catch (IllegalArgumentException ignore) {
        // the service may already have been created by another thread after the lookup; reuse it
        return DLockService.getServiceNamed(CMS_NAME);
      }
      return cmsLockService;
    }
Then, at the beginning and end of each CRUD operation, acquire and release the lock:
    // wait forever (-1) for the lock, and hold it until explicitly unlocked (-1 lease)
    public boolean lockCMS() {
      return cmsLockService.lock(CMS_NAME, -1, -1);
    }

    public void unlockCMS() {
      cmsLockService.unlock(CMS_NAME);
    }
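A minimal, self-contained sketch of how a CRUD operation might be wrapped with the lock/unlock pair above, releasing the lock in a `finally` block so that a failed operation cannot leave the CMS locked. `CmsLock`, `LocalCmsLock`, `CmsOperations`, `runLocked`, and the `"CMS"` value of `CMS_NAME` are hypothetical names for illustration. `CmsLock` mirrors only the `lock(name, waitTimeMillis, leaseTimeMillis)` and `unlock(name)` calls used above, and `LocalCmsLock` is an in-JVM stand-in backed by `ReentrantLock` so the sketch runs without Geode; it does not lock across locators, which is what the real `DistributedLockService` provides.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical abstraction over the two dlock calls used by lockCMS()/unlockCMS().
interface CmsLock {
  boolean lock(Object name, long waitTimeMillis, long leaseTimeMillis);

  void unlock(Object name);
}

// In-JVM fake so the sketch is runnable. It ignores the lock name and the lease
// time, and it is NOT distributed: it only serializes threads in this JVM.
class LocalCmsLock implements CmsLock {
  private final ReentrantLock lock = new ReentrantLock();

  @Override
  public boolean lock(Object name, long waitTimeMillis, long leaseTimeMillis) {
    if (waitTimeMillis < 0) { // -1 means block until the lock is acquired
      lock.lock();
      return true;
    }
    try {
      return lock.tryLock(waitTimeMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  @Override
  public void unlock(Object name) {
    lock.unlock();
  }
}

class CmsOperations {
  static final String CMS_NAME = "CMS"; // hypothetical lock name
  private final CmsLock cmsLockService;

  CmsOperations(CmsLock cmsLockService) {
    this.cmsLockService = cmsLockService;
  }

  // Runs one CRUD operation while holding the CMS lock; the unlock happens in
  // finally, so an exception thrown by the operation still releases the lock.
  <T> T runLocked(Supplier<T> operation) {
    if (!cmsLockService.lock(CMS_NAME, -1, -1)) {
      throw new IllegalStateException("could not obtain the " + CMS_NAME + " lock");
    }
    try {
      return operation.get();
    } finally {
      cmsLockService.unlock(CMS_NAME);
    }
  }
}
```

Keeping the lock and unlock in one helper means individual CRUD operations cannot forget the `unlock`; with the real service, the helper would delegate to the Geode dlock service instead of the fake.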
Changes and Additions to Public Interfaces
N/A
Performance Impact
There would be some slight performance impact, since every CRUD operation through CMS will now be serialized, even if the operations are initiated on different locators. But since these operations are infrequent (they are operations that change the cluster configuration, like creating/deleting regions or indexes), the performance impact can be tolerated.
Backwards Compatibility and Upgrade Path
Will the regular rolling upgrade process work with these changes? Yes
How do the proposed changes impact backwards-compatibility? Are message or file formats changing? No
Is there a need for a deprecation process to provide an upgrade path to users who will need to adjust their applications? No
Prior Art
What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
Errata
What are minor adjustments that had to be made to the proposal since it was approved?
3 Comments
Donal Evans
Could some of the performance impact be mitigated by having separate locks for different areas of the cluster config? For example, creating/deleting a region should have no impact on changing the configured MethodInvocationAuthorizer, and vice versa, so it should be safe to perform both operations in parallel.
Jinmei Liao
Good point, Donal.
With a CRUD operation, CMS knows the following:
We could mitigate the performance impact by having separate locks for different types, for different IDs, or even for both. If we have a dlock per type, all region CRUD operations will be serialized, but an index operation can run in parallel with a region operation. If we have a dlock per ID, a region named "foo" can be created in parallel with another region named "bar", but an index named "foo" will have to wait until the operation on the region named "foo" is done. Or we can combine both. It all depends on how much mitigation we want.
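To make the trade-off concrete, here is a small hypothetical sketch of how a dlock name could be derived under each option; two CRUD operations contend only when they map to the same name. `LockGranularity`, `CmsLockNames`, the `"CMS"` prefix, and the `/` separator are illustrative assumptions, not existing Geode code. "Combining both" is read here as keying the lock on type and ID together, which is only one possible reading. The derived name would be passed to `lock(name, -1, -1)` and `unlock(name)` in place of the single `CMS_NAME`.

```java
// Hypothetical lock granularities discussed in the comment above.
enum LockGranularity { GLOBAL, PER_TYPE, PER_ID, PER_TYPE_AND_ID }

final class CmsLockNames {
  static final String CMS_NAME = "CMS"; // hypothetical prefix for all CMS dlocks

  private CmsLockNames() {}

  // Maps a CRUD operation on (type, id) to the name of the dlock it must hold.
  static String lockName(LockGranularity granularity, String type, String id) {
    switch (granularity) {
      case GLOBAL:
        return CMS_NAME; // one lock for everything (the current proposal)
      case PER_TYPE:
        return CMS_NAME + "/" + type; // e.g. all region operations share a lock
      case PER_ID:
        return CMS_NAME + "/" + id; // region "foo" and index "foo" share a lock
      case PER_TYPE_AND_ID:
        return CMS_NAME + "/" + type + "/" + id;
      default:
        throw new IllegalArgumentException("unknown granularity " + granularity);
    }
  }

  // Two operations are serialized only if they need the same dlock.
  static boolean conflict(LockGranularity g, String type1, String id1, String type2, String id2) {
    return lockName(g, type1, id1).equals(lockName(g, type2, id2));
  }
}
```

Under `PER_ID`, `conflict(PER_ID, "region", "foo", "index", "foo")` is true while the same pair under `PER_TYPE` is false, matching the examples in the comment; which granularity is right depends on which operations must never interleave.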
Jinmei Liao
Having one common dlock for all CMS operations
Pro:
Con:
Having a dlock per type or ID for CMS Operations:
Pro:
Con:
2. A bit harder if we want gfsh to participate in the locking