This is a functional specification for the membership manager in Geode.  This replaces the JGroupMembershipManager that is in the incubating version of Geode.

The primary functions of the membership manager are to implement membership for the distributed system and handle all message sending/receiving.  It has a plug-in for the DistributionManager so it can receive messages and it must map between whatever internal identifiers are used for membership/messaging and Geode's InternalDistributedMember identifiers.

The membership manager can forcefully shut down a Geode cache if it detects it is no longer a member of the distributed system.

Interfaces

There are a number of existing interfaces in Geode that must be implemented by the membership manager:

MembershipManager - provides membership and messaging functionality to the DistributionManager

NetMember - represents a member ID in the membership manager, plugs into InternalDistributedMember

MemberServices - factory for creating a NetMember

NetView - this is actually a class that represents a membership set.  It must be created by the membership manager for use by DistributionManager

QuorumChecker - used during AutoReconnect to poll to see if a quorum of a NetView is reachable

 

Internally there are interfaces in the membership manager that provide separation of concerns for each of its components.  This should allow us to plug in different implementations for each component, such as ring-based or phi-accrual-based health monitoring.

 

Service - interface implemented by all internal components

void init(Services s)

void start(CancelCriterion c) - called after all services have been initialized with init() and all services are available via Services

void started() - called after all services have been started

void stop()

void stopped() - called after all services have been stopped

void installView(NetView v)

void beSick(), playDead(), beHealthy() - used for membership testing

void emergencyClose() - shut down threads & other resources like sockets
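
The lifecycle ordering described above (every service is initialized before any is started, and started() is only called once all of them are running) could be driven roughly as follows.  This is a minimal sketch, not the actual implementation: LifecycleService is a pared-down stand-in for Service with the Services and CancelCriterion arguments omitted, and ServiceLifecycle and RecordingService are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Pared-down stand-in for Service: lifecycle methods only, arguments omitted (assumption).
interface LifecycleService {
  void init();
  void start();
  void started();
  void stop();
  void stopped();
}

// Hypothetical driver that applies each phase to every service before moving to the next phase.
final class ServiceLifecycle {
  private final List<LifecycleService> services;

  ServiceLifecycle(List<LifecycleService> services) {
    this.services = new ArrayList<>(services);
  }

  void startAll() {
    for (LifecycleService s : services) s.init();    // all services become available
    for (LifecycleService s : services) s.start();   // only then is each one started
    for (LifecycleService s : services) s.started(); // notified once all are running
  }

  void stopAll() {
    for (LifecycleService s : services) s.stop();
    for (LifecycleService s : services) s.stopped(); // notified once all have stopped
  }
}

// Hypothetical service that records the order in which its lifecycle methods are called.
final class RecordingService implements LifecycleService {
  private final String name;
  private final List<String> log;

  RecordingService(String name, List<String> log) {
    this.name = name;
    this.log = log;
  }

  public void init() { log.add(name + ".init"); }
  public void start() { log.add(name + ".start"); }
  public void started() { log.add(name + ".started"); }
  public void stop() { log.add(name + ".stop"); }
  public void stopped() { log.add(name + ".stopped"); }
}
```

Running each phase across all services before the next one begins is what lets a component safely look up its peers through Services from within start().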

 

ServiceFactory - used by the membership manager to instantiate its services

ServiceConfig create(Manager m, ServiceConfig sc)

 

Authenticator - authenticates a member

String authenticate(InternalDistributedMember m) - returns a rejection message

Object getCredentials()

 

HealthMonitor - monitors members and instigates removal of those deemed dead

void contactedBy(InternalDistributedMember m) - tells the monitor that we've had contact with another member

void suspect(InternalDistributedMember m) - tells the monitor that the member is suspected of being ill or dead

void checkSuspect(InternalDistributedMember m) - requests a health check on another member.  This should initiate removal of the member if it does not pass the test

 

JoinLeave - manages joining, leaving and removing members.

boolean join() - joins the distributed system and returns true if successful, false if not.  Throws SystemConnectException and GemFireConfigException

void leave() - leaves the distributed system.  Should be invoked before stop()

void remove(InternalDistributedMember m) - force another member out of the system

InternalDistributedMember getMemberID()

NetView getView()

 

Locator - used by TcpServer to handle peer-location requests.  Implements TcpHandler

 

Manager - internal interface for working with the membership manager.  Extends MessageHandler

void send(DistributionMessage m)

InternalDistributedMember getMemberID(NetMember m)

void forceDisconnect(String reason)

boolean isShunned(DistributedMember mbr)

DistributedMember getLeadMember()

DistributedMember getCoordinator()

 

MessageHandler - receives messages from a Messenger

void handle(DistributionMessage m)

 

Messenger - sends and receives messages.  All messages are delivered to the Manager unless there is a handler installed for the message's class

void addHandler(Class c, MessageHandler h) - adds a handler for the given class/interface of messages

void send(DistributionMessage m) - sends an asynchronous message

InternalDistributedMember getMemberID() - returns the endpoint ID for this member
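
The delivery rule for Messenger (a message goes to the handler installed for its class or interface, otherwise to the Manager) might look something like the sketch below.  It makes several assumptions: Handler stands in for MessageHandler and takes a plain Object rather than a DistributionMessage, HandlerRegistry is a hypothetical name, and the first matching registration wins.  The spec does not say how overlapping registrations should be resolved, and thread safety is ignored here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for MessageHandler; the real interface takes a DistributionMessage (assumption).
interface Handler {
  void handle(Object message);
}

// Hypothetical registry illustrating "deliver to the Manager unless a handler is installed".
final class HandlerRegistry {
  // Insertion-ordered so the earliest matching registration wins (an assumption of this sketch).
  private final Map<Class<?>, Handler> handlers = new LinkedHashMap<>();
  private final Handler manager;

  HandlerRegistry(Handler manager) {
    this.manager = manager;
  }

  // Installs a handler for the given class or interface of messages.
  void addHandler(Class<?> c, Handler h) {
    handlers.put(c, h);
  }

  // Routes a received message to the first handler whose class matches, else to the manager.
  void dispatch(Object message) {
    for (Map.Entry<Class<?>, Handler> e : handlers.entrySet()) {
      if (e.getKey().isInstance(message)) {
        e.getValue().handle(message);
        return;
      }
    }
    manager.handle(message);
  }
}
```

Matching with isInstance rather than an exact class comparison is what allows a handler to be registered against an interface, as addHandler's description suggests.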

 

Services - provides access to ServiceConfig and a directory of the membership manager's internal components

get/setAuthenticator

get/setConfig

get/setHealthMonitor

get/setJoinLeave

get/setLocator

get/setManager

 

ServiceConfig - provides configuration information for the manager and its components

DistributionConfig getDistributionConfig()

Properties getProperties()

 

Implementation Notes

In order to preserve as much of the current membership behavior as possible, and so foster adoption of Geode by the GemFire user base, the existing JGroupMembershipManager will be copied and most of its code will be preserved.  It will continue to hold the DirectChannel but will now also hold a ServiceConfig that it will use in place of the JGroups channel.

The implementation of each of the other components will be in separate packages to keep the code clean and possibly allow for different implementations to be plugged in.

The Authenticator implementation will use Geode's authentication API to authenticate another member and to get credentials for JoinLeave to use in sending membership views and join requests.

The HealthMonitor implementation will initially use the NetView to form a look-to-the-right ring in which each member monitors the next one.  HealthMonitor will keep a record of the last time a message was received from each member in the system (note - this must be done without clock probes, possibly following the pattern in EventTracker).  If the member it is watching has not made contact in the last member-timeout milliseconds, it will request a heartbeat from the member, perform a timed attempt to connect to the member's DirectChannel port (if available), and request a health response.  If the member does not respond within member-timeout milliseconds, HealthMonitor will remove it using the JoinLeave.remove() API.  The implementation of remove() will forward the request to the current membership coordinator, which will perform its own health check on the member before removing it (sending out a new NetView).  When the ping request has been sent HealthMonitor will go on to examine the next member in the view.
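
As a rough illustration of the look-to-the-right ring, the sketch below picks the neighbor a member would watch and decides whether a heartbeat request is due.  It rests on assumptions: member IDs are plain strings rather than InternalDistributedMember, the view is an ordered list, RingMonitor and its method names are hypothetical, and the timestamps are passed in by the caller so that the sketch itself performs no clock probes.

```java
import java.util.List;

// Hypothetical helper; not part of the specified HealthMonitor interface.
final class RingMonitor {
  private RingMonitor() {}

  // Returns the member immediately to the right of self in the view, wrapping at the end.
  // Returns null when self is not in the view or there is no other member to watch.
  static String nextToRight(List<String> view, String self) {
    int i = view.indexOf(self);
    if (i < 0 || view.size() < 2) {
      return null;
    }
    return view.get((i + 1) % view.size());
  }

  // True when no contact has been recorded within the last memberTimeoutMillis.
  // Treating an exactly-expired interval as overdue is a choice made for this sketch.
  static boolean needsHeartbeat(long lastContactMillis, long nowMillis, long memberTimeoutMillis) {
    return nowMillis - lastContactMillis >= memberTimeoutMillis;
  }
}
```

Because the neighbor is derived from the installed view, each new NetView changes who watches whom without any extra coordination between members.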

TCPConduit will be modified to check for a health request and respond with its membership ID.  The HealthMonitor will use this to ensure that the port hasn't been reused by another process.

The JoinLeave implementation will use Messenger, and possibly the membership manager, to communicate with other members.  It will use TcpClient to contact Locators when joining in order to find the current membership coordinator.  Once it knows the coordinator it will send it a Join message including authentication credentials.  JoinLeave will also implement membership coordination functions (i.e., replace what we're doing with JGroups GMS).  It will be responsible for detecting a network partition and invoking forceDisconnect() in the membership manager.

The Locator component will persist the current membership view and will respond to requests for the ID of the current membership coordinator.  If there is no membership coordinator (meaning the Locator is booting up) then it will return its best guess of who the coordinator is based on who has contacted it.  The name of the locator's state file will be changed to membershipView.dat.

The Manager API is what should be used by all components to interact with the membership manager.

The Messenger component will use a trimmed-down modern JGroups channel to perform UDP messaging.  JGroups will no longer be forked for use in Geode but will be added as a dependency.  Messenger will be responsible for installing the current NetView in its JGroups protocol stack as a native JGroups View so that UDP broadcast works and multicast message garbage-collection can be properly performed.  Note that this switch to using off-the-shelf JGroups means we will start seeing more log messages from JGroups than in the past.

Also note that we may not be able to switch to a newer version of JGroups without risking rolling upgrade support.  If the new version of JGroups is not on-wire compatible with the previous version people will not be able to perform a rolling upgrade.

It will be Messenger's responsibility to install Geode's settings from the DistributionConfig (gemfire.properties) into its JGroups channel.  The protocol stack should look something like this:  <UDP> <BARRIER> <pbcast.NAKACK2> <UNICAST3> <pbcast.STABLE> <MFC> <UFC> <FRAG2>.  Of course there will be lots of settings in each of these protocols to customize the stack.  There is no requirement that the JGroups stack configuration be in an external file.  It can be a string embedded in the Messenger implementation.  XML will need to be used because the JGroups PlainConfigurator still uses a colon as a protocol separator and this is incompatible with IPv6 addresses.
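
The colon problem can be seen with a small experiment.  The sketch below is not JGroups code: it assumes a colon-separated plain-string stack and uses a naive split as a stand-in for however such a string is parsed, with ColonSeparatorDemo as a hypothetical name and the addresses chosen only for illustration.

```java
// Hypothetical demonstration of why a colon-separated stack string clashes with IPv6 literals.
final class ColonSeparatorDemo {
  private ColonSeparatorDemo() {}

  // Naive stand-in for splitting a plain-string configuration into protocol entries.
  static String[] splitProtocols(String plainConfig) {
    return plainConfig.split(":");
  }

  public static void main(String[] args) {
    String ipv4Stack = "UDP(bind_addr=10.0.0.1):FRAG2";
    String ipv6Stack = "UDP(bind_addr=fe80::1):FRAG2";

    // The IPv4 form splits into the two intended protocol entries.
    System.out.println(splitProtocols(ipv4Stack).length);
    // The IPv6 form is cut apart inside the address, producing extra, broken entries.
    System.out.println(splitProtocols(ipv6Stack).length);
  }
}
```

An XML configuration avoids the ambiguity because each protocol is its own element and addresses are ordinary attribute values.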

All of the JGroups statistics in DistributionStats need to be removed or replaced with corresponding stats based on the new implementation.

Testing

Since this is implementing an existing interface in Geode there are already a lot of tests that exercise it.  These tests will need some attention if they are referring to any JGroups code.  The use of interfaces in this version of the MembershipManager should allow us to create real unit tests, as opposed to integration tests, for each component to achieve a higher level of code coverage.

Jepsen testing should be performed to ensure that the membership manager behaves as expected during network failures, GC pauses, etc.  All releases of Geode should require Jepsen testing.

 

 
