To do this, the proposal is to:
allow multiple invokers to run, but only one is active at a time
when the active invoker fails, an inactive one will become active
when an invoker becomes active, attempt to resurrect the free and prewarm pool members, so that existing containers remain usable. (This may only apply to ContainerFactory impls that maintain a cluster-wide view of containers.)
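The intended behaviour can be modelled in a few lines of plain Scala. This is only an illustrative sketch, not OpenWhisk code: `Invoker`, `Cluster`, `ContainerRef`, and the "first surviving member wins" election rule are all assumptions made for the example. In the actual proposal, election and data replication would be delegated to Akka clustering.

```scala
// Illustrative model only; every name here is hypothetical.
object FailoverSketch {
  // Minimal metadata needed to reconnect to a container started by another invoker.
  final case class ContainerRef(id: String, pool: String) // pool: "free" or "prewarm"

  final class Invoker(val name: String) {
    var active: Boolean = false
    var attached: Set[ContainerRef] = Set.empty

    // On becoming active, re-adopt the containers known to the cluster.
    def activate(known: Set[ContainerRef]): Unit = {
      active = true
      attached = known
    }
  }

  final class Cluster(members: List[Invoker], knownContainers: Set[ContainerRef]) {
    private var alive: List[Invoker] = members
    electIfNeeded()

    def activeInvoker: Option[Invoker] = alive.find(_.active)

    // Remove a failed invoker; if it was the active one, promote a standby.
    def fail(invoker: Invoker): Unit = {
      invoker.active = false
      alive = alive.filterNot(_ eq invoker)
      electIfNeeded()
    }

    // Assumed election rule: the first surviving member becomes active.
    private def electIfNeeded(): Unit =
      if (!alive.exists(_.active)) alive.headOption.foreach(_.activate(knownContainers))
  }
}
```

With two invokers in the model, failing the active one promotes the standby, which then holds references to the same free and prewarm containers.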
Required Changes
To support running the Mesos framework in an HA mode, where the system recovers from a failure without losing existing containers, the following OpenWhisk changes are proposed:
optionally allow invokers to join a cluster
optionally allow invokers to initialize with the same instance id (all invokers are id 0)
optionally allow activation feeds to operate as an Akka ClusterSingleton
optionally allow invokers to use Maps that replicate pool data to other invokers in the cluster, for use in failover scenarios
optionally allow ContainerFactory impls to create Container instances using an "attach()" function, for connecting to pre-existing containers.
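Because every change is opt-in, one way to picture them is as a set of flags that all default to off, so a non-HA deployment behaves as before. The sketch below is an assumption for illustration only: `InvokerHaConfig` and its field names are hypothetical, not existing OpenWhisk settings. The dependency check reflects the fact that the singleton feed and replicated maps both rely on cluster membership.

```scala
// Hypothetical configuration model; none of these names exist in OpenWhisk.
object HaConfigSketch {
  final case class InvokerHaConfig(
      joinCluster: Boolean = false,      // invoker joins a cluster
      sharedInstanceId: Boolean = false, // all invokers initialize as id 0
      singletonFeed: Boolean = false,    // activation feed runs as a cluster singleton
      replicatedPools: Boolean = false,  // pool data is replicated to other invokers
      attachExisting: Boolean = false    // factory may attach to pre-existing containers
  ) {
    // The singleton feed and replicated pools both need cluster membership.
    require(
      joinCluster || !(singletonFeed || replicatedPools),
      "singletonFeed and replicatedPools require joinCluster"
    )
  }
}
```

In this model, `InvokerHaConfig()` describes today's behaviour, while enabling `replicatedPools` without `joinCluster` is rejected at construction time.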
Details
Background on required changes:
optionally allow invokers to join a cluster
- needed to establish a cluster-wide singleton invoker and replicate data using Akka DistributedData
optionally allow invokers to initialize with the same instance id (all invokers are id 0)
- needed to coax all invokers to consume from the same activation topic (e.g. invoker0)
optionally allow activation feeds to operate as an Akka ClusterSingleton
- needed so that feed consumers do not initiate consumption on multiple invokers (the other invokers still start up and join the cluster, so that they can receive replicated data)
optionally allow invokers to use Maps that replicate pool data to other invokers in the cluster, for use in failover scenarios
- needed to replicate prewarm and free pool data to the inactive invokers
optionally allow ContainerFactory impls to create Container instances using an "attach()" function, for connecting to pre-existing containers.
- needed to manufacture "ContainerProxy" actors using existing container metadata. (or allow ContainerFactory impls to avoid this scenario if desired.)
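To make the last two items more concrete, here is a minimal sketch of how an "attach()" function could be used to rebuild a pool from replicated metadata. It is a simplification under stated assumptions: the `ContainerFactory` and `Container` traits below are reduced stand-ins rather than the real OpenWhisk interfaces, `ContainerMeta`, `ClusterViewFactory`, and `resurrect` are hypothetical names, and an ordinary immutable `Map` stands in for the replicated map. Creating "ContainerProxy" actors around the attached containers is omitted.

```scala
// Simplified stand-ins for illustration; not the real OpenWhisk interfaces.
object AttachSketch {
  // What an inactive invoker would need to know about a container it did not create.
  final case class ContainerMeta(id: String, address: String, kind: String)

  trait Container {
    def id: String
    def address: String
  }

  trait ContainerFactory {
    // Start a brand-new container (the existing code path).
    def create(kind: String): Container
    // Connect to a pre-existing container. The default of None lets an
    // implementation opt out of attaching altogether.
    def attach(meta: ContainerMeta): Option[Container] = None
  }

  // Toy factory with a "cluster-wide view": the ids of containers still running.
  final class ClusterViewFactory(running: Set[String]) extends ContainerFactory {
    private final case class Simple(id: String, address: String) extends Container
    private var counter = 0

    def create(kind: String): Container = {
      counter += 1
      Simple(s"new-$counter", s"10.0.0.$counter")
    }

    override def attach(meta: ContainerMeta): Option[Container] =
      if (running.contains(meta.id)) Some(Simple(meta.id, meta.address)) else None
  }

  // Rebuild a pool from replicated metadata, dropping containers that are gone.
  def resurrect(
      factory: ContainerFactory,
      replicated: Map[String, ContainerMeta]): Map[String, Container] =
    replicated.toList.flatMap { case (key, meta) =>
      factory.attach(meta).map(c => key -> c).toList
    }.toMap
}
```

Returning an `Option` from `attach` is one possible design choice: a factory without a cluster-wide view keeps the default and resurrects nothing, while a factory that can verify a container still exists returns a usable handle.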
The proposed changes exist specifically to cope with failure scenarios, so they are simplest to describe with a sequence diagram that illustrates how they affect the outcome: