IDIEP-73
Author:
Sponsor:
Created:
Status: DRAFT

Motivation

In order to unblock the business-related logic (atomic protocol, transactions, table management, etc.) above the service-related logic (network, discovery protocol through meta storage, etc.), it's required to specify the components' communication flow and initialization logic. It seems natural to choose node startup as an entry point for such high-level needs. Thus, node startup should provide:

  • control over component initialization.
  • control over component communication channels.

Description

From a bird's eye view, the set of components and their connections may look like this:

where:

  • The number in front of a component's name shows the order in which the components are initialized, so the very first component to be initialized during node startup is Vault. A few components have to be instantiated before node start-up, namely cli and ignite-runner; however, they are out of the scope of the node startup process.
  • Arrows show direct method calls. For example, the Affinity component could retrieve the baseline from the Baseline component using some sort of baseline() method. In order to reduce clutter, two explicit groups of arrows are introduced:
    • Green Arrows show direct method calls to Meta Storage (MS) Component.
    • Blue Arrows show direct method calls to Configuration Component.
  • There's also an upward communication flow through the listeners/watches mechanism. However, within the scope of the alpha 2 release, it's only possible to listen to Vault, Meta Storage, and Configuration updates.
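For illustration, here is a hedged sketch of how a component might subscribe to such upward notifications. The method and type names used below (listen, registerWatchByPrefix, Key, WatchEvent, WatchListener) are assumptions introduced for this sketch, not the exact alpha 2 API.

Listening to configuration and meta storage updates (hypothetical sketch)
// Hypothetical: subscribe to local configuration updates; listen() and the event type are assumptions.
locConfigurationMgr.configurationRegistry().getConfiguration(NetworkConfiguration.KEY)
    .listen(ctx -> {
        // React to the configuration change, e.g. refresh the cached member list.
        return CompletableFuture.completedFuture(null);
    });

// Hypothetical: subscribe to meta storage updates by key prefix; names are assumptions.
metaStorageMgr.registerWatchByPrefix(new Key("dst-cfg."), new WatchListener() {
    @Override public boolean onUpdate(WatchEvent evt) {
        // Apply the distributed update and remember its revision.
        return true;
    }

    @Override public void onError(Throwable e) {
        // Handle watch failure.
    }
});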

A few more words about the components' responsibilities and inner flow:

VaultManager and LocalConfigurationManager

Vault is responsible for handling local keys, including distributed projections. During initialization, VaultManager checks whether there is any configuration within Vault's PDS; if not, it uses the customer's bootstrap configuration, if provided. The bootstrap configuration goes through the local configuration manager.

Vault and Local Configuration Manager
// Vault Component startup.
VaultManager vaultMgr = new VaultManager();

boolean cfgBootstrappedFromPds = vaultMgr.bootstrapped();

List<RootKey<?, ?>> rootKeys = new ArrayList<>(Collections.singletonList(NetworkConfiguration.KEY));

List<ConfigurationStorage> configurationStorages =
	new ArrayList<>(Collections.singletonList(new LocalConfigurationStorage(vaultMgr)));

// Bootstrap local configuration manager.
ConfigurationManager locConfigurationMgr = new ConfigurationManager(rootKeys, configurationStorages);

if (!cfgBootstrappedFromPds)
	try {
		locConfigurationMgr.bootstrap(jsonStrBootstrapCfg);
	}
	catch (Exception e) {
		log.warn("Unable to parse user-specific configuration, default configuration will be used", e);
	}
else if (jsonStrBootstrapCfg != null)
	log.warn("User-specific configuration will be ignored because the vault was bootstrapped with PDS configuration");
Manager: VaultManager
Depends on: -
Used by:
  • LocalConfigurationManager in order to store local configuration and update it consistently through listeners.
  • MetaStorageManager in order to commit processed MS watch notifications atomically with the corresponding applied revision.

Manager: LocalConfigurationManager
Depends on: VaultManager
Used by:
  • NetworkManager in order to bootstrap itself with the network configuration (including a sort of IPFinder) and handle corresponding configuration changes.
  • MetaStorageManager in order to handle meta storage group changes.
  • ConfigurationManager, indirectly through LocalConfigurationStorage, for the purposes of handling local configuration changes.

NetworkManager

It's possible to instantiate the network manager once the local configuration manager, with the vault underneath it, is ready.

Network Manager
NetworkView netConfigurationView =
	locConfigurationMgr.configurationRegistry().getConfiguration(NetworkConfiguration.KEY).value();

// Network startup.
Network net = new Network(
	new ScaleCubeNetworkClusterFactory(
    	localMemberName,
        netConfigurationView.port(),
        Arrays.asList(netConfigurationView.networkMembersNames()),
        new ScaleCubeMemberResolver()));

NetworkCluster netMember = net.start();
Manager: NetworkManager
Depends on: LocalConfigurationManager
Used by:
  • MetaStorageManager in order to handle the cluster init message.
  • RaftManager in order to handle RaftGroupClientService requests and for the purposes of inner raft group communication.
  • BaselineManager in order to retrieve information about current network members.

RaftManager <Loza>

After starting the network member, the Raft Manager is instantiated. The Raft Manager is responsible for handling the life cycle of raft servers and services.

RaftManager
// Raft Component startup.
Loza raftMgr = new Loza(netMember);
Manager: RaftManager
Depends on: NetworkManager
Used by:
  • MetaStorageManager in order to instantiate and handle the distributed meta storage raft group (a hypothetical call is sketched below).
  • TableManager in order to instantiate and handle partitioned/ranged raft groups.
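For illustration only, here is a hypothetical sketch of how MetaStorageManager might ask Loza for the meta storage raft group. The method name prepareRaftGroup, its parameters and MetaStorageListener are assumptions made for this sketch rather than a confirmed Loza API.

Creating a raft group through the Raft Manager (hypothetical sketch)
// Hypothetical: the method name, parameters and listener type are assumptions.
RaftGroupService metaStorageRaftGrpSvc = raftMgr.prepareRaftGroup(
    "metastorage_raft_group",               // raft group id (assumed naming)
    metaStorageMembers,                     // network members hosting the meta storage raft group (assumed)
    () -> new MetaStorageListener(storage)  // state machine listener (assumed)
);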

MetaStorageManager and ConfigurationManager

Now it's possible to instantiate MetaStorage Manager and Configuration Manager that will handle both local and distributed properties.

MetaStorage Manager and Configuration Manager
// MetaStorage Component startup.
MetaStorageManager metaStorageMgr = new MetaStorageManager(
	netMember,
    raftMgr,
    locConfigurationMgr
);

// Here distributed configuration keys are registered.
configurationStorages.add(new DistributedConfigurationStorage(metaStorageMgr));


// Start configuration manager.
ConfigurationManager configurationMgr = new ConfigurationManager(rootKeys, configurationStorages);
Manager: MetaStorageManager
Depends on:
  • VaultManager
  • NetworkManager
  • RaftManager
  • LocalConfigurationManager
Used by:
  • ConfigurationManager in order to store and handle distributed configuration changes.
  • BaselineManager in order to watch private distributed keys, because ConfigurationManager handles only public keys.
  • AffinityManager for the same purposes.
  • Probably SchemaManager for the same purposes.
  • TableManager for the same purposes.

Manager: ConfigurationManager
Depends on:
  • LocalConfigurationManager
  • MetaStorageManager
Used by:
  • BaselineManager in order to watch public keys.
  • AffinityManager for the same purposes.
  • Probably SchemaManager for the same purposes.
  • TableManager for the same purposes.
  • IgniteImpl

Business logic components: BaselineManager, AffinityManager, SchemaManager, TableManager, etc.

At this point, it's possible to start business logic components such as the Baseline Manager, the Affinity Manager, the Schema Manager, and the Table Manager. The exact set of such components is not strictly defined.

Top Level Managers
// Baseline manager startup.
BaselineManager baselineMgr = new BaselineManager(configurationMgr, metaStorageMgr, netMember);

// Affinity manager startup.
AffinityManager affinityMgr = new AffinityManager(configurationMgr, metaStorageMgr, baselineMgr);

SchemaManager schemaManager = new SchemaManager(configurationMgr);

// Distributed table manager startup.
TableManager distributedTblMgr = new TableManagerImpl(
	configurationMgr,
    netMember,
    metaStorageMgr,
	affinityMgr,
    schemaManager);


// Rest manager also goes here.
Manager: BaselineManager
Depends on:
  • ConfigurationManager
  • MetaStorageManager
  • NetworkManager
Used by:
  • AffinityManager in order to retrieve the current baseline.

Manager: AffinityManager
Depends on:
  • ConfigurationManager
  • MetaStorageManager
  • BaselineManager
Used by:
  • TableManager, directly or indirectly through the corresponding private distributed affinityAssignment key.

Manager: SchemaManager
Depends on:
  • ConfigurationManager
  • Probably MetaStorageManager
Used by:
  • TableManager in order to handle corresponding schema changes.

Manager: TableManager
Depends on:
  • ConfigurationManager
  • MetaStorageManager
  • NetworkManager
  • AffinityManager
  • SchemaManager
  • RaftManager
Used by:
  • IgniteImpl

Deploying watches and preparing IgnitionImpl

Finally, it's possible to deploy the registered watches and create IgniteImpl, which will inject the top-level managers in order to provide table and data manipulation logic to the user.

Deploy registered watches and create IgniteImpl
// Deploy all registered watches because all components are ready and have registered their listeners.
metaStorageMgr.deployWatches();

return new IgniteImpl(configurationMgr, distributedTblMgr);

Component flow

In general, from a certain point of view, a node is a collaboration of the components mentioned above. In order to satisfy the needs of:

  1. Node start/stop
  2. Dynamic component start/stop
  3. Dynamic run level changes both forward and backwards

it makes sense to describe the component flow in more detail. Here it is (a minimal lifecycle sketch follows the list):

  1. Component object instantiation.
  2. Dependencies injection either with constructor or DI framework.
  3. Component start, either at the specified run level or at the default one:
    1. Read and apply the related local configuration, if any.
    2. Check whether the component's state lags behind the state of the already started components; in other words, check whether the component's applied revision is less than the node's applied revision (by design, all already started components have the same applied revision, so it's possible to denote that bunch of applied revisions as the node's applied revision). The applied revision here is the latest revision of configuration and meta storage updates that the component has successfully processed and committed to the vault. If the component's applied revision is less than the node's, the component uses either the historical update logic (an optimization, not an option currently) or the full one to promote its state. Both options are described in more detail in the corresponding sections.
    3. Deploy configuration and meta storage watches in order to be notified about any new updates.
    4. Register message handlers.
    5. Register local components listeners.
    6. Start threads/thread pools.
    7. All other inner related stuff.
    8. Please note that the order and completeness of the above operations are not strictly defined and depend on the design specifics of a particular component.
      Component is ready.
  4. At a given state, it's possible to move the component's run level either forward or backwards. Depending on the specified run level, it might be necessary to process the logic specified at step 3.
  5. Component stop. The core point of stopping a component is to safely terminate the component's ongoing operations as fast as possible. That includes the following steps:
    1. Unregister the component's configuration listeners and meta storage watches.
    2. Prevent any network communication. On an attempt at network communication, a ComponentStoppingException should be thrown. Catching such exceptions should complete any awaiting futures with the corresponding reason.
    3. Unregister local listeners, which will also complete the local listeners' futures with ComponentStoppingException.
    4. Stop threads/thread pools and do all other inner component-related stuff.
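To make the flow above concrete, here is a minimal, hypothetical lifecycle contract. The names (IgniteComponent, RunLevel, appliedRevision) are introduced only for this sketch and are not part of the proposed API.

Component lifecycle (hypothetical sketch)
// Hypothetical lifecycle contract; all names are assumptions.
interface IgniteComponent {
    /**
     * Starts the component at the given run level: reads and applies local configuration,
     * catches the state up to the node's applied revision (historical or full update),
     * deploys configuration and meta storage watches, registers message handlers and
     * local listeners, and starts threads/thread pools.
     */
    void start(RunLevel targetLevel);

    /** Moves the component's run level forward or backwards. */
    void changeRunLevel(RunLevel targetLevel);

    /** Latest configuration/meta storage revision processed and committed to the vault. */
    long appliedRevision();

    /**
     * Stops the component: unregisters configuration listeners and meta storage watches,
     * rejects further network communication with ComponentStoppingException, completes
     * pending futures exceptionally, and stops threads/thread pools.
     */
    void stop();
}

/** Hypothetical run levels; only the alpha 2 states are listed here. */
enum RunLevel {
    ZOMBIE, INITIALIZED
}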

Historical update

Historical here reflects a certain similarity with the historical rebalance design. It is worth mentioning that the main problem of upgrading a component to a state that is newer than its own is the impossibility of doing consistent reads of the components beneath it. In other words, if we start TableManager with appliedRevision 10 and SchemaManager already has appliedRevision 20, schemaManager.getSchema(tableId) will return the schema for revision 20, which is not what TableManager expects while processing table updates for revision 11. In order to provide consistent reads, the requested component could analyze the caller's context and either recalculate the requested data based on the caller's applied revision or return previously cached historical data. In any case, this logic seems to be non-trivial and might be implemented later as a sort of optimization.
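A hedged sketch of the historical-read idea described above: the component being queried answers with data that matches the caller's applied revision. SchemaDescriptor, schemaHistory, and recalculateSchemaAt are assumptions made for illustration.

Historical read (hypothetical sketch)
// Hypothetical: serve a read consistent with the caller's applied revision.
SchemaDescriptor getSchema(UUID tableId, long callerAppliedRevision) {
    // Return a previously cached historical version, if one exists for that revision...
    SchemaDescriptor cached = schemaHistory.floor(tableId, callerAppliedRevision);

    if (cached != null)
        return cached;

    // ...otherwise recalculate the schema as of the caller's revision.
    return recalculateSchemaAt(tableId, callerAppliedRevision);
}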

Full update

Another solution that satisfies component update while preserving consistent cross-component reads includes:

  • Eviction of the component's data.
  • Synchronous retrieval of the component's state from the distributedConfigurationManager. At this point, because all intermediate updates are skipped and only the up-to-date state is retrieved, consistent cross-component reads are guaranteed (see the sketch below).
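A minimal sketch of the full update path under the assumptions above; ComponentState, evictLocalState, currentState, and applyState are hypothetical helper names.

Full update (hypothetical sketch)
// Hypothetical: evict local data and pull the up-to-date state in one step.
void fullUpdate(long nodeAppliedRevision) {
    // Evict the component's locally accumulated data.
    evictLocalState();

    // Synchronously retrieve the up-to-date state from the distributed configuration manager;
    // intermediate updates are skipped, so subsequent cross-component reads are consistent.
    ComponentState state = distributedConfigurationManager.currentState();

    applyState(state, nodeAppliedRevision);
}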

Risks and Assumptions

// N/A

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/Terms-clarification-and-modules-splitting-logic-td52026.html#a52058

Reference Links

https://github.com/apache/ignite-3/tree/main/modules/runner#readme

Tickets

Umbrella Ticket: IGNITE-14410

Initial Implementation: IGNITE-14411



4 Comments

  1. Do we have any plans to add dependency injection (an IoC container)?
    As far as I remember, there were thoughts to restrict cycles in component dependencies and use IoC.

    Also, the component initialization process is not clear. At what point will a component/function be available to the other components?

    E.g. there are a number of use cases:

    • Cluster very first start.
    • Node with no persistent tables.
    • Node with persistence. A rebooted node may need to recover from the WAL before joining.
    • Client reconnect case. Clients are thin, but may still have a different lifecycle.
    • Cluster activation/deactivation.
    • Readonly cluster?
    • Maintenance mode?
    1. > Do we have any plans to add dependency injection (IoC container)? 

      Yes, however I don't know when it'll be implemented. It's definitely not in the alpha 2 scope.

      > As far as I remember there were thoughts to restrict cycles in components dependency and use IoC.

      Yep, however there should be no cycles in the current implementation.

      > Also, a component initialization process is not clear. At what point does a component/function will be available for the other components?

      There's a two-phase node/cluster initialization process. In the first phase, components start in the "Zombie" state. At this point, any component can locally communicate with any other component below it; however, the node and the components beneath it are not ready to handle user requests.

      After processing the cluster init message (produced by the cli), or after receiving it from the group membership service via gossip if the given node joins an already initialized cluster, MetaStorageManager instantiates the required RaftGroup, and in this way the node moves from the "Zombie" state to the initialized state. Now it can handle user requests.

  2. In 2.x, an Ignite cluster can be in several states with different functionality: inactive, read-only, fully operational. Also, an individual cluster node can enter maintenance mode and stay isolated from the rest of the cluster while still supporting some functionality.

    The same or similar modes are required in the 3.x version, so we need to design how we want to support this at the component level. It would also be nice to have a feature where components could perform additional actions based on information from other components from earlier stages of start-up. E.g. in 2.x there was a feature request (not implemented though) to warm up caches on a starting node based on information from the cluster about which caches are already started or already dropped.

    One idea is to organize the node (and cluster) state into a hierarchy of runlevels (a term from Unix-like systems) with different functionality available at each level.
    Switching between levels should be supported in both directions (from limited-functionality modes to fully functional and vice versa).

    Although this model is more flexible than the existing two-phase start procedure in Ignite 2.x, there are open questions about its design:

    1. To what extent are components independent from one another in terms of managing their states? E.g. for Maintenance Mode in 2.x we need to initialize the persistent storage to an almost fully functional state while other interfaces are completely off. E.g. the cache component depends on the persistent storage but shouldn't be initialized even though its dependency (the storage) was initialized. How should these rules be expressed?
    2. Which component (or metacomponent) should be responsible for orchestrating the overall process of (most likely parallel) initialization of the others?
    3. How should errors during component initialization be handled?

    Please feel free to propose possible designs or add more questions for consideration to the list.

  3. List<RootKey<?, ?>> rootKeys = new ArrayList<>(Collections.singletonList(NetworkConfiguration.KEY)); List<ConfigurationStorage> configurationStorages = new ArrayList<>(Collections.singletonList(new LocalConfigurationStorage(vaultMgr)));
    I believe that neither of these lists will be modified in the configuration manager, so the right-hand side of each expression could be simplified to List.of(...).
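    For example, the suggested simplification could look like this. Note that in the startup flow above, configurationStorages is later extended with DistributedConfigurationStorage, so it would have to stay mutable:

    // rootKeys is never modified afterwards, so an immutable list suffices.
    List<RootKey<?, ?>> rootKeys = List.of(NetworkConfiguration.KEY);

    // configurationStorages is later extended with DistributedConfigurationStorage,
    // so it still needs to be a mutable list.
    List<ConfigurationStorage> configurationStorages =
        new ArrayList<>(List.of(new LocalConfigurationStorage(vaultMgr)));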