Page properties
Target release: 0.5.0
Epic: NIFI-259
Document status: FINAL (title: DRAFT)
Document owner: Mark Payne

Designer
Developers
QA

...

User interaction and design

I have made significant progress on this Feature Proposal and expect it to be included in the 0.5.0 release, with the exception of #3 above (Provide Partitioning of Work). That item can come in a later release, as it is fairly independent of the work on storing local and cluster-wide state.

From an API perspective, I have updated the ProcessContext to contain a new method:

/**
 * @return the StateManager that can be used to store and retrieve state for this component
 */
StateManager getStateManager();
 

The StateManager, then, is defined as such:

StateManager.java

 

This provides a simple, consistent API that component developers can use to store and retrieve state, indicating only whether they are interested in local or cluster-wide state. If a developer uses the CLUSTER scope but NiFi is not connected to a cluster, the framework is responsible for delegating to the local provider instead. As a result, a Processor developer does not have to worry about whether the component is running in a clustered environment.
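To make the component developer's view concrete, here is a minimal sketch of how a component might use such an API. The method names (setState, getState, clear), the Scope enum, the map-backed stand-in, and the ListingComponent class are all assumptions made for illustration; they are not taken from the StateManager listing referenced above. A real API would probably also surface I/O failures, which this sketch leaves out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Where a piece of state lives: on this node only, or shared by the whole cluster. */
enum Scope { LOCAL, CLUSTER }

/** Hypothetical, simplified StateManager; the method names are assumptions. */
interface StateManager {
    void setState(Map<String, String> state, Scope scope);
    Map<String, String> getState(Scope scope);
    void clear(Scope scope);
}

/** Map-backed stand-in so the example runs without the framework. */
class MapBackedStateManager implements StateManager {
    private final Map<Scope, Map<String, String>> store = new HashMap<>();

    @Override
    public void setState(Map<String, String> state, Scope scope) {
        store.put(scope, new HashMap<>(state)); // defensive copy
    }

    @Override
    public Map<String, String> getState(Scope scope) {
        Map<String, String> state = store.get(scope);
        return state == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(state);
    }

    @Override
    public void clear(Scope scope) {
        store.remove(scope);
    }
}

/** A listing-style component that remembers the newest timestamp it has seen. */
class ListingComponent {
    static final String KEY = "last.listed.timestamp";

    /** Returns true if the timestamp is newer than the stored one, and records it cluster-wide. */
    static boolean acceptIfNewer(StateManager stateManager, long timestamp) {
        String previous = stateManager.getState(Scope.CLUSTER).get(KEY);
        if (previous != null && Long.parseLong(previous) >= timestamp) {
            return false;
        }
        stateManager.setState(
                Collections.singletonMap(KEY, Long.toString(timestamp)), Scope.CLUSTER);
        return true;
    }
}

class StateManagerSketch {
    public static void main(String[] args) {
        StateManager stateManager = new MapBackedStateManager();
        System.out.println(ListingComponent.acceptIfNewer(stateManager, 100L));
        System.out.println(ListingComponent.acceptIfNewer(stateManager, 50L));
    }
}
```

The component names only a scope; it never needs to know which storage mechanism sits behind the manager.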

From the framework perspective, the work is performed by a simple StateManager class that delegates to a StateProvider. The StateProvider has essentially the same methods as the StateManager but is responsible for actually storing and retrieving state through some underlying mechanism. The nifi.properties file is then configured to provide a Local State Provider and a Clustered State Provider. There is a single implementation of each at this time.

The Local State Provider is built atop the write-ahead log that is used by the FlowFile repository, so saving state many times is very efficient. Each component stores its data in a separate file/directory, which allows an administrator to manually delete the state for a single component if needed. This may be done, for example, if a bug in a processor leaves its state unreadable, or if framework state needs to be cleared (for instance, when copying state from one node to another while removing the cluster information held about the original node).
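The delegation described above, including the fallback from CLUSTER scope to the local provider on a standalone node, could look roughly like the following. This is a sketch under stated assumptions: the StateProvider method signatures, the DelegatingStateManager class, and the in-memory provider (standing in for the write-ahead-log and ZooKeeper implementations) are hypothetical and not the framework's actual classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

enum Scope { LOCAL, CLUSTER }

/** Hypothetical provider contract: performs the actual storage, keyed by component id. */
interface StateProvider {
    void setState(String componentId, Map<String, String> state);
    Map<String, String> getState(String componentId);
}

/** In-memory stand-in for a real local or clustered provider. */
class InMemoryStateProvider implements StateProvider {
    private final Map<String, Map<String, String>> byComponent = new HashMap<>();

    @Override
    public void setState(String componentId, Map<String, String> state) {
        byComponent.put(componentId, new HashMap<>(state));
    }

    @Override
    public Map<String, String> getState(String componentId) {
        Map<String, String> state = byComponent.get(componentId);
        return state == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(state);
    }
}

/** Per-component manager that only chooses a provider and delegates to it. */
class DelegatingStateManager {
    private final String componentId;
    private final StateProvider localProvider;
    private final StateProvider clusterProvider;
    private final BooleanSupplier connectedToCluster;

    DelegatingStateManager(String componentId, StateProvider localProvider,
                           StateProvider clusterProvider, BooleanSupplier connectedToCluster) {
        this.componentId = componentId;
        this.localProvider = localProvider;
        this.clusterProvider = clusterProvider;
        this.connectedToCluster = connectedToCluster;
    }

    private StateProvider providerFor(Scope scope) {
        // CLUSTER scope falls back to the local provider when the node is not clustered.
        boolean useCluster = scope == Scope.CLUSTER && connectedToCluster.getAsBoolean();
        return useCluster ? clusterProvider : localProvider;
    }

    void setState(Map<String, String> state, Scope scope) {
        providerFor(scope).setState(componentId, state);
    }

    Map<String, String> getState(Scope scope) {
        return providerFor(scope).getState(componentId);
    }
}

class DelegationSketch {
    public static void main(String[] args) {
        StateProvider local = new InMemoryStateProvider();
        StateProvider cluster = new InMemoryStateProvider();
        // Standalone node: the CLUSTER request is served by the local provider.
        DelegatingStateManager manager =
                new DelegatingStateManager("processor-1", local, cluster, () -> false);
        manager.setState(Collections.singletonMap("offset", "42"), Scope.CLUSTER);
        System.out.println(local.getState("processor-1"));
    }
}
```

Checking connectivity on every call, rather than once at construction, is one possible design choice; it lets the same manager keep working if a node later joins a cluster.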

The Clustered State Provider is implemented by communicating with a ZooKeeper instance. The ZooKeeper state provider is configured with a connect string and the root node in ZooKeeper under which data should be stored. Because the root node is configurable, multiple NiFi instances can use the same ZooKeeper instance while storing their state under different ZooKeeper nodes.
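As an illustration of why a configurable root node keeps instances apart, the sketch below derives a per-component ZooKeeper path from the two configured values. The class name ZooKeeperStateConfig, the host names, and the "<root>/components/<componentId>" layout are assumptions chosen for the example, not a statement of how the provider actually lays out its data. The code only builds path strings and does not talk to ZooKeeper.

```java
/** Hypothetical holder for the two settings the ZooKeeper state provider needs. */
final class ZooKeeperStateConfig {
    private final String connectString;
    private final String rootNode;

    ZooKeeperStateConfig(String connectString, String rootNode) {
        if (connectString == null || connectString.trim().isEmpty()) {
            throw new IllegalArgumentException("connect string is required");
        }
        if (rootNode == null || !rootNode.startsWith("/")) {
            throw new IllegalArgumentException("root node must be an absolute ZooKeeper path");
        }
        this.connectString = connectString;
        // Strip trailing slashes; a bare "/" root becomes the empty prefix.
        String trimmed = rootNode;
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        this.rootNode = trimmed;
    }

    String connectString() {
        return connectString;
    }

    /** Assumed layout: one node per component under "<root>/components". */
    String componentPath(String componentId) {
        return rootNode + "/components/" + componentId;
    }
}

class ZooKeeperPathSketch {
    public static void main(String[] args) {
        // Two NiFi instances sharing one ZooKeeper ensemble but using different roots.
        ZooKeeperStateConfig instanceA = new ZooKeeperStateConfig("zk1:2181,zk2:2181", "/nifi-a");
        ZooKeeperStateConfig instanceB = new ZooKeeperStateConfig("zk1:2181,zk2:2181", "/nifi-b");
        System.out.println(instanceA.componentPath("1234"));
        System.out.println(instanceB.componentPath("1234"));
    }
}
```

Even if both instances contain a component with the same identifier, the differing roots keep their state in separate nodes.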

However, we want to avoid requiring that an administrator already have a ZooKeeper instance installed and maintained in order to use NiFi. To that end, a NiFi node can be configured to start an embedded ZooKeeper server, and an administrator can start an embedded server on as many nodes as desired.

 

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question: How do we handle local state that has already been stored, such as for GetHTTP?
Outcome: The @OnScheduled method of this Processor has been updated to read the existing state file (if it exists), store the information into the State Manager, and delete the existing file.

Question: How do we handle cluster-wide state that has already been stored, such as with ListHDFS?
Outcome: The @OnScheduled methods of these Processors have been updated to read the existing state file (if it exists) and/or retrieve state from the configured DistributedMapCacheClientService, store the information into the State Manager, and delete the existing file / clear the information in the DistributedMapCacheClientService.
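The read, store, delete sequence for a legacy state file can be sketched as follows. This is only an illustration: the LegacyStateMigration class is hypothetical, a plain Map stands in for the State Manager, and the legacy file is assumed to be in java.util.Properties format, which may not match what a given Processor actually wrote.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

/** Hypothetical one-time migration that a Processor could run when it is scheduled. */
final class LegacyStateMigration {

    private LegacyStateMigration() {
    }

    /**
     * Moves state from a legacy file into the given store (a stand-in for the State Manager).
     *
     * @return true if a legacy file was found and migrated, false if there was nothing to do
     */
    static boolean migrate(Path legacyFile, Map<String, String> stateStore) {
        if (!Files.exists(legacyFile)) {
            return false; // already migrated, or the component never stored state
        }

        // Step 1: read the existing state file (assumed to be in Properties format).
        Properties legacy = new Properties();
        try (InputStream in = Files.newInputStream(legacyFile)) {
            legacy.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Step 2: store the information in the new state store.
        for (String name : legacy.stringPropertyNames()) {
            stateStore.put(name, legacy.getProperty(name));
        }

        // Step 3: delete the old file only after the state has been stored,
        // so a failure in step 2 does not lose the legacy state.
        try {
            Files.delete(legacyFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Because the method returns early when the file is absent, running it every time the component is scheduled is harmless after the first successful migration.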

 

 
  
  

JIRA Issues

 

Not Doing

...