
Target release:
Epic:
Document status: DRAFT
Document owner: Mark Payne
Designer:
Developers:
QA:

Goals

  • Provide Processors and other components a simple way to store local or cluster-wide state (a rough API sketch follows this list)
  • Provide components a mechanism for retrieving the state that has been stored by other nodes in the cluster
  • Provide a mechanism for the framework to communicate information between nodes
  • Allow Processors to be easily 'partitionable', so that work is easily split among nodes in a cluster
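
A rough sketch of the kind of API that could satisfy these goals is shown below. All names and signatures here are assumptions used for illustration, not a committed design: a component would obtain a state manager from the framework and read or write a simple key/value map at either local or cluster-wide scope.

    import java.io.IOException;
    import java.util.Map;

    /**
     * Illustrative sketch only -- names and signatures are assumptions, not a committed design.
     * A component obtains a StateManager from the framework and reads or writes simple
     * key/value state at either local or cluster-wide scope.
     */
    public interface StateManager {

        /** Where the state lives: on this node only, or shared across the whole cluster. */
        enum Scope { LOCAL, CLUSTER }

        /** Returns the most recently stored state for this component at the given scope. */
        Map<String, String> getState(Scope scope) throws IOException;

        /** Replaces this component's state at the given scope with the given key/value pairs. */
        void setState(Map<String, String> state, Scope scope) throws IOException;

        /** Removes all state stored for this component at the given scope. */
        void clearState(Scope scope) throws IOException;
    }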

Background and strategic fit

NiFi provides two basic operating modes: standalone and clustered. When operating in a cluster, Processors often need some mechanism for coordinating state between nodes. One example of a Processor that implements such a capability is ListHDFS, but the implementation required to accomplish this is laborious and error prone. Many other Processors could benefit greatly from the same approach, yet the work has not been done precisely because it is so difficult to get right. Implementing a feature in the framework to share state across the cluster would be tremendously helpful.

Additionally, several processors (GetHTTP, for example) persist state locally. This is done in an ad-hoc manner, with each processor creating its own local files to store the information. Storing such state would become far easier and more consistent if the framework provided a mechanism for local state as well.
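
As a rough illustration (again assuming the hypothetical StateManager sketch above), a GetHTTP-style processor could keep the Last-Modified value of the resource it last fetched in framework-managed local state instead of in its own file:

    import java.io.IOException;
    import java.util.Collections;

    /**
     * Minimal sketch only -- assumes the hypothetical StateManager interface sketched earlier.
     * Replaces an ad-hoc local file with framework-managed local state: the processor
     * remembers the Last-Modified value of the resource it fetched most recently.
     */
    public class LocalStateExample {

        public void rememberLastModified(StateManager stateManager, String lastModified) throws IOException {
            stateManager.setState(Collections.singletonMap("last.modified", lastModified),
                    StateManager.Scope.LOCAL);
        }

        public String recallLastModified(StateManager stateManager) throws IOException {
            return stateManager.getState(StateManager.Scope.LOCAL).get("last.modified");
        }
    }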

If we make this capability to store clustered state available to the framework as well, it can provide a mechanism to more easily store information about which nodes are available in the cluster, which in turn eases the development of 'partitionable' processors. Many protocols, such as SFTP, do not provide a way to easily spread work across a cluster of NiFi nodes. If we devise a mechanism by which the work can be partitioned across the nodes, and then expose this information to the components running on those nodes, we could spread this processing across the cluster far more easily. This may take the form of an AbstractPartitionedProcessor, or some utility class, or perhaps just a well-written example of how to interact with remote resources in a partitioned environment.
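
A minimal sketch of how such a helper might be used (the class and method names are assumptions, shown only to illustrate the idea): the framework would tell a component how many nodes are participating and which index the local node holds, and the component would claim only the work items that hash to its index.

    import java.util.List;
    import java.util.stream.Collectors;

    /**
     * Illustrative sketch only -- the class and method names are assumptions.
     * Filters a full listing of work items down to the subset owned by this node,
     * given the local node's index and the total node count as reported by the framework.
     */
    public class PartitionedWorkFilter {

        public static List<String> itemsForThisNode(List<String> allItems, int nodeIndex, int nodeCount) {
            return allItems.stream()
                    .filter(item -> Math.floorMod(item.hashCode(), nodeCount) == nodeIndex)
                    .collect(Collectors.toList());
        }
    }

Deterministic hashing keeps the assignment stable as long as the node count does not change; how to handle nodes joining or leaving the cluster is one of the questions such a mechanism would still need to answer.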

The implementation details of how and where this state is stored should not be exposed to the components. The most likely implementation, though, would be backed by ZooKeeper. We do not want to require that an external ZooKeeper be available in order to run NiFi in clustered mode, so it would make sense to embed a ZooKeeper server in NiFi while also allowing an external ZooKeeper instance to be used if an administrator configures one.
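
For example, the administrator-facing configuration for that choice might look roughly like the following. The property names are hypothetical and shown only to illustrate the intent of switching between an embedded and an external ZooKeeper:

    # Hypothetical properties -- names are illustrative only, not an actual NiFi configuration.
    # Start an embedded ZooKeeper server as part of this NiFi instance:
    nifi.state.management.embedded.zookeeper.start=true

    # Or disable the embedded server and point cluster-wide state at an external ensemble:
    # nifi.state.management.embedded.zookeeper.start=false
    # nifi.state.management.zookeeper.connect.string=zk-host1:2181,zk-host2:2181,zk-host3:2181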

Assumptions

Requirements

1. Provide Local State Storage
   User Story: Components need the ability to store and retrieve local state.
   Importance: Must Have

2. Provide Distributed State Storage
   User Story: Extensions need the ability to share state across all nodes in a cluster.
   Importance: Must Have
   Notes: State will be available only to the component that set it, but it will be visible from every node in the cluster. If a component wants to share state across multiple nodes and multiple components, that state should be stored by a Controller Service using the distributed state mechanism, and that Controller Service can then be shared across components. (A usage sketch for requirements 1 and 2 follows this list.)

3. Provide Partitioning of Work
   User Story: Processors need to be instructed on how to partition their work across nodes in a cluster.
   Importance: Must Have
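
To make requirements 1 and 2 concrete, here is a minimal sketch of how a ListHDFS-style processor might use the hypothetical StateManager API from the Goals section: it tracks the newest timestamp it has listed, stored at cluster scope so that whichever node performs the next listing can resume from that point.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    /**
     * Minimal sketch only -- assumes the hypothetical StateManager interface sketched earlier.
     * A ListHDFS-style processor remembers the newest timestamp it has listed, using
     * cluster-wide state so that any node can resume the listing where the last one stopped.
     */
    public class ListingStateExample {

        public void updateListingState(StateManager stateManager, long newestTimestampListed) throws IOException {
            Map<String, String> state = new HashMap<>(stateManager.getState(StateManager.Scope.CLUSTER));
            state.put("last.listed.timestamp", String.valueOf(newestTimestampListed));
            stateManager.setState(state, StateManager.Scope.CLUSTER);
        }

        public long getLastListedTimestamp(StateManager stateManager) throws IOException {
            String value = stateManager.getState(StateManager.Scope.CLUSTER).get("last.listed.timestamp");
            return value == null ? 0L : Long.parseLong(value);
        }
    }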

User interaction and design

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question: How do we handle local state that has already been stored, such as for GetHTTP?
Outcome:

Question: How do we handle cluster-wide state that was already stored, such as with ListHDFS?
Outcome:

JIRA Issues

 

Not Doing
