Goals
- Provide High Availability of Manager
- Primary Node Failover (HA) / Incorporate Leader Election functionalities
- Distributed State for use by extensions
- Rolling restarts and upgrades
- Provide multiple tiers of NiFi clusters
- Dynamic node registration, support for dynamic scaling of worker nodes
- Management of data partitions among nodes in the cluster to allow for data affinity and allocation of tasks
Background and strategic fit
Given the genesis of NiFi, clustering was designed to be extremely conservative in the interest of exactly-once semantics and a guarantee against data loss. While it is important to maintain this functionality, it is also desirable to support other use cases where speed and volume are paramount to dataflow and processing, with the caveats of eventual consistency and possible data duplication. The state of the art for these scenarios typically relies heavily on ZooKeeper, accessed through a library such as Curator.
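To make the ZooKeeper-style approach concrete, the following is a minimal in-memory sketch of the common election recipe, in which each participant registers under an increasing sequence number and the lowest live entry is the leader. It assumes nothing about NiFi internals and does not call ZooKeeper or Curator; the class `ElectionSketch` and its methods `join`, `leave`, and `leader` are hypothetical names used only for illustration.

```java
import java.util.Optional;
import java.util.TreeMap;

/**
 * In-memory model of a sequence-number election: the participant holding
 * the lowest live sequence number is the leader.
 * Hypothetical illustration only; not NiFi, ZooKeeper, or Curator code.
 */
class ElectionSketch {
    // Sorted by sequence number, so the first entry is always the leader.
    private final TreeMap<Long, String> participants = new TreeMap<>();
    private long nextSequence = 0;

    /** Registers a node and returns the sequence number of its entry. */
    synchronized long join(String nodeId) {
        long sequence = nextSequence++;
        participants.put(sequence, nodeId);
        return sequence;
    }

    /** Removes a node's entry, as would happen when its session ends. */
    synchronized void leave(long sequence) {
        participants.remove(sequence);
    }

    /** Returns the current leader, or empty if no node is registered. */
    synchronized Optional<String> leader() {
        return participants.isEmpty()
                ? Optional.empty()
                : Optional.of(participants.firstEntry().getValue());
    }
}
```

In this model, failover is implicit: when the leader's entry is removed, the node with the next-lowest sequence number becomes leader without any further coordination. A real deployment would also need session timeouts and change notifications, which this sketch leaves out.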
Assumptions
Requirements
# | Title | User Story | Importance | Notes |
---|---|---|---|---|
1 | Provide leader election | Need a robust mechanism as a basis for these extensions | | |
2 | Distributed State | Extensions need a consistent mechanism for sharing state | | |
3 | Dynamic Scaling | Support for worker nodes to join/leave cluster dynamically | | |
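One possible way to reconcile dynamic scaling with the data-affinity goal is rendezvous (highest-random-weight) hashing, sketched below. This technique is not prescribed by this document; it is shown only as an example of an assignment scheme in which a node joining or leaving moves just the partitions that node owns. The class `PartitionAssignmentSketch`, its methods, and the choice of CRC32 as the hash are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Rendezvous hashing sketch: each partition is owned by the node that
 * scores highest for it. Hypothetical illustration only; not NiFi code.
 */
class PartitionAssignmentSketch {

    /** Scores a (node, partition) pair. CRC32 is an arbitrary stable hash. */
    static long score(String nodeId, int partition) {
        CRC32 crc = new CRC32();
        crc.update((nodeId + "#" + partition).getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // always in the range 0 .. 2^32 - 1
    }

    /** Returns the owner of a partition; ties are broken by node ID. */
    static String ownerOf(int partition, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        String best = null;
        long bestScore = -1;
        for (String node : nodes) {
            long s = score(node, partition);
            if (s > bestScore || (s == bestScore && node.compareTo(best) < 0)) {
                best = node;
                bestScore = s;
            }
        }
        return best;
    }
}
```

Because every node can compute `ownerOf` locally from the current membership list, no central assignment table is needed. The trade-off is that membership must be agreed upon consistently across the cluster, which is where the leader election and distributed state requirements come in.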
User interaction and design
Questions
Below is a list of questions to be addressed as a result of this requirements document:
Question | Outcome |
---|---|
What changes are needed for flow change sets and their propagation within a cluster? | |
What are the pros and cons of an embedded versus an external leader election ensemble? | |
How do we provide the same semantics currently in place in view of the new cluster mechanism for consistency? | |
How is user authorization managed in an environment where the manager may change? | |
What is the delineation between manager and primary node in this environment if one even still exists? | |
JIRA Issues
- NIFI-540