Target release
Epic
Document status: DRAFT
Document owner: Joe Witt

Designer
Developers
QA

Goals

  • Provide reference-able groups of processors and relationships that are as natural to use as a processor.

Background and strategic fit

Process Groups enable powerful composition of processors and relationships with specific input ports and output ports.  However, they live exclusively at a specific section of the dataflow, and the only way to send data to them is by way of an explicit relationship from a source component to an input port of that Process Group.  Further, the only way to get data out of them is by direct connection from one of their output ports to a given component.  Thus, if you want to reuse the same instance of a Process Group for multiple flows, you have to keep the necessary context around it by using something like UpdateAttribute on input to tag data and RouteOnAttribute afterwards to route on that tag.  This can be awkward and can lead to dataflows being more complicated than necessary.  As a result, rather than leveraging a single Process Group for a variety of dataflows, folks tend to create templates of that group and have many instances of it.  This in turn can waste resources.
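
The tag-and-route workaround described above can be sketched in a few lines. This is a plain-Python simulation, not NiFi code: FlowFiles are modeled as dictionaries, and the names `tag`, `shared_group`, `route_on_tag`, and the `source.flow` attribute are hypothetical, chosen only for illustration.

```python
# Plain-Python simulation of the tag-and-route workaround; not NiFi API code.
# FlowFiles are modeled as dicts with "attributes" and "content".

def tag(flowfile, flow_name):
    """Stand-in for UpdateAttribute: record which flow the data came from."""
    attributes = {**flowfile["attributes"], "source.flow": flow_name}
    return {**flowfile, "attributes": attributes}

def shared_group(flowfile):
    """Stand-in for the single shared Process Group (here: upper-case the content)."""
    return {**flowfile, "content": flowfile["content"].upper()}

def route_on_tag(flowfile, routes):
    """Stand-in for RouteOnAttribute: pick the downstream queue by tag."""
    routes[flowfile["attributes"]["source.flow"]].append(flowfile)

# Two otherwise unrelated flows share one group instance.
routes = {"flow-a": [], "flow-b": []}
inputs = [
    ("flow-a", {"attributes": {}, "content": "alpha"}),
    ("flow-b", {"attributes": {}, "content": "beta"}),
]
for flow_name, flowfile in inputs:
    route_on_tag(shared_group(tag(flowfile, flow_name)), routes)

print([f["content"] for f in routes["flow-a"]])
print([f["content"] for f in routes["flow-b"]])
```

Every flow that shares the group needs its own tagging step in front and its own routing entry behind, which is the bookkeeping this proposal aims to remove.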

So, this concept of 'reference-able process groups' means that users could instead refer to a special type of Process Group, perhaps by dragging a reference to that Process Group onto their flow wherever they need it.  They would establish connections to its input ports just like they would when connecting to a processor, and they would establish connections from its output ports to other components just like they would with processor relationships.  In essence, this means that you can compose 'new processors' simply by creating reference-able Process Groups.  This then becomes like a function reference mechanism, and it could greatly reduce the visual complexity of the flow.

These groups would be special in a couple of ways.  First, when sending data to them and in directing their output, the framework itself would automatically keep track of the context of use.  Users would just connect to and from them naturally, and even though a single Process Group is being referenced, the framework would know where in the actual flow data comes from and goes to.  Second, these special Process Groups would not really live at any specific point in the dataflow.
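
One way to picture the context tracking is the sketch below. It is purely illustrative and assumes nothing about how NiFi would implement this: `ReferenceableGroup`, `add_reference`, `send`, and `validate` are all hypothetical names, and downstream components are modeled as plain lists.

```python
# Hypothetical model of a reference-able Process Group; not NiFi API code.

class ReferenceableGroup:
    """One shared group definition that can be referenced from many places."""

    def __init__(self, process):
        self.process = process      # function: content -> (output_port, content)
        self.references = {}        # reference id -> {output port: downstream list}

    def add_reference(self, ref_id, outputs):
        """Place a reference on the flow and wire its output ports."""
        self.references[ref_id] = outputs

    def send(self, ref_id, content):
        """Data enters through a reference; the framework remembers which one."""
        port, result = self.process(content)
        # Context of use is tracked here, so the user never tags or routes by hand.
        self.references[ref_id][port].append(result)


def validate(content):
    """Example group logic with two output ports: 'valid' and 'invalid'."""
    return ("valid", content.strip()) if content.strip() else ("invalid", content)


group = ReferenceableGroup(validate)
orders_ok, orders_bad, logs_ok, logs_bad = [], [], [], []
group.add_reference("orders", {"valid": orders_ok, "invalid": orders_bad})
group.add_reference("logs", {"valid": logs_ok, "invalid": logs_bad})

group.send("orders", " order-1 ")
group.send("logs", "   ")
group.send("logs", "line-1")
```

The group logic exists once, yet each reference has its own downstream wiring, much like calling one function from several call sites and returning to the right caller each time.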

Assumptions

  • This concept will require some significant consideration for User Experience to avoid confusing the user about which type of Process Group they have.

Requirements

# | Title | User Story | Importance | Notes
1
2    

User interaction and design

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question | Outcome

Not Doing

6 Comments

  1. Has there been any progress on this feature request? This would make complex flows with repeated functionality a lot more manageable.

  2. Agreed, this would be a great feature.

  3. Dan

    This feature is probably the deciding factor in whether or not to use NiFi for my current project. It's going to be impossible to manage hundreds of flows without being able to reference reusable components.

  4. Dan, I probably should edit this.  Since this proposal was written, much analysis and learning has happened, and this as written is probably overcome by a far better model: Versioned Flows, as supported by NiFi and the Flow Registry.  You can have as many instances of a versioned flow as you need and yet still manage it centrally.  These you definitely need to check out.  Additionally, with the increase in support for expression-language-controlled properties and the process group variable registries, we routinely see very large multi-team/cross-organizational clusters that represent a massive number of unique flows but are managed through a central data distribution hub/NiFi cluster.  It is highly likely your case will be well handled as described.  In some cases you might need many instances of the same versioned flow; in other cases the parameterization of key properties with expression language means the same flow instance can handle many different 'flows'.


  5. Dan

    Yeah, I saw that feature and I have some questions about using it to handle cases like this. It seems to me that with the versioning you still need to go through each flow and update the version of the component, as opposed to updating it in one location that all flows reference. That's obviously much easier than full manual changes, but it still requires touching each flow.

    This probably isn't the right place for this question, let me know if I should ask it somewhere else.

    1. You're correct: for now, each instance requires you to opt in to a version update. We've not made an auto-update available yet, but we certainly will, along with a policy management aspect such that versioned flows will have tags associated with them and instances of those versioned flows will auto-update to versions that meet their tag criteria.