Component documentation improvements

Introduction

This document outlines the current approach NiFi uses to document components such as Processors and ControllerServices, and proposes improvements. Its purpose is to facilitate discussion within the NiFi community in the hope of finding a cleaner, more sustainable solution that benefits both developers and users.

Problem

The current mechanism for documenting NiFi components such as Processors and ControllerServices is API-based. A component developer typically defines documentable elements (PropertyDescriptors, Relationships, etc.) as static final variables of the component and then implements instance methods that return their values. These methods are:

Set<Relationship> getRelationships();
List<PropertyDescriptor> getSupportedPropertyDescriptors();

There are a couple of issues with this mechanism:

Instance methods are used essentially to access static fields. This is not a big issue, and for some it may not be an issue at all, but it is worth pointing out because it may confuse others.
Repetitive code in getRelationships() and getSupportedPropertyDescriptors().
Because instance methods are used to access documentable elements, an instance of the component must be created. This presents a number of issues:
1. Security. As NiFi evolves, it is conceivable that at some point a security requirement will be that an instance of a component may only be created by an authorized principal (e.g., run as).
2. Questionable code in the default constructor. An unrelated (unused), poorly implemented component may render the entire system unavailable by having "questionable code" inside its constructor. While "questionable code" can also appear in a static initializer and cause the same issues during class loading, that is a less common pattern and falls in the realm of general coding practices (outside of NiFi).
3. Throwaway instances. A running NiFi without a flow creates and throws away 2 instances of each discovered component, regardless of how many will actually be used. If a component is already in the flow, 5 instances of it are created (see NIFI-1318).
4. Leaky abstractions (of sorts):
  1. The developer must now be aware that they should not implement a default constructor, or any constructor for that matter.
  2. If they do, they must be aware of the potential side effects of creating multiple instances of a component, since a component is naturally a singleton in the process space.
Reliance on the ServiceLoader to discover components.
1. The current extension model of NiFi assumes that extensions are available locally (e.g., in a local directory such as 'work') where they can be discovered. ServiceLoader helps with such discovery, but one of its side effects is the creation of a component instance, regardless of whether that instance is actually needed.
2. As NiFi moves toward the concept of an ExtensionRegistry, the assumption that components are available locally is no longer valid, since an implementation of ExtensionRegistry may not be based on locally available components.
Generation and re-generation of all documents during NiFi startup.
1. While the current model works for the current state and size of NiFi, it is unsustainable for both the NiFi distribution and NiFi documentation once one deals with thousands of components, versions, etc.
2. With multiple instances of NiFi, each instance has to generate its own documentation, which is identical to that of every other instance.
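
To make the instantiation issue concrete, here is a minimal sketch of a framework that can only describe a component by creating it. EagerComponent and InstanceBasedDocs are hypothetical names invented for this illustration, and relationships are reduced to plain strings; none of this is NiFi's actual discovery code.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical component, not a real NiFi class. Its constructor stands in
// for the "questionable code" described above.
class EagerComponent {
    static int constructorCalls = 0;
    static final String SUCCESS = "success";

    EagerComponent() {
        constructorCalls++; // imagine opening a socket or starting a thread here
    }

    // Instance method whose only job is to expose a static value.
    Set<String> getRelationships() {
        return Collections.singleton(SUCCESS);
    }
}

class InstanceBasedDocs {
    // Mimics a framework that must instantiate a component just to describe it.
    static Set<String> describe(Class<? extends EagerComponent> type) {
        try {
            EagerComponent throwaway = type.getDeclaredConstructor().newInstance();
            return throwaway.getRelationships();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type, e);
        }
    }

    public static void main(String[] args) {
        describe(EagerComponent.class);
        describe(EagerComponent.class);
        System.out.println("constructor calls: " + EagerComponent.constructorCalls);
    }
}
```

Every call to describe runs the constructor once more, even though the value it returns never changes and no instance is kept.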

Possible improvements

Embrace a convention-over-configuration approach which, if done properly, would greatly simplify the user (developer) experience and would help address the following:
1. Confusion with using instance methods to access static fields. Developers would no longer have to implement those methods, and the methods can be deprecated.
2. Repetitive code in getRelationships() and getSupportedPropertyDescriptors().
  1. Analyzing many processors, it becomes clear that the code in these methods is very repetitive, as seen in the example below. A very typical pattern is to assemble the collections in the init() method or default constructor, leaving getRelationships() and getSupportedPropertyDescriptors() to simply return the instance variables holding those collections:
    @Override
    protected void init(final ProcessorInitializationContext context) {
        final List<PropertyDescriptor> descriptors = new ArrayList<>();
        descriptors.add(FILE_SIZE);
        descriptors.add(BATCH_SIZE);
        descriptors.add(DATA_FORMAT);
        descriptors.add(UNIQUE_FLOWFILES);
        this.descriptors = Collections.unmodifiableList(descriptors);

        final Set<Relationship> relationships = new HashSet<>();
        relationships.add(SUCCESS);
        this.relationships = Collections.unmodifiableSet(relationships);
    }

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return descriptors;
    }

    @Override
    public Set<Relationship> getRelationships() {
        return relationships;
    }
  2. Repetitive code could easily be handled by reflection-based mechanisms already used by NiFi and many other projects and products. One such approach can be seen here (NIFI-1384 PR). Furthermore, for finer control over which elements are meant to be documented, a new @Documentable annotation could be introduced. This would remove the burden of implementing these methods from the developer, making component development much simpler.
  3. The aforementioned methods could be deprecated and eventually removed.
While we can recommend best practices, developers would be free to implement their components as they wish. For example, they could implement a default constructor with the framework's assurance that anything "questionable" in it will have no impact on the overall system until the component is introduced into the process space, at which point doing so is an expression of intent to use it.
Have documentation generated as part of the build process. There are a couple of benefits to that:
1. A NiFi instance would not have to generate or re-generate documentation every time it starts.
2. Documentation would essentially be reused by multiple instances, since it would exist in the NAR.
3. Within the concept of an ExtensionRegistry, a NiFi user would be able to access documentation before a component is introduced into the process space.
  1. Such documentation would be available outside of a NiFi instance and/or UI (e.g., in a conventional browser).
  2. There would be no pollution and stagnation of documentation artifacts that are not used in the flow. In other words, the access pattern for documentation would be "get-from-local-if-available-otherwise-call-remote".
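
The reflection-based collection proposed above could look roughly like the following sketch. It is not the code from the NIFI-1384 PR. The @Documentable annotation is only proposed in this document and does not exist in NiFi, and SketchProperty, SketchRelationship, SketchProcessor and DocumentableScanner are hypothetical stand-ins so that the example compiles without NiFi on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; proposed in this document, not part of NiFi.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Documentable {}

// Simplified stand-ins for NiFi's PropertyDescriptor and Relationship.
final class SketchProperty {
    final String name;
    SketchProperty(String name) { this.name = name; }
}

final class SketchRelationship {
    final String name;
    SketchRelationship(String name) { this.name = name; }
}

// A component that only declares its documentable elements.
class SketchProcessor {
    static int instancesCreated = 0;

    @Documentable static final SketchProperty FILE_SIZE = new SketchProperty("File Size");
    @Documentable static final SketchProperty BATCH_SIZE = new SketchProperty("Batch Size");
    @Documentable static final SketchRelationship SUCCESS = new SketchRelationship("success");

    // Not annotated, so the scanner skips it.
    static final SketchProperty INTERNAL = new SketchProperty("internal");

    SketchProcessor() { instancesCreated++; }
}

final class DocumentableScanner {
    // Collects annotated static final fields of the requested type.
    // Reading a static field initializes the class but calls no constructor.
    static <T> List<T> collect(Class<?> component, Class<T> type) {
        List<T> found = new ArrayList<>();
        for (Field field : component.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) && Modifier.isFinal(mods)
                    && field.isAnnotationPresent(Documentable.class)
                    && type.isAssignableFrom(field.getType())) {
                try {
                    field.setAccessible(true);
                    found.add(type.cast(field.get(null)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field, e);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (SketchProperty p : collect(SketchProcessor.class, SketchProperty.class)) {
            System.out.println("property: " + p.name);
        }
        for (SketchRelationship r : collect(SketchProcessor.class, SketchRelationship.class)) {
            System.out.println("relationship: " + r.name);
        }
        System.out.println("instances created: " + SketchProcessor.instancesCreated);
    }
}
```

Because the scan needs only the Class object, the same routine could in principle run in a build step to produce documentation for the NAR. Static initializers still run when a field is read, so the caveat about "questionable code" in static initializers continues to apply.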
