You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 12 Next »

1. Purpose

This page will capture design ideas for creating an Apache NiFi registry for extensions and templates, allowing them to be easily pulled into a NiFi instance, and thus reducing the amount of bundles distributed with an Apache NiFi release.

2. Problem

Today NiFi provides a mechanism to introduce new components (e.g., NAR) and share flow templates (pre-configured flows). For components (Processors, ControllerServices etc.) the current mechanism assumes packages (NARs) containing such components exist in some predefined location on local file system as they are typically distributed with NiFi distribution. For templates there is a UI-based import/export functionality giving user a little more control as to what is available to NiFi.

While fairly simple and well understood, current mechanism does expose certain challenges:

  1. Given that NiFi comes distributed with all supported components its distribution size became quite large housing a series of extension sets that may or may not be of use by a given flow. As more components are introduced, the current mechanism will become unsustainable.
  2. Regardless if there is an intent to use, every component is loaded into the JVM (classes are loaded to produce documentation and multiple instance are created due to the pre-existing bug NIFI-1318 - Getting issue details... STATUS ). 
  3. Management and sharing of artifacts (NARs, templates etc) is a manual process
    1. On top of that exported templates contain flow/component state (running/stopped), some identifiers are tied to the identifier scheme of the system on which the template was made which may present security risk.
  4. No versioning support for components and/or templates
  5. No ability to introduce new components once NiFi is started. Components must exist in NiFi prior to NiFi startup and loaded into NiFi (see #2).
  6. Third-party vendors are locked out of participation due to legal constraints of including a component with non-ASF compatible licenses or including components that may depend on other components with non-ASF compatible licenses.
  7. Repetitive library distribution. While each component is designed to perform a very specific task, most of them still depend on common libraries  (e.g., log4j, spring, etc.). Such libraries are included as part of individual NARs creating a possibility to where several NARs in NiFi distribution contain the same library.  

By providing a configurable and centralized extension registry similar to the one used by other products that expose plug-and-play extension model and thus heavily rely on community participation (Grails plug-ins), most (if not all) challenges described above could be addressed, facilitating even greater NiFi adoption.

3. Proposed Extension Registry Features

Below is the list of features that we may want to consider 

  1. Publish NAR/Template. 
  2. Version, presumably following a well known and recognized convention (e.g., Maven - artifactId:groupId:version).
    1. This will expose ability to have multiple versions of the same component available to the user.
    2. Additionally, NAR/Template may be tied to a specific version (or range of versions) of NiFi
  3. Access components documentation regardless if such components are available locally.
  4. Pull NAR/Template. This essentially implies bringing NAR/Template from the remote location to a running instance of NiFi. However, I don't believe this implies that user has to know about local/remote availability of the component. In fact in best case scenario user experience must not change. For example, user would have a familiar browser window for Processors as they do today, but such window would be populated with what is available to the user from the extension registry. If a particular component is already available locally then that is where it will be loaded from and if not then NiFi should transparently pull such component, essentially caching NAR/Templates locally (similar to local Maven cache). 
  5. Browse and/or search for NARs or templates within and outside of NiFi UI. This implies that the extension registry must have the ability to be accessed with the conventional web browser.
  6. Configure location of extension registry. Similar to Maven, user's must be able to configure the location of the extension registry.

4. Registry Examples

This section will describe how some other projects are tackling the same type of problem.

JFrog Bintray 

Quoting their main page "Bintray gives developers full control over how they store, publish, download, promote and distribute software with advanced features that fully automate the software distribution process"

 Some of the key features;

  • Free for hosting OS artifacts
  • Multiple repository/packaging styles including Maven, Docker RPM etc., essentially supporting versioning feature described in "Proposed Features"
  • Browsing and Search capabilities described in "Proposed Features"
  • Documentation features described in "Proposed Features"
  • REST API to integrated with running NIFi instance to expose Browse, Search and provide access to documentation.  
  • Automated plug-ins (Maven, Gtradle etc) for automated publishing of the artifacts

Spark-Packages

  • Requires authentication with a GitHub account.
  • Registering a package requires pointing to a public GitHub repo owned by the given user, with a LICENSE and README. 
  • A name and description are also provided during registration.
  • There is a command line tool to help start new packages and publish them.
  • The web site lets a user search for packages and view details of each package.
  • The details of a package show how to use the given package with various tools such as the spark-shell, sbt, and/or Maven, see example
  • For Maven it shows a dependency and repository snippet that could be added to a pom, and there seems to be a repository.
  • Packages can also be voted on and tagged.

5. Use Cases

We are guided by the following use cases:

1. To keep NiFi small and lightweight, both in distribution and deployment, we want to separate the 40 current NARs from the core distro and support dynamically loading them, both from a local filesystem repository and a remote web-based repository.  Appropriate security support must be part of the implementation.

2. A User, using the NiFi GUI to construct a new Workflow, needs the ability to DISCOVER new Processors and other extensions they can use.  This suggests interactions that allow them to point NiFi at known-safe external repositories, and then browse or otherwise discover the Extensions available therein (without loading all those extensions, some of which are quite large).  This implies the need to obtain from the repo, manage, and present, METADATA and descriptive info abut the extensions separately from the extensions themselves.

3. The community desires to share Templates.  Templates are XML files with dependencies on Processors and other resources they may specify.  The extension and repository model for NiFi should support sharing Templates and their dependencies in a natural and secure way.

4. One of us is looking at adding an Apache Camel Processor pair to NiFi (NIFI-1842).  Camel supports connecting to over 200 end-point types not yet supported natively in NiFi, and a Camel Processor could give us access to many of them with minimal additional work.  Each of those end-points is already packaged as a dynamically-loadable extension to Camel.  The packaging is as a JAR. We desire to enable NiFi to manage Camel extensions as NiFi plug-ins, without having to repackage each as a NAR, and without needing separate non-Camel repositories for them.  In the future, other new NiFi extensions may also have need for custom sub-extensions of their own.

5. We would like the set of extension types to be extensible, itself.  Each extension type needs its own integration with core NiFi.  Besides loading classes and other resources, those resources must be registered with different parts of the GUI and other components of NiFi.  The extensibility mechanism should recognize and serve this need as well as possible.

6. Design Proposal

A. Principal Abstractions

The two principal abstractions are ExtensionSpec and ExternalRepository.

An ExternalRepository instance represents a repository holding extension packages, and is able to present and deliver them, along with separately presenting and delivering metadata about them, in the form of lists suitable for human-interactive discovery. It also manages information about the security signature of extension packages, and only delivers packages that are properly signed, by a signing authority recognized and approved at the system level. ExternalRepositories don't know anything about the internals of the extension package, or the packaging itself. Their responsibility ends after delivering, to an agreed location in the local filesystem, a security-approved concrete package file that can be loaded by some other mechanism appropriate to the extension type.

Our initial implementation will support as sub-classes:

  • Filesystem directory-based repositories mounted locally on the NiFi server, and
  • Maven repositories enabled in the NiFi server's maven configuration.

The set of ExternalRepository types (sub-classes) will be extensible via the extension mechanism.

An ExtensionSpec instance holds the metadata needed to support identification, discovery, delivery, and loading of a single extension package. Each type of extension will have a sub-class of the ExtensionSpec class, that knows what metadata applies to that type, how it is packaged, and how to load that packaging. After a repository delivers an extension package file, the ExtensionSpec sub-class is responsible for loading it, i.e., dynamically integrating it into the NiFi core and GUI.

Our initial implementation will support:

  • Processor extensions packaged as NARs
  • Controller Service extensions packaged as NARs
  • Templates packaged as XMLs
  • New extension types, i.e., new ExtensionSpec sub-class implementations packaged as NARs
  • New repository types, i.e., new ExternalRepository sub-class implementations packaged as NARs

As we see from the last two items in the above list, the set of ExtensionSpec types itself will be extensible via the extension mechanism.

B. Metadata and Security Signatures

For both the filesystem-based repository and the maven repository, we propose to require that each extension be distributed as a set of 6 files:

  • a single-file package containing the extension itself, as a NAR or other file format appropriate to the extension type
  • a maven POM file for the extension, containing certain mandatory XML elements, from which the extension's metadata is obtained
  • MD5 and SHA1 signature files for each of the package file and POM file.

These requirements mirror the requirements for a normal Maven repository deployment, and are not unreasonably burdensome for a filesystem-based repository deployment.

Other repository types may share the same means of managing the metadata and security signature requirements, or they may choose to manage this info in other ways.

C. Note about Registration and Triggers

(This section is not yet determined, as the developers are still studying the issue.)

To dynamically load an extension, it is not sufficient to simply ingest the code or content of the extension into the JVM. It is also necessary that the clients of the code or content become "aware" that the new material is available. This may be done in a variety of ways, depending on the structure of the client code, and may be referred to as registration, triggering, alerting, or re-scanning.  Today's NAR loader supports some aspects of the above, but this needs to be clarified, and supported for all extension types in a generalized and extensible way.

 

 

  • No labels