Goals

  • Allow users to upload a file to NiFi so that it can be made available to Processors in the flow. For example, a JDBC Driver for use by a Database Connection Pool

Background and strategic fit

Many NiFi Processors and Controller Services require specific resources, or “assets” to be available locally on the system in order to work. This is common for JDBC Drivers, JMS Drivers, enrichment datasets, configuration files, etc.

Currently, the only way for a user to do this is to copy the file from their local computer directly to each of the NiFi nodes in the cluster, using SFTP, SCP, etc. This is tedious, and many users do not even have the access required to perform these types of file transfers.

The intent here is to provide an option in the UI (and programmatically via the API) for a user to upload one or more files and let NiFi handle distributing those files to all nodes in the cluster.

In order for this to work, the user must then be told where NiFi stored the file so that the user is able to configure components to use it. Because this in itself becomes cumbersome, we will instead tie the upload of assets to a Parameter Context. In this way, the user is able to upload a file as the value for any Parameter. Similarly, when configuring a Processor or Controller Service, the user is able to upload a file directly from the Property Configuration dialog; this will automatically create a Parameter with the specified name and update the property to reference that Parameter (in the same way that the UI currently works when choosing to promote a value to a parameter). This is generally a best practice for any file references as well, because the location of a file may well change from system to system if the flow were to be moved or copied elsewhere.

User interaction and design

The UI will allow one or more files to be uploaded for a given Parameter. The value of the Parameter will then reflect the fully qualified filename of each of those files. If multiple files are provided, the value of the Parameter will reflect a comma-separated list of those filenames, in the order that they were provided.
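As an illustration, assuming a hypothetical asset storage directory of /opt/nifi/assets and a Parameter Context subdirectory (both paths here are made up for the example), a Parameter referencing two uploaded JDBC driver JARs might have a value such as:

```
/opt/nifi/assets/ctx-1234/mysql-connector-j-8.4.0.jar,/opt/nifi/assets/ctx-1234/h2-2.2.224.jar
```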

Additionally, when configuring a Controller Service or Processor property, the user will have the ability to click an icon in the Property Configuration dialog and upload a file from there. This will automatically create a parameter and update the property to reference that Parameter.

All authorization of Assets will be tied to the Parameter Context that the Asset belongs to. An Asset cannot be reused across Parameter Contexts. There will be no higher-level “Asset Management” page; rather, assets will be automatically removed when they are no longer referenced by any Parameter in the context.

The Asset Manager should be made extensible via an interface. However, this should not be a third-party extension point; rather, the API will reside in the framework modules (nifi-framework-api, nifi-framework-core-api, etc.).

Security Considerations

  • It will be important that the user be able to supply a filename so that the referenced file has a meaningful name. However, any path component that is provided will be dropped, or an Exception will be thrown by the REST API, as appropriate, in order to prevent path traversal.
  • Upload of an Asset will require READ/WRITE permissions on the associated Parameter Context, as well as the Write File System Required Permission.

Required Code Changes

There will be several areas of the codebase that will be impacted. These areas will be outlined here.

API
---

The Parameter class lives in nifi-api. As such, it will require some modifications, but the API will not be heavily modified. The majority of the API will live within the “framework API,” meaning that it is not subject to the stringent backward-compatibility guarantees that we offer for the nifi-api module.

The following modifications are suggested to the nifi-api module:

  • Introduction of an Asset interface. This will look something like:


public interface Asset {

    String getIdentifier();

    String getName();

    File getFile();
}


  • Parameter will be updated to allow for references to Assets:

private final List<Asset> referencedAssets;

public List<Asset> getReferencedAssets() {
    return referencedAssets;
}


  • A Parameter should not allow setting both the “value” and the “referenced assets”. As such, we will need a new constructor to allow setting referenced Assets instead of a value. At the moment, however, Parameter already has several constructor overloads. We do not want to perpetuate this pattern by adding several more. Instead, the Parameter class should be updated to use a Builder pattern.
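To make the proposal concrete, below is a minimal sketch of what a Builder-based Parameter could look like. This is illustrative only: the real Parameter class in nifi-api has additional fields (descriptor, sensitivity, etc.), and the method names here are assumptions, not the final API. A minimal Asset stand-in is included so the sketch is self-contained.

```java
import java.io.File;
import java.util.Collections;
import java.util.List;

// Minimal stand-in for the Asset interface proposed above, included
// only so this sketch compiles on its own.
interface Asset {
    String getIdentifier();
    String getName();
    File getFile();
}

// Hypothetical sketch of a Builder-based Parameter; names are illustrative.
class Parameter {
    private final String name;
    private final String value;
    private final List<Asset> referencedAssets;

    private Parameter(final Builder builder) {
        this.name = builder.name;
        this.value = builder.value;
        this.referencedAssets = builder.referencedAssets;
    }

    public String getName() { return name; }
    public String getValue() { return value; }
    public List<Asset> getReferencedAssets() { return referencedAssets; }

    static class Builder {
        private String name;
        private String value;
        private List<Asset> referencedAssets = Collections.emptyList();

        Builder name(final String name) {
            this.name = name;
            return this;
        }

        // Setting a literal value clears any referenced assets, and vice
        // versa, keeping the two mutually exclusive as described above.
        Builder value(final String value) {
            this.value = value;
            this.referencedAssets = Collections.emptyList();
            return this;
        }

        Builder referencedAssets(final List<Asset> assets) {
            this.referencedAssets = List.copyOf(assets);
            this.value = null;
            return this;
        }

        Parameter build() {
            if (name == null) {
                throw new IllegalStateException("Parameter name is required");
            }
            return new Parameter(this);
        }
    }
}
```

One benefit of the Builder here is that the value/asset exclusivity is enforced in a single place, rather than spread across a growing set of constructor overloads.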


Existing Extensions
-------------------

As noted above, the change to Parameter will affect any ParameterProvider. All implementations of ParameterProvider will need to be updated to use the new Builder pattern to construct Parameters.


Framework API
-------------

The Framework API will contain a new AssetManager interface, along the lines of:

public interface AssetManager {
    void initialize(AssetManagerInitializationContext context);

    Asset createAsset(String parameterContextId, String assetName, InputStream contents) throws IOException;

    Optional<Asset> getAsset(String id);

    Optional<Asset> deleteAsset(String id);
}




Framework
---------

The Framework will provide the implementations StandardAsset and StandardAssetManager.

The StandardAssetManager will write the given InputStream to a file on disk in a configured location. The ParameterContextResource will be responsible for replicating the file to all nodes in the cluster.
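The core of that write path can be sketched as follows. This is not the actual StandardAssetManager implementation; the class and method names are assumptions, and the per-context subdirectory layout reflects the approach discussed in the comments below.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only: names here are assumptions, not the
// actual StandardAssetManager implementation.
class AssetStore {
    private final Path storageDirectory;

    AssetStore(final Path storageDirectory) {
        this.storageDirectory = storageDirectory;
    }

    // Each Parameter Context gets its own subdirectory so that assets with
    // the same filename in different contexts do not collide. Uploading an
    // asset with the same name to the same context overwrites the old file.
    Path store(final String parameterContextId, final String assetName, final InputStream contents) throws IOException {
        // Drop any path component so a client-supplied name cannot
        // escape the configured storage directory
        final String safeName = Path.of(assetName).getFileName().toString();
        final Path contextDir = storageDirectory.resolve(parameterContextId);
        Files.createDirectories(contextDir);

        final Path assetFile = contextDir.resolve(safeName);
        Files.copy(contents, assetFile, StandardCopyOption.REPLACE_EXISTING);
        return assetFile;
    }
}
```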

The Framework will also be responsible for instantiating the appropriate AssetManager based on the configuration in nifi.properties.
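For example, the configuration might look something like the following. The property names below are purely illustrative and not final:

```
# Hypothetical nifi.properties entries (names illustrative, not final)
nifi.asset.manager.implementation=org.apache.nifi.asset.StandardAssetManager
nifi.asset.manager.properties.directory=./assets
```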


UI
--

As noted above, the UI will incorporate the ability to upload one or more assets as the value for a given Parameter.

Additionally, it will allow uploading a file directly from the property configuration dialog (or a popup from that dialog) just as it does when creating a Parameter for a given property.


Serialization / Synchronization
-------------------------------

The NiFiRegistryFlowMapper and VersionedComponentSynchronizer will need to account for these changes to Parameter and serialize/deserialize/synchronize appropriately.


StandardFlowComparator will need to be updated to indicate changes to assets as flow changes.

System Tests
------------

We should have system tests for the following cases, at minimum:

  • Uploading an Asset and using it.
  • Removing all references to an Asset and ensuring that the asset is removed from the file system.
  • Restarting NiFi and ensuring that references to an asset still work.
  • All of the above should be tested in both Standalone and Clustered modes.


Potential Future Improvements

The following concepts have been considered. While there may be benefits to incorporating such features in the future, they are considered out of scope for the initial work.

  • Synchronizing Assets across the cluster in the event that nodes are added to the cluster
  • Support for Assets using Sensitive Parameters
  • Uploading of “directories” or archive formats that expand into directories, such as .tar files


3 Comments

  1. Thanks for the feature proposal, Mark Payne - I have a few comments / questions:

    • Do we want to have some limitations in terms of how large an asset can be?
    • Do we want to put in place some filters in terms of allowed extensions for the files that would get uploaded?
    • I imagine that NiFi will be configured with some local path where the assets will be uploaded on the NiFi hosts (right?). Will this path be configurable per parameter context or globally? I think, at the very least, each parameter context should have a subdirectory in the destination to avoid any file-name conflicts across multiple parameter contexts in case NiFi is used in a multitenant way. Thoughts? If there is already a file with the same name, will we overwrite the file or send back an error? Or would we overwrite only if updating the asset of a given parameter with an asset having the same name?
    • I understand that we don't want to support syncing assets across the cluster, which means that adding a node to the cluster may have bad effects on the running flow. I imagine the scenario where one could configure the upload directory as network-attached storage shared across the nodes. In that case, I think it should be possible to say that the implementation does not require the upload request to be replicated across the nodes, and that uploading on one node is enough to make the data available to all nodes. This would also be the preferred approach for elastic clusters with k8s, where NiFi pods are sharing a common volume for this specific type of file, for example. Thoughts?
  2. Hey Pierre Villard thanks, good questions. 

    I do not believe we should add any filters or limitations around what types of files may be uploaded or how large they may be. Uploading assets will already require that you have the appropriate permissions to do so, and I don't see an advantage in adding restrictions there.

    In terms of where the files live and updating - my thought is that there will be a single directory that is configured, and each Parameter Context will get its own subdirectory, yes. If you upload another asset with the same name for the same Parameter Context, it would simply overwrite the existing one. You could potentially upload a foo.txt to Context A and another foo.txt to Context B; those would be separate Assets.

    I think, at least initially, I would avoid adding the ability to disable replication across the cluster in order to use a shared drive. I feel like this can lead to a lot of confusion, and any time you start writing to shared drives, it introduces a whole new set of problems. Especially for something like NFS, where the writer may not be done writing (or may have finished writing but not finished fully flushing), readers can then read but will get back NUL characters in the files, and all sorts of problems ensue. Because of this, I think it makes sense to avoid trying to tackle that now. Writing to a local file system addresses a big issue very well for static clusters. For more dynamic clusters, like auto-scaling K8s, it won't fully solve the issue initially. But I think it makes sense to start small and grow the feature from there.

  3. I think an interesting possibility here would be to allow a NiFi node to reach out and download an asset from a central server. The asset would not be uploaded by a user; instead, the user would configure a URL to where the asset is located.

    • This would allow for synchronizing assets when adding a node to a cluster, because the new node could retrieve all assets that it needs.
    • This could apply to a K8s environment, when a new NiFi pod starts up it could retrieve assets on startup.
    • The asset could also be configured to refresh periodically to pick up changes from the central server, perhaps via a CRON schedule where all nodes attempt to refresh the asset at the same time.