Note: This is a living document and will evolve as the community guides the development of the effort. The initial content is based heavily on the Java agent; the scope will later grow to include other agents.

Project Structure

Given the commonalities between the MiNiFi effort and NiFi, the Java version will follow a similar structure: a Maven project composed of the following modules:

minifi-api
minifi-bootstrap
minifi-commons
minifi-framework
minifi-assembly

Guiding Points of Design

Installation, provisioning, and establishment of agents
Upgrading agents, inclusive of both functionality in terms of flow/processing, and the agent binary and associated libraries
- Tolerance for asynchronous and optimistic upgrades

Realization of agent capabilities
- Possible taxonomy of agents and functionalities that may be driven by available software and hardware

Mediating different agent versions
Observation of status
Security
Provenance Generation and Transmission to some endpoint store
Considerations for supporting replay and buffering of data
Prioritization of Data with Back Pressure and Pressure Release
Supported Operating Systems and the requirements levied:
- Windows,
- Mac,
- Linux, and
- Unix

Networking and communication protocols
- Traversal of non-direct network hops and relay functionality
Manager provides a consistent user experience for agent groups as it does processors

Guidelines for MiNiFi C++

The following guidelines may be useful for MiNiFi C++ development and design. They stem from the environments that are potential targets, and are based on typical C++ development practices and design goals that help make the most of development effort.

Minimize memory footprint and management of memory within modules.
Limit data and access patterns that are deemed risky.
Limit failure with testing at all stages; however, when failure occurs we should aim for recoverability.
We should follow the open/closed principle to avoid churn and support changes as they occur.
Try to be a good cog: problems will occur on devices, so we can’t trust anything, even a malloc.

Components

Installation

Executable that is command line driven
Installable as a service

Initialization

Establishes a two process mechanism similar to that of existing NiFi:
1. Bootstrap Process: controls the instantiation and execution of the flow process and aids in receiving configuration changes (products of design and deploy approach)
2. Flow Process: handles the actual collection and transmission of data
Makes use of a configured state to drive the process of starting a flow. This should be extensible to allow various implementations of inputs.
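
The two-process arrangement above can be sketched roughly as follows. This is a minimal illustration, not the actual bootstrap code: the class name BootstrapSketch, the flow main class name FlowProcessSketch, and the system property minifi.sketch.config are all hypothetical. It assumes the bootstrap process starts the flow process in a separate JVM and passes it the location of the configured state.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bootstrap process; all names here are illustrative.
final class BootstrapSketch {

    // Builds the command line used to start the flow process in its own JVM.
    static List<String> flowProcessCommand(String javaHome, String classpath,
                                           String flowMainClass, String configPath) {
        List<String> command = new ArrayList<>();
        command.add(javaHome + File.separator + "bin" + File.separator + "java");
        command.add("-cp");
        command.add(classpath);
        // Assumed property name; the flow process would read its configured state from here.
        command.add("-Dminifi.sketch.config=" + configPath);
        command.add(flowMainClass);
        return command;
    }

    // Starts the flow process, shares the console with it, and waits for it to exit.
    static int runFlowProcess(List<String> command) throws Exception {
        Process flow = new ProcessBuilder(command).inheritIO().start();
        return flow.waitFor();
    }

    public static void main(String[] args) {
        List<String> command = flowProcessCommand(
                System.getProperty("java.home"),
                System.getProperty("java.class.path"),
                "FlowProcessSketch",
                "conf/config.yml");
        // Only prints the command; call runFlowProcess(command) to actually launch it.
        System.out.println(String.join(" ", command));
    }
}
```

Keeping the command construction separate from the launch makes it straightforward for a bootstrap to stop the flow process and start it again with a new configuration path when a change arrives.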

Registration/Announcement

Agents will have a defined taxonomy and capabilities associated with them. These properties will help the agent communicate what is possible and will help flow designers create flows for various agent classes. The capabilities will be communicated to a manager so that it understands what is possible with various agents. Capabilities and capacities may change over time, and this information will be continually registered with associated systems.

Agent Classes

Longer term, agents should be able to convey their capabilities as a result of items such as environment, version of software, networking, and hardware for establishing configuration of flow and collected data from a manager perspective.

Configuration - Bootstrap Agent Executable

Primarily handles the bootstrapping of the process and the configuration of the JVM that monitors and controls the flow process. It will receive configuration changes and apply these updates to the associated flow process.

Interfaces

ConfigurationChangeListener - Provides the handling of updates to the agent from an external source
- In the simplest case, this would be evaluating changes to a configuration file
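
As an illustration of the simplest case, the sketch below polls a configuration file and notifies listeners when its content changes. The method name handleChange, its signature, and the FileChangeNotifierSketch class are assumptions made for this example; only the interface name ConfigurationChangeListener comes from this document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed shape of the interface; the method name and signature are hypothetical.
interface ConfigurationChangeListener {
    void handleChange(InputStream newConfiguration) throws IOException;
}

// Hypothetical helper that detects changes by comparing content digests between polls.
final class FileChangeNotifierSketch {
    private final Path configFile;
    private final List<ConfigurationChangeListener> listeners = new ArrayList<>();
    private byte[] lastDigest;

    FileChangeNotifierSketch(Path configFile) throws IOException {
        this.configFile = configFile;
        this.lastDigest = digest(Files.readAllBytes(configFile));
    }

    void register(ConfigurationChangeListener listener) {
        listeners.add(listener);
    }

    // Returns true if the file content changed since the last poll and listeners were notified.
    boolean poll() throws IOException {
        byte[] content = Files.readAllBytes(configFile);
        byte[] current = digest(content);
        if (Arrays.equals(current, lastDigest)) {
            return false;
        }
        lastDigest = current;
        for (ConfigurationChangeListener listener : listeners) {
            listener.handleChange(new ByteArrayInputStream(content));
        }
        return true;
    }

    private static byte[] digest(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

A bootstrap could call poll() on a schedule and register a listener that restarts or reconfigures the flow process. Other sources of change, such as a network endpoint, could feed the same listener interface.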

Configuration - Processing Flow

Driven by a design-and-deploy approach, where the associated flow is provided via the bootstrap process.

Data Format

The FlowFile format has been the core serialization format of NiFi. It provides structure that allows files to traverse a given flow easily and exploits pass-by-reference semantics in routing operations. Of interest is how information is handled with the core FlowFile format as metadata is transmitted from the agent to a receiving node/system. This may be out of band or as an augmentation to the FlowFile format.
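
To illustrate the pass-by-reference idea, here is a deliberately simplified sketch. FlowFileSketch and RouterSketch are hypothetical stand-ins, not NiFi's actual classes or serialization format. The assumption is that a flow item carries a small attribute map plus a reference to its content, so routing decisions look only at attributes and never copy the content.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified stand-in for a FlowFile: attributes plus a content reference.
final class FlowFileSketch {
    final Map<String, String> attributes;
    final byte[] contentRef; // shared reference; routing never copies these bytes

    FlowFileSketch(Map<String, String> attributes, byte[] contentRef) {
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        this.contentRef = contentRef;
    }

    // Returns a new item with one extra attribute that still points at the same content.
    FlowFileSketch withAttribute(String key, String value) {
        Map<String, String> updated = new HashMap<>(attributes);
        updated.put(key, value);
        return new FlowFileSketch(updated, contentRef);
    }
}

// Hypothetical router with two outgoing queues.
final class RouterSketch {
    final Queue<FlowFileSketch> matched = new ArrayDeque<>();
    final Queue<FlowFileSketch> unmatched = new ArrayDeque<>();

    // Routes on an attribute value only; the content is neither read nor duplicated.
    void route(FlowFileSketch flowFile, String key, String expected) {
        if (expected.equals(flowFile.attributes.get(key))) {
            matched.add(flowFile);
        } else {
            unmatched.add(flowFile);
        }
    }
}
```

Metadata that an agent needs to send to a receiving system could travel as additional attributes in such a structure, which corresponds to the "augmentation" option above; the out-of-band option would send it separately.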

Provenance

Agent Statistics/Heartbeat

Data Ingress

Provides a means for introducing data into the system; currently, data sources map to existing processors. Given the desire to make use of existing libraries and functionality when developing the initial agent offering, the focus will be on the core use cases that map to existing processors:

Files (Tail, Get)
Logs (Listen Syslog, UDP)

Data Egress

Egress is viewed as high-level terminology for getting data from an agent to an associated system. The complexity and needs of this functionality may vary across environments and may require complex networking schemes.

Communication and Protocols

For the existing proof of concept and for establishing an agent to make larger architectural decisions, the Java agent can make use of the existing Site to Site protocol and functionality to communicate with an endpoint system.

Glossary

Agent

A lightweight process that can be constructed to acquire information from one or more host systems and provide this information to another system for consumption. This process provides provenance, a directed graph of processing, and extensibility to map to various data formats, schemas, and protocols.

Capability

Functionality that a given agent is able to perform. In some contexts, this may be communicating with specific devices, handling a certain nature or complexity of data, compute power, or serving specific roles in the data ingress/egress process from generation to consumer.

Class

An aggregation of one or more capabilities that allows specific agents to carry out a given processing graph. For example, a high-level view of a "File Forwarder" class would require the capability to interact with the file system to get files, as well as one or more egress methods to return information to a desired consumer.
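
The "File Forwarder" example can be expressed as a small check of an agent's capabilities against a class definition. This is only a sketch under stated assumptions: the capability names in the enum and the AgentClassSketch type are invented for illustration, since this document does not define a concrete capability list.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative capability names; these are assumptions, not a defined taxonomy.
enum Capability { FILE_SYSTEM_READ, SYSLOG_LISTEN, SITE_TO_SITE_EGRESS, HTTP_EGRESS }

// Hypothetical agent class: capabilities that are all required, plus a set
// from which at least one must be present.
final class AgentClassSketch {
    final String name;
    final Set<Capability> requiredAll;
    final Set<Capability> requiredAny;

    AgentClassSketch(String name, Set<Capability> requiredAll, Set<Capability> requiredAny) {
        this.name = name;
        this.requiredAll = requiredAll;
        this.requiredAny = requiredAny;
    }

    // True when the agent has every mandatory capability and at least one of the alternatives.
    boolean isSatisfiedBy(Set<Capability> agentCapabilities) {
        boolean hasAll = agentCapabilities.containsAll(requiredAll);
        boolean hasAny = requiredAny.isEmpty()
                || requiredAny.stream().anyMatch(agentCapabilities::contains);
        return hasAll && hasAny;
    }

    // "File Forwarder": must read files and must have one or more egress methods.
    static AgentClassSketch fileForwarder() {
        return new AgentClassSketch("File Forwarder",
                EnumSet.of(Capability.FILE_SYSTEM_READ),
                EnumSet.of(Capability.SITE_TO_SITE_EGRESS, Capability.HTTP_EGRESS));
    }

    public static void main(String[] args) {
        Set<Capability> agent = EnumSet.of(Capability.FILE_SYSTEM_READ, Capability.HTTP_EGRESS);
        System.out.println(fileForwarder().isSatisfiedBy(agent));
    }
}
```

A manager could run a check of this kind when deciding which agents are eligible to receive a given processing graph.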

Egress

A generic term for providing information from an agent to one or more consumers. In simplest form, this is a direct line through networking to send data to a desired target. In more complex environments, this may require an n-hop network relying on several other agents to relay the data throughout the network traversal.
