Overview
Quite a few people have started to download and experiment with NiFi, and quite a few have started to use it operationally.
We have also seen a number of contributors come onto the project, and we often get the same question: Where can I start?
A project like NiFi has quite a few moving parts, so understanding how
all of these parts fit together can itself be daunting.
This document is intended to be a very high-level description of the components that make up NiFi and how those
components fit together. It is not intended to go into great detail about any of these components. Other documents
exist (and more are to come) that delve deeper into the design of some of these individual components. The goal of
this document is to help developers who are new to NiFi come up to speed on some of the terminology and understand how
the different components that make up the platform interact with one another.
FlowFile
We will begin the discussion with the FlowFile. This is the abstraction that NiFi provides around a single piece of data.
A FlowFile may represent structured data, such as a JSON or XML message, or may represent unstructured data, such as
an image. A FlowFile is made up of two parts: content and attributes. The content is simply a stream of bytes, which can
represent any type of data. The attributes are key-value pairs that are associated with the data. These attributes provide
context along with the data, which allows the data to be efficiently routed and reasoned about without parsing the content.
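To make the content/attribute split concrete, the following is a minimal sketch rather than NiFi's actual API. The names
SketchFlowFile and SketchAttributeRouter are hypothetical, and the "mime.type" attribute key and the route names are
assumptions chosen for illustration. The sketch shows a routing decision made from the attributes alone, without ever
reading the content bytes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a FlowFile (not NiFi's actual FlowFile interface).
final class SketchFlowFile {
    private final byte[] content;                  // opaque stream of bytes
    private final Map<String, String> attributes;  // key-value context about the content

    SketchFlowFile(byte[] content, Map<String, String> attributes) {
        this.content = content.clone();
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    byte[] getContent() {
        return content.clone();
    }

    String getAttribute(String key) {
        return attributes.get(key);
    }
}

// Hypothetical router that decides where data goes by looking only at attributes.
final class SketchAttributeRouter {
    static String route(SketchFlowFile flowFile) {
        // "mime.type" is used here as an illustrative attribute key.
        String mimeType = flowFile.getAttribute("mime.type");
        if ("application/json".equals(mimeType)) {
            return "json";
        }
        if (mimeType != null && mimeType.startsWith("image/")) {
            return "image";
        }
        return "unmatched";
    }
}
```

Because route never calls getContent, the cost of the decision does not depend on how large the content is, which is
the efficiency point made above.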
Processor
This is the most commonly used component in NiFi and tends to be the easiest place for newcomers to jump in.
A Processor is a component that is responsible for bringing data into the system, pushing data out to other systems,
or performing some sort of enrichment, extraction, transformation, or routing logic. Common Design Patterns for
Processors are discussed in the Developer Guide.
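As a rough illustration of two of the roles listed above (transformation and extraction), here is a minimal sketch.
It deliberately does not use NiFi's real Processor interface; SketchItem, SketchProcessor, UpperCaseContent,
ExtractContentLength, and the "content.length" attribute key are all hypothetical names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a piece of data: content bytes plus attributes.
final class SketchItem {
    final byte[] content;
    final Map<String, String> attributes;

    SketchItem(byte[] content, Map<String, String> attributes) {
        this.content = content.clone();
        this.attributes = new HashMap<>(attributes);
    }
}

// Hypothetical processor contract; NiFi's real Processor interface differs.
interface SketchProcessor {
    SketchItem process(SketchItem input);
}

// Transformation: upper-cases UTF-8 text content and copies attributes unchanged.
final class UpperCaseContent implements SketchProcessor {
    @Override
    public SketchItem process(SketchItem input) {
        String text = new String(input.content, StandardCharsets.UTF_8);
        byte[] upper = text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        return new SketchItem(upper, input.attributes);
    }
}

// Extraction: records the content size as an attribute and leaves the content unchanged.
final class ExtractContentLength implements SketchProcessor {
    @Override
    public SketchItem process(SketchItem input) {
        Map<String, String> attributes = new HashMap<>(input.attributes);
        attributes.put("content.length", String.valueOf(input.content.length));
        return new SketchItem(input.content, attributes);
    }
}
```

Each sketch processor does one small job, so they can be chained: passing an item through UpperCaseContent and then
ExtractContentLength yields upper-cased content with its size recorded as an attribute.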
Processor Node
The Processor Node is essentially a wrapper around a Processor and maintains state about the Processor itself. The Processor
Node is responsible for maintaining, among other things, state about a Processor's positioning on the graph, the configured
properties and settings of the Processor, its scheduled state, and the annotations that are used to describe the Processor.
By abstracting these things away from the Processor itself, we are able to ensure that the Processor is unable to change things
that it should not, such as the configured values for properties, as allowing a Processor to change this information can lead to
confusion. Additionally, it allows us to simplify the code required to create a Processor, as this state information is automatically
managed by the framework.
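The wrapper idea can be sketched as follows. This is an assumption-laden illustration, not the framework's actual
Processor Node: ConfigurableProcessor, SketchScheduledState, and SketchProcessorNode are hypothetical names, and the
real class tracks considerably more state. The point shown is that the node owns the configured properties, position,
and scheduled state, while the wrapped processor only ever receives a read-only view of its properties.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical processor contract: it is handed its configuration, it does not own it.
interface ConfigurableProcessor {
    String trigger(Map<String, String> properties);
}

// Hypothetical scheduled states for this sketch.
enum SketchScheduledState { STOPPED, RUNNING }

// Hypothetical wrapper that keeps state about a processor on the processor's behalf.
final class SketchProcessorNode {
    private final ConfigurableProcessor processor;
    private final Map<String, String> properties = new HashMap<>();
    private SketchScheduledState state = SketchScheduledState.STOPPED;
    private double x;
    private double y;

    SketchProcessorNode(ConfigurableProcessor processor) {
        this.processor = processor;
    }

    // Only the node (acting for the framework or user) may change configuration.
    void setProperty(String name, String value) {
        properties.put(name, value);
    }

    void setPosition(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double getX() { return x; }
    double getY() { return y; }

    void start() { state = SketchScheduledState.RUNNING; }
    void stop() { state = SketchScheduledState.STOPPED; }
    SketchScheduledState getState() { return state; }

    // Runs the processor with an unmodifiable view, so it cannot alter its own configuration.
    String trigger() {
        if (state != SketchScheduledState.RUNNING) {
            throw new IllegalStateException("processor is not scheduled to run");
        }
        return processor.trigger(Collections.unmodifiableMap(properties));
    }
}
```

In this sketch, a processor that attempts to call put on the map it is given gets an UnsupportedOperationException,
which mirrors the goal of keeping a Processor from changing its configured property values.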
Content Repository
FlowFile Repository
Provenance Repository
Process Session
Process Context
Process Scheduler
FlowFile Queue
Flow Controller
Cluster Manager
Authority Provider
*-Resources
NiFi Server