Overview

We've had quite a few people start to download and play around with NiFi and quite a few starting to use it operationally.

We have also seen a number of contributors come onto the project, and we often get the same question: Where can I start?

A project like NiFi has quite a few moving parts, so understanding how all of these parts fit together can itself be pretty daunting.

This document is intended to be a very high-level description of the components that make up NiFi and how those components fit together. It is not intended to go into great detail about any of these components. Other documents exist (and more are to come) that delve a bit deeper into the design of some of these individual components. The goal of this document is to help developers who are new to NiFi come up to speed on some of the terminology and understand how the different components that make up the platform interact with one another.

FlowFile

We will begin the discussion with the FlowFile. This is the abstraction that NiFi provides around a single piece of data. A FlowFile may represent structured data, such as a JSON or XML message, or may represent unstructured data, such as an image. A FlowFile is made up of two parts: content and attributes. The content is simply a stream of bytes, which can represent any type of data. The attributes are key-value pairs that are associated with the data. These attributes provide context along with the data, which allows the data to be efficiently routed and reasoned about without parsing the content.
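To make the content/attributes split concrete, here is a minimal sketch in plain Java. It is not NiFi's actual FlowFile interface: `SimpleFlowFile`, `AttributeRouter`, and the route names are hypothetical and exist only for illustration. The point is that a routing decision can be made from an attribute alone, without reading the content bytes.

```java
import java.util.Map;

// Hypothetical, simplified model of a FlowFile; NiFi's real interface differs.
final class SimpleFlowFile {
    private final byte[] content;                 // opaque bytes of any data type
    private final Map<String, String> attributes; // key-value context about the data

    SimpleFlowFile(byte[] content, Map<String, String> attributes) {
        this.content = content.clone();           // defensive copy of the bytes
        this.attributes = Map.copyOf(attributes); // immutable snapshot of the attributes
    }

    byte[] content() {
        return content.clone();
    }

    String attribute(String key) {
        return attributes.get(key);               // null if the attribute is absent
    }
}

// Hypothetical router: picks a route by looking only at an attribute.
final class AttributeRouter {
    static String route(SimpleFlowFile flowFile) {
        // The content is never parsed here; the attribute carries enough context.
        String mimeType = flowFile.attribute("mime.type");
        if ("application/json".equals(mimeType) || "application/xml".equals(mimeType)) {
            return "structured";
        }
        return "unstructured";
    }
}
```

With this sketch, a FlowFile whose `mime.type` attribute is `application/json` is routed as structured, while one marked `image/png` is routed as unstructured, and in neither case are the content bytes inspected.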

Processor

This is the most commonly used component in NiFi and tends to be the easiest place for newcomers to jump in. A Processor is a component that is responsible for bringing data into the system, pushing data out to other systems, or performing some sort of enrichment, extraction, transformation, or routing logic. Common Design Patterns for Processors are discussed in the Developer Guide.
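As a rough illustration of the enrichment and transformation roles described above, the following sketch uses plain Java only. It deliberately does not use NiFi's real Processor API: `Item`, `SketchProcessor`, `AddSizeAttribute`, `UppercaseContent`, and the `content.size` attribute name are all hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a piece of data moving through the flow.
final class Item {
    final byte[] content;
    final Map<String, String> attributes;

    Item(byte[] content, Map<String, String> attributes) {
        this.content = content.clone();
        this.attributes = Map.copyOf(attributes);
    }
}

// Hypothetical single-method contract; NiFi's real Processor interface differs.
interface SketchProcessor {
    Item process(Item input);
}

// Enrichment: adds an attribute describing the content and leaves the content alone.
final class AddSizeAttribute implements SketchProcessor {
    @Override
    public Item process(Item input) {
        Map<String, String> enriched = new HashMap<>(input.attributes);
        enriched.put("content.size", Integer.toString(input.content.length));
        return new Item(input.content, enriched);
    }
}

// Transformation: rewrites the content and keeps the attributes unchanged.
final class UppercaseContent implements SketchProcessor {
    @Override
    public Item process(Item input) {
        String text = new String(input.content, StandardCharsets.UTF_8);
        byte[] upper = text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        return new Item(upper, input.attributes);
    }
}
```

Chaining the two, for example `new UppercaseContent().process(new AddSizeAttribute().process(item))`, mirrors how data passes from one Processor to the next in a flow. Processors that bring data in or push it out would have a different shape in this sketch, since they have no input or no output respectively.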

Processor Node

The Processor Node is essentially a wrapper around a Processor and maintains state about the Processor itself. The Processor Node is responsible for maintaining, among other things, state about a Processor's positioning on the graph, the configured properties and settings of the Processor, its scheduled state, and the annotations that are used to describe the Processor. By abstracting these things away from the Processor itself, we are able to ensure that the Processor is unable to change things that it should not, such as the configured values for properties, as allowing a Processor to change this information can lead to confusion. Additionally, it allows us to simplify the code required to create a Processor, as this state information is automatically
