Apache NiFi lets users build very large and complex DataFlows. These are built from a small set of basic components: Processor, Funnel, Input/Output Port, Process Group, and Remote Process Group. These can be thought of as the most basic building blocks for constructing a DataFlow. At times, though, working only with these small building blocks becomes tedious when the same logic must be repeated several times. To solve this, NiFi provides the concept of a Template. A Template combines these basic building blocks into a larger building block. Once a DataFlow has been created, parts of it can be saved as a Template. The Template can then be dragged onto the canvas, or exported as an XML file and shared with others. Templates received from others can be imported into a NiFi instance and dragged onto the canvas.

For more information on Templates, including how to import, export, and work with them, please see the Template Section of the User Guide.

Below is a collection of templates that are useful for learning how to build DataFlows with the existing Processors. Please feel free to add your own useful templates to this list.

Templates are listed below with a description and, where recorded, the minimum NiFi version and the Processors used.
Pull_from_Twitter_Garden_Hose.xml

This flow pulls from Twitter using the garden hose setting, extracts some basic attributes from the JSON, and then routes only those items that are actually tweets.

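The extract-then-route idea can be sketched in Python as follows. This is not the template itself: it assumes that tweets are the streamed messages carrying a non-empty "text" field (other messages, such as delete notices, lack one), and the attribute names twitter.msg and twitter.user and the relationship names are hypothetical stand-ins.

```python
import json


def extract_and_route(message: str) -> tuple[str, dict[str, str]]:
    """Pull a few attributes out of one streamed JSON message and choose a route."""
    doc = json.loads(message)
    # EvaluateJsonPath-style extraction; missing fields become empty strings.
    # The attribute names are hypothetical, not taken from the template.
    user = doc.get("user") or {}
    attributes = {
        "twitter.msg": doc.get("text") or "",
        "twitter.user": user.get("screen_name") or "",
    }
    # Assumption: only real tweets carry a non-empty "text" field
    relationship = "tweet" if attributes["twitter.msg"] else "unmatched"
    return relationship, attributes
```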
Retry_Count_Loop.xml

This process group maintains a count of how many times a FlowFile has passed through it. If the count reaches a configured threshold, the FlowFile is routed to a 'Limit Exceeded' relationship; otherwise it is routed to 'retry'. This is useful for processing that should be attempted only a fixed number of times before giving up.

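The counting logic can be sketched in Python. The attribute name retry.count, the default limit of 3, and the choice to route to 'Limit Exceeded' only once the count goes past the limit are assumptions of this sketch, not settings read from the template.

```python
MAX_RETRIES = 3  # hypothetical threshold


def retry_loop_step(attributes: dict[str, str], max_retries: int = MAX_RETRIES) -> tuple[str, dict[str, str]]:
    """Increment the pass counter and pick the outgoing relationship."""
    updated = dict(attributes)  # FlowFile attributes are immutable; work on a copy
    count = int(updated.get("retry.count", "0")) + 1
    updated["retry.count"] = str(count)  # NiFi attribute values are strings
    # Assumption: 'Limit Exceeded' once the count goes past max_retries
    if count > max_retries:
        return "Limit Exceeded", updated
    return "retry", updated
```

Feeding the returned attributes back in on each pass gives 'retry' three times and then 'Limit Exceeded'.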
simple-httpget-route.template.xml

Pulls from a web service (the example uses NiFi itself), extracts text from a specific section of the response, makes a routing decision based on the extracted value, and prepares the data to be written to disk using PutFile.

InvokeHttp_And_Route_Original_On_Status.xml

This flow demonstrates how to call an HTTP service based on an incoming FlowFile and route the original FlowFile based on the status code returned from the invocation. In this example, a FlowFile is produced every 30 seconds, an attribute that sets q=nifi is added to it, google.com is invoked for that FlowFile, and any FlowFile whose response has a 200 status is routed to a relationship called 200.

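A rough Python sketch of the two decisions this flow makes is below; it performs no HTTP calls. It assumes the endpoint https://www.google.com/search and a catch-all relationship named unmatched (both hypothetical), and reads the status from invokehttp.status.code, the attribute in which InvokeHTTP records the response status.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.google.com/search"  # hypothetical endpoint


def build_request_url(attributes: dict[str, str], base_url: str = SEARCH_URL) -> str:
    """Build the URL to invoke from the FlowFile's q attribute."""
    return f"{base_url}?{urlencode({'q': attributes['q']})}"


def route_original(attributes: dict[str, str]) -> str:
    """Pick the relationship for the original FlowFile from the recorded status."""
    status = attributes.get("invokehttp.status.code")
    # Only 200 goes to the relationship named "200"; "unmatched" is a stand-in
    return "200" if status == "200" else "unmatched"
```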
Decompression_Circular_Flow.xml

This flow demonstrates taking an archive that was created with several levels of compression and repeatedly decompressing it in a loop until the archived file has been extracted.
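The loop can be approximated in Python. This sketch identifies each compression layer by its magic bytes and supports only gzip, bzip2, and xz; the template performs the loop with NiFi processors, and which formats it handles is not stated here. The max_rounds guard is an addition of the sketch to avoid looping forever.

```python
import bz2
import gzip
import lzma

# Magic-byte prefixes for the formats this sketch understands
DECOMPRESSORS = [
    (b"\x1f\x8b", gzip.decompress),
    (b"BZh", bz2.decompress),
    (b"\xfd7zXZ\x00", lzma.decompress),
]


def decompress_until_plain(data: bytes, max_rounds: int = 100) -> bytes:
    """Strip one compression layer per round until no known header remains."""
    for _ in range(max_rounds):
        for magic, decompress in DECOMPRESSORS:
            if data.startswith(magic):
                data = decompress(data)
                break  # go around the loop again with the inner layer
        else:
            return data  # no known compression header left: this is the payload
    raise ValueError("too many compression layers")
```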

SplitRouteMerge.xml

Sample input: sample-input.txt

This flow demonstrates splitting a file on line boundaries, routing each split based on a regular expression matched against its content, merging the less important splits together for storage, and sending the higher-priority splits down another path for immediate action.
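A minimal Python sketch of the split, route, and merge steps is below. The ERROR pattern that marks a line as high priority is hypothetical; the template's own regex, and the processors it uses for each step, may differ.

```python
import re

# Hypothetical routing rule: lines mentioning ERROR are high priority
HIGH_PRIORITY = re.compile(r"\bERROR\b")


def split_route_merge(content: str) -> tuple[list[str], str]:
    """Split on line boundaries, route each line by regex, merge the low-priority ones."""
    high, low = [], []
    for line in content.splitlines():  # one split per line
        if HIGH_PRIORITY.search(line):  # regex match on the split's content
            high.append(line)  # each high-priority split travels on its own
        else:
            low.append(line)
    merged_low = "\n".join(low)  # bundle the rest into one object for storage
    return high, merged_low
```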
TwitterSolr.xml

This flow shows how to index tweets with Solr using NiFi. Prerequisites for this flow are NiFi 0.3.0 or later, a Twitter application, and a running instance of Solr 5.1 or later with a tweets collection. Solr can be started in SolrCloud mode and the collection created with:

./bin/solr start -c
./bin/solr create_collection -c tweets -d data_driven_schema_configs -shards 1 -replicationFactor 1
  
CsvToJSON.xml

This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.
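The ExtractText and ReplaceText pairing can be mimicked in Python as below. It assumes a three-column CSV line (name, age, city) whose capture groups land in attributes csv.1 through csv.3; the replacement value uses ${...} placeholders in the style of NiFi Expression Language, but only plain attribute substitution is modelled.

```python
import re

# Assumed CSV layout: exactly three comma-separated fields, no quoting
CSV_PATTERN = re.compile(r"^([^,]*),([^,]*),([^,]*)$")
# ReplaceText-style replacement value with ${attribute} placeholders
REPLACEMENT = '{"name": "${csv.1}", "age": "${csv.2}", "city": "${csv.3}"}'


def csv_line_to_json(line: str) -> str:
    match = CSV_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected CSV layout: {line!r}")
    # ExtractText step: capture group N becomes attribute csv.N
    attributes = {f"csv.{i}": value for i, value in enumerate(match.groups(), start=1)}
    # ReplaceText step: the whole content becomes the filled-in template
    content = REPLACEMENT
    for name, value in attributes.items():
        content = content.replace("${" + name + "}", value)
    return content
```

Like a plain ReplaceText template, this does not escape quotes or backslashes inside values, so a value containing a double quote would produce invalid JSON.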
NetworkActvityExample.xml

This flow captures network activity using tcpdump, performs geo-enrichment where possible, and then delivers the tcpdump entries to Kafka and HDFS.

SyslogExample.xml

This flow shows how to send messages to and receive messages from Syslog. It requires a Syslog server that accepts incoming connections using the protocol and port specified in PutSyslog, and that forwards messages to NiFi using the protocol and port specified in ListenSyslog. NOTE: This template can be used with the latest code from master, or with NiFi 0.4.0 once it is released.

Minimum NiFi Version: 0.4.0
Processors Used: PutSyslog, ListenSyslog

Working_With_CSV.xml

This flow uses http://randomuser.me to generate random data about people in CSV format. It then manipulates the data and writes it to a directory.

A second flow then uses the ListFile and FetchFile processors to pull that data into the flow, strips off the CSV header line, groups the data into separate FlowFiles based on the first column of each row (the "gender" column), and finally sends all of the data to Apache Kafka, using the gender as part of the topic name.

Minimum NiFi Version: 0.4.0
Processors Used: ListFile, FetchFile, PutKafka, RouteText, PutFile, ReplaceText, InvokeHTTP
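The header-stripping and grouping steps of the second flow can be sketched in Python. The topic naming scheme users-<gender> is hypothetical; the template builds its own topic name from the gender value.

```python
from collections import defaultdict


def group_by_first_column(csv_text: str) -> dict[str, str]:
    """Drop the header line, then bundle rows by the value of their first column."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in csv_text.splitlines()[1:]:  # [1:] strips the CSV header line
        if not line.strip():
            continue  # ignore blank lines
        key = line.split(",", 1)[0]  # first column, e.g. the gender
        groups[key].append(line)
    # One output bundle per distinct first-column value
    return {key: "\n".join(rows) for key, rows in groups.items()}


def topic_for(gender: str) -> str:
    # Hypothetical topic naming scheme, not the template's actual pattern
    return f"users-{gender}"
```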
Working_with_Logs.xml

Tails the nifi-app and nifi-user log files and then uses Site-to-Site to push any changes to those logs to a remote instance of NiFi (this template pushes them to localhost so that it is reusable).

A second flow exposes Input Ports to receive the log data via Site-to-Site. The data is then aggregated until the data for a single log is in the range of 64-128 MB or 5 minutes have passed, whichever occurs first. The aggregated log data is then pushed to a directory in HDFS based on the current timestamp and the type of log file (e.g., /data/logs/nifi-app-logs/2015/12/03 or /data/logs/nifi-user-logs/2015/12/03, depending on the type of data).

NOTE: In order to use this template, Site-to-Site must be enabled on the node. To do this, open the $NIFI_HOME/conf/nifi.properties file, set the "nifi.remote.input.socket.port" property to an open port number, and set "nifi.remote.input.secure" to "false" (unless, of course, you are running in a secure environment). For more information on Site-to-Site, see the Site-to-Site Section of the User Guide.

Minimum NiFi Version: 0.4.0
Processors Used: TailFile, MergeContent, PutHDFS, UpdateAttribute, Site-to-Site, Remote Process Group, Input Ports
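The binning rule and the HDFS directory layout described above can be expressed in Python as follows. The 64 MB, 128 MB, and 5 minute figures come from the description; in the template they would correspond to MergeContent settings such as Minimum Group Size, Maximum Group Size, and Max Bin Age. The function names are illustrative only.

```python
from datetime import datetime

MIN_GROUP_BYTES = 64 * 1024 * 1024  # 64 MB
MAX_GROUP_BYTES = 128 * 1024 * 1024  # 128 MB
MAX_BIN_AGE_SECONDS = 5 * 60  # 5 minutes


def can_add(bin_bytes: int, incoming_bytes: int) -> bool:
    """In this sketch a bin never grows beyond the maximum group size."""
    return bin_bytes + incoming_bytes <= MAX_GROUP_BYTES


def should_flush(bin_bytes: int, bin_age_seconds: float) -> bool:
    """Merge once the bin reaches 64 MB or is 5 minutes old, whichever comes first."""
    return bin_bytes >= MIN_GROUP_BYTES or bin_age_seconds >= MAX_BIN_AGE_SECONDS


def hdfs_directory(log_type: str, when: datetime) -> str:
    """Build e.g. /data/logs/nifi-app-logs/2015/12/03 from the log type and a timestamp."""
    return f"/data/logs/{log_type}-logs/{when:%Y/%m/%d}"
```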
Fun_with_HBase.xml

Downloads randomly generated user data from http://randomuser.me and then pushes the data into HBase. The data is pulled 1,000 records at a time and then split into individual records. The incoming data is in JSON format. The entire JSON document is pushed into a table cell named "user_full", keyed by a Row Identifier that is the user's Social Security Number, which is extracted from the JSON. Next, the user's first name, last name, and e-mail address are extracted from the JSON into FlowFile Attributes, and the content is replaced with a new JSON document consisting of only 4 fields: ssn, firstName, lastName, email. Finally, this smaller JSON document is pushed to HBase as a single row, with each value in a separate column of that row.

At the same time, a GetHBase Processor is used to listen for changes to the Users table. Each time that a row in the Users table is changed, the row is pushed to Kafka as JSON.

NOTE: In order to use this template, there are a few prerequisites. First, you need a table in HBase named 'Users' with column family 'cf' (this can be created in the HBase Shell with the command: create 'Users', 'cf'). After adding the template to your canvas, you will need to configure the controller services used to interact with HBase so that they point to your HBase cluster. You will also need to create a DistributedMapCacheServer controller service (all of the default values should be fine). Finally, each of the controller services needs to be enabled.

Minimum NiFi Version: 0.4.0
Processors Used: InvokeHTTP, SplitJson, EvaluateJsonPath, AttributesToJSON, PutHBaseCell, PutHBaseJSON, GetHBase, PutKafka
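The per-record transformation can be sketched in Python. The record shape (ssn, name.first, name.last, email) is an assumption and may not match what randomuser.me actually returns; the cf:user_full key simply combines the column family and cell name mentioned above.

```python
import json


def to_hbase_rows(record_json: str) -> tuple[str, dict[str, str], str]:
    """Return (row id, full-document cell, reduced JSON) for one user record."""
    record = json.loads(record_json)
    # EvaluateJsonPath-style extraction; these paths assume the record shape
    # {"ssn": ..., "name": {"first": ..., "last": ...}, "email": ...}
    ssn = record["ssn"]
    first = record["name"]["first"]
    last = record["name"]["last"]
    email = record["email"]
    # PutHBaseCell-style write: the whole document in cf:user_full, row key = SSN
    full_cell = {"cf:user_full": record_json}
    # AttributesToJSON-style content: only the four extracted fields
    reduced = json.dumps({"ssn": ssn, "firstName": first, "lastName": last, "email": email})
    return ssn, full_cell, reduced
```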
Hello_NiFi_Web_Service.xml

An ad hoc web service that "enriches" an HTTP request to port 8011 with a NiFi greeting, using HandleHttpRequest, HandleHttpResponse, and the StandardHttpContextMap controller service.

Processors Used: HandleHttpRequest, HandleHttpResponse, ReplaceText, StandardHttpContextMap

Syslog_HBase.xml

Inserts Syslog messages into HBase. Requires creating a table in HBase: create 'syslog', {NAME => 'msg'}

Minimum NiFi Version: 0.4.0
Processors Used: ListenSyslog, AttributesToJSON, PutHBaseJSON
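To show roughly what ends up in HBase, here is a Python sketch that parses an RFC 3164-style message into syslog.* attributes and serializes them as JSON, much as AttributesToJSON would. The regex handles only the simple BSD format, and the attribute set is a reduced, assumed subset of what ListenSyslog writes.

```python
import json
import re

# RFC 3164-style message: "<PRI>Mmm dd hh:mm:ss hostname body"
SYSLOG_3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def syslog_to_json(message: str) -> str:
    match = SYSLOG_3164.match(message)
    if match is None:
        raise ValueError("not an RFC 3164-style syslog message")
    priority = int(match["priority"])
    attributes = {
        "syslog.priority": str(priority),
        "syslog.facility": str(priority // 8),  # facility = PRI / 8
        "syslog.severity": str(priority % 8),  # severity = PRI mod 8
        "syslog.timestamp": match["timestamp"],
        "syslog.hostname": match["hostname"],
        "syslog.body": match["body"],
    }
    return json.dumps(attributes)
```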
GroovyJsonToJsonExample.xml

Illustrates the ExecuteScript processor using Groovy to perform JSON-to-JSON transformations.

Minimum NiFi Version: 0.5.0
Processors Used: ExecuteScript
WebCrawler.xml

A template that takes an initial seed URL and scrapes the contents of the site for more URLs. For each URL, it extracts lines that match specific phrases ("nifi" in the example) and emails the FlowFile to an address.
It also bundles websites together, compresses them, and puts them in a local folder.
Note: This template requires a DistributedMapCacheServer with default values to run. It is not included because at the time of creation there was no way to explicitly include a controller service with no processors referring to it.

Minimum NiFi Version: 0.5.0
Processors Used: CompressContent, DetectDuplicate, ExtractText, GetHTTP, InvokeHTTP, LogAttribute, MergeContent, PutEmail, PutFile, RouteOnAttribute, RouteText, SplitText, UpdateAttribute
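One crawl step, URL extraction with duplicate detection plus phrase matching, can be sketched in Python. The href regex is a simplification (a real crawler should use an HTML parser), and the in-memory seen set stands in for the DistributedMapCacheServer that DetectDuplicate consults.

```python
import re

URL_PATTERN = re.compile(r"""href=["'](https?://[^"'#\s]+)""")


def crawl_step(page_html: str, seen: set[str], phrase: str = "nifi") -> tuple[list[str], list[str]]:
    """Return (new URLs to visit, lines mentioning the phrase) for one fetched page."""
    new_urls = []
    for url in URL_PATTERN.findall(page_html):
        if url not in seen:  # duplicate check against a shared cache
            seen.add(url)
            new_urls.append(url)
    # Case-insensitive match; assumes the phrase itself is lowercase
    matches = [line for line in page_html.splitlines() if phrase in line.lower()]
    return new_urls, matches
```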
DateConversion.xml

This flow demonstrates how to extract a date string from a FlowFile and then replace that date in the FlowFile with the same date in a new format.

Minimum NiFi Version: 0.6.1
Processors Used: ExtractText, ReplaceText
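The extract-and-replace step can be sketched in Python. The input format MM/dd/yyyy and output format yyyy-MM-dd are hypothetical. In NiFi the equivalent conversion is commonly written with Expression Language, for example ${date:toDate('MM/dd/yyyy'):format('yyyy-MM-dd')}, where date is a hypothetical attribute populated by ExtractText.

```python
import re
from datetime import datetime

# Hypothetical formats; the template's actual patterns may differ
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
IN_FORMAT = "%m/%d/%Y"
OUT_FORMAT = "%Y-%m-%d"


def convert_dates(content: str) -> str:
    """Find each date string and replace it with the same date in the new format."""

    def reformat(match):
        parsed = datetime.strptime(match.group(1), IN_FORMAT)
        return parsed.strftime(OUT_FORMAT)

    return DATE_PATTERN.sub(reformat, content)
```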
ConvertCSVtoCQL.xml

This template describes a flow in which a CSV file, whose filename and content contribute to the fields of a Cassandra table, is processed; CQL statements are then constructed and executed.

Minimum NiFi Version: 1.0.0
Processors Used: PutCassandraQL, InvokeScriptedProcessor, UpdateAttribute, SplitText, ExtractText, ReplaceText, ExecuteScript
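A Python sketch of the statement-building step is below. It assumes that the filename (minus its extension) names the table, that the CSV header names the columns, and that every value is inserted as a text literal; the keyspace demo is hypothetical. Single quotes inside values are escaped by doubling them, as CQL string literals require.

```python
import csv
import io
from pathlib import PurePath


def csv_to_cql(filename: str, content: str, keyspace: str = "demo") -> list[str]:
    table = PurePath(filename).stem  # assumption: "users.csv" -> table "users"
    reader = csv.reader(io.StringIO(content))
    columns = next(reader)  # assumption: the header row names the columns
    statements = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Every value becomes a quoted text literal with ' escaped as ''
        values = ", ".join("'" + v.replace("'", "''") + "'" for v in row)
        statements.append(
            f"INSERT INTO {keyspace}.{table} ({', '.join(columns)}) VALUES ({values});"
        )
    return statements
```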
ScriptedMapCacheExample.xml

This template shows how to use ExecuteScript to populate and fetch from a DistributedMapCacheServer using a DistributedMapCacheClientService. Prior to NiFi 1.0.0 this had to be done manually using the map cache protocol; now the DistributedMapCacheClient can be used directly.

Minimum NiFi Version: 1.0.0
Processors Used: ExecuteScript


http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site