Page properties
Target release:
Epic: NIFI-1857
Document status: FINAL
Document owner: Koji Kawamura
Designer:
Developers:
QA:

...

1. Minimize required network ports to go through firewall
   User Story: The target NiFi server only allows access over HTTP/HTTPS. RAW Socket Site-to-Site requires an additional port (typically 9990).
   Importance: Must Have
   Notes: To minimize the required open ports, the new HTTP endpoints are added under /nifi-api/site-to-site, using the same port as the existing NiFi API.

2. Selectable transport protocol
   User Story: A DFM can select the transport protocol to use from the NiFi Web UI. Available protocols are 'RAW' and 'HTTP'.
   Importance: Must Have

3. Support HTTPS and auth
   User Story: Network communications can be secured by HTTPS. When HTTPS is used, the source NiFi sends its certificate and the target NiFi validates that it is registered in a trust store.
   Importance: Must Have

4. Support HTTP Proxy
   User Story: To reach the target NiFi, all communications have to go through an HTTP Proxy server.
   Importance: Must Have
   Notes: There is an existing JIRA issue, NIFI-304, to allow enabling security per port. This proposal does not address it and provides security on a per-server basis, as RAW Socket does.

5. Same level of transaction characteristics as RAW Socket
   User Story: For flow-files transferred from NiFi-A to NiFi-B, the transaction should be committed on NiFi-A and NiFi-B only if NiFi-A confirms that NiFi-B received all of the sent data intact. The same applies to the flow-file retrieval operation. Details are described below.
   Importance: Must Have

6. Same level of port availability check as RAW Socket
   User Story: The availability checks for data transport should be the same as for RAW Socket, covering the following cases:
     • If the target port doesn't exist
     • If the target port is not running
     • If the target port's destination is full
   If the target port is not validated, the peer (the host owning the port) should be penalized for a while so that other peers are used instead.
   Importance: Must Have

7. Load balancing
   User Story: The same load balancing capability as RAW Socket should be provided. When sending flow-files, a target port that has more data in its queue receives less than the others; when receiving, it is pulled from more often than the others. See Peer Selection for details.
   Importance: Must Have

8. Follow target NiFi environment topology change
   User Story: If nodes are added to or removed from the target NiFi cluster and its topology changes, the source NiFi environment should detect the change automatically. That is, it should start using newly added nodes and stop sending requests to removed nodes.
   Importance: Must Have

9. Protocol version management
   User Story: To provide backward compatibility in the future, the client and server components should negotiate the protocol version, and each should downgrade its behavior when its counterpart only supports an older version. The RAW Socket implementation already has protocol versions 1 to 5 as of this writing. To let the HTTP transport protocol version evolve independently while reusing the same logic as the Socket implementation, this proposal uses two protocol versions: 'transport protocol version' and 'transaction protocol version'.
   Importance: Must Have
   Notes: Since this is the first time the HTTP Site-to-Site protocol is introduced:
     • transport protocol ver: 1
     • transaction protocol ver: 5

10. Batch up multiple files transport
    User Story: The batch transport mechanism is the same as in the RAW Socket protocol. How NiFi controls batch count, size and duration can be specified by HTTP headers.
    Importance: Must Have

11. Compression
    User Story: Whether to compress data packets can be specified by an HTTP header; refer to HTTP headers.
    Importance: Must Have

12. Cluster topology aware endpoints
    Importance: Must Have
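The protocol version negotiation in requirement 9 can be illustrated with a minimal Python sketch. It assumes that each side advertises the versions it supports and that the highest mutually supported version is chosen; the function name `negotiate_version` and this selection rule are illustrative assumptions, not the handshake defined by this proposal.

```python
from typing import Iterable, Optional


def negotiate_version(client_versions: Iterable[int],
                      server_versions: Iterable[int]) -> Optional[int]:
    """Return the highest version both sides support, or None if there is none.

    Hypothetical helper: the "highest common version" rule is an assumption
    made for illustration only.
    """
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


# Transaction protocol: the RAW Socket implementation defines versions 1 to 5.
print(negotiate_version(range(1, 6), range(1, 6)))  # prints 5
# A counterpart that only knows versions 1 to 3 forces a downgrade.
print(negotiate_version(range(1, 6), range(1, 4)))  # prints 3
```

The same helper could be applied separately to the transport protocol version and the transaction protocol version, since the proposal lets them evolve independently.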

User interaction and design

This proposal adds new UI inputs to the Remote Process Group configuration dialog, as shown in the following image:

Image: Remote Process Group configuration dialog

  • Transport Protocol: Defaults to RAW.
  • HTTP Proxy server hostname: Specify the proxy server's hostname to use. If not specified, HTTP traffic is sent directly to the target NiFi instance.
  • HTTP Proxy server port: Optional. Specify the proxy server's port number. If not specified, the default port 80 is used.
  • HTTP Proxy user: Optional. Specify a user name to connect to the proxy server.
  • HTTP Proxy password: Optional. Specify the user's password to connect to the proxy server.
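How these proxy fields combine can be sketched as follows. This is a minimal illustration of the defaults described above (no hostname means a direct connection, no port means 80); the `ProxyConfig` class and `resolve_proxy` function are hypothetical names, not part of NiFi.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_PROXY_PORT = 80  # default stated in the field description above


@dataclass(frozen=True)
class ProxyConfig:
    """Hypothetical container for the proxy fields of the dialog."""
    host: str
    port: int
    user: Optional[str] = None
    password: Optional[str] = None


def resolve_proxy(host: Optional[str],
                  port: Optional[int] = None,
                  user: Optional[str] = None,
                  password: Optional[str] = None) -> Optional[ProxyConfig]:
    """Return the proxy to use, or None to connect directly to the target NiFi."""
    if not host:
        # No proxy hostname: HTTP traffic goes straight to the target instance.
        return None
    effective_port = port if port is not None else DEFAULT_PROXY_PORT
    return ProxyConfig(host, effective_port, user, password)


print(resolve_proxy(None))                 # direct connection
print(resolve_proxy("proxy.example.com"))  # port falls back to 80
```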

nifi.properties

This proposal uses the following configurations in nifi.properties:

change | key | default value | description
 | nifi.web.http.port | 8080 |
 | nifi.web.https.port | (blank) |
renamed | nifi.remote.input.socket.host, renamed to nifi.remote.input.host | (blank) | Specify the hostname that clients use to reach this host. This is used by both RAW Socket and HTTP.
 | nifi.remote.input.socket.port | (blank) | Specify the port number to listen on. RAW Socket Site-to-Site is enabled when this property is set.
 | nifi.remote.input.secure | true | If true, both RAW Socket and HTTP are secured, hence the HTTPS protocol is used.
new | nifi.remote.input.http.enabled | true | Specify true if HTTP Site-to-Site should be enabled on this host. This defaults to true, so that Site-to-Site can be used without any property configuration.
new | nifi.remote.input.http.transaction.ttl | 30 sec | Specify how long a transaction can live on the server, measured from the point of transaction creation.
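As a rough illustration of how these properties interact, the sketch below derives which Site-to-Site transports a node would offer from a property map. The function name `enabled_transports` and the returned labels are hypothetical; only the property keys, their defaults, and the enabling rules come from the table above.

```python
from typing import Dict, List


def _is_true(value: str) -> bool:
    return value.strip().lower() == "true"


def enabled_transports(props: Dict[str, str]) -> List[str]:
    """List the Site-to-Site transports offered for the given properties.

    Hypothetical helper: it only mirrors the rules described in the table.
    """
    secure = _is_true(props.get("nifi.remote.input.secure", "true"))
    transports = []
    # RAW Socket Site-to-Site is enabled when a socket port is set.
    if props.get("nifi.remote.input.socket.port", "").strip():
        transports.append("RAW (secure)" if secure else "RAW")
    # HTTP Site-to-Site defaults to enabled; it becomes HTTPS when secured.
    if _is_true(props.get("nifi.remote.input.http.enabled", "true")):
        transports.append("HTTPS" if secure else "HTTP")
    return transports


print(enabled_transports({
    "nifi.remote.input.secure": "false",
    "nifi.remote.input.socket.port": "9990",
}))
```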

 

Deployment examples

The following diagrams illustrate some deployment options to describe key features; they are not meant to limit the deployment patterns. Although the following diagrams show only a single Site-to-Site client server

...

, the client can be one of the NiFi nodes within a NiFi cluster. Site-to-Site supports cluster-to-cluster data transport.

To Standalone NiFi : Socket

This is an existing deployment option using RAW Socket, shown here to highlight the difference from HTTP Site-to-Site. It also supports secure communication and NiFi clusters. With RAW Socket, the client first retrieves the remote NiFi site information by sending an HTTP request to /nifi-api/site-to-site. After that, it uses Socket networking to exchange data.

Gliffy Diagram: NiFi-Site-to-Site-deployment-patterns

 

To Standalone NiFi : HTTP

If HTTP is selected as the Transport Protocol, all communication between the Site-to-Site client and the remote NiFi instance uses the HTTP protocol.

Gliffy Diagram: NiFi-Site-to-Site-deployment-http

To Standalone NiFi : HTTPS

Network traffic to a remote NiFi can be secured by setting nifi.remote.input.secure to true. When it's true, a remote NiFi instance is only accessible with HTTPS protocol.

Gliffy Diagram: NiFi-Site-to-Site-deployment-https

To Standalone NiFi : HTTP using Proxy

If a remote NiFi instance is behind a firewall that exposes only the HTTP port to a proxy server, its Site-to-Site client can be configured as shown in this diagram to use that proxy server.

Gliffy Diagram: NiFi-Site-to-Site-deployment-http-proxy

To NiFi Cluster : HTTP

If the target NiFi is a cluster, the client uses Peer Selection to choose which NiFi node to transport data to each time it transfers data. For example, if the Site-to-Site client component is a Remote Process Group, it performs peer selection whenever it is scheduled.

 

...

To allow a NiFi cluster to use HTTPS for Site-to-Site but HTTP for communications within the cluster, siteToSiteHttpApiPort is added to NodeIdentifier. This is needed because the existing apiPort is determined by whether the communication from the cluster protocol manager to the node is secure.

Gliffy Diagram: NiFi-Site-to-Site-deployment-http-cluster

TODO: add deployment diagram

 


Peer Selection

If the remote NiFi forms a cluster, a Site-to-Site client has to determine which NiFi node to transfer data to or from. This decision-making process is called 'Peer Selection'. It has two aspects: flow-file count and port status. A Site-to-Site client performs Peer Selection when the startTransaction method is called.

...

If a Site-to-Site client receives PORTS_DESTINATION_FULL, it only means that the port running on that particular NiFi node is full. So the client penalizes the peer but continues looking for another peer. If the destinations of all peers are full, the Site-to-Site client returns null from the startTransaction method.
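The handling of PORTS_DESTINATION_FULL described above can be sketched as a simple loop. This is a simplified illustration under several assumptions: peers are plain strings, `check_port` is a hypothetical callback standing in for the request to a node, the selected peer stands in for the transaction object that startTransaction would return, and penalties never expire (the real client penalizes a peer only for a while).

```python
from typing import Callable, Iterable, Optional, Set

PORTS_DESTINATION_FULL = "PORTS_DESTINATION_FULL"
PORT_OK = "OK"  # hypothetical status name for an available port


def start_transaction(peers: Iterable[str],
                      check_port: Callable[[str], str],
                      penalized: Set[str]) -> Optional[str]:
    """Return the first peer whose port can accept data, or None if all are full."""
    for peer in peers:
        if peer in penalized:
            continue  # skip peers that were penalized earlier
        if check_port(peer) == PORTS_DESTINATION_FULL:
            # Only this node's port is full: penalize it and try another peer.
            penalized.add(peer)
            continue
        return peer
    # Every candidate was full or penalized.
    return None


statuses = {"node1": PORTS_DESTINATION_FULL, "node2": PORT_OK}
penalized_peers: Set[str] = set()
print(start_transaction(["node1", "node2"], statuses.__getitem__, penalized_peers))
```

In this sketch the candidate order is taken as given; in the proposal, the order would come from the flow-file count aspect of Peer Selection.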

 

REST endpoints

The following REST endpoints are added by this proposal:

  • /site-to-site/
    • GET: Returns the Site-to-Site information required by the source NiFi environment. The response represents the Controller of the target NiFi environment.
  • /site-to-site/peers/
    • GET: Returns the available peers of this NiFi environment.
  • /site-to-site/input-ports/{portId}/transactions/
    • POST: Initiates a new transaction to send data from the source to the target NiFi. A new transaction id is published and returned.
  • /site-to-site/input-ports/{portId}/transactions/{transactionId}
    • PUT: Extends the transaction's TTL. Used to let the server know that the client is still working.
    • DELETE: Commits the transaction that is held on the server side.
  • /site-to-site/input-ports/{portId}/transactions/{transactionId}/flow-files
    • POST: Transfers data from the source to the target NiFi. The transaction is held on the server side instead of being committed immediately, in order to provide a 2-phase style commit. Returns the checksum calculated on the server side.
  • /site-to-site/output-ports/{portId}/transactions/
    • POST: Initiates a new transaction to receive data from the target to the source NiFi. A new transaction id is published and returned.
  • /site-to-site/output-ports/{portId}/transactions/{transactionId}
    • PUT: Extends the transaction's TTL. Used to let the server know that the client is still working.
    • DELETE: Commits the transaction that is held on the server side. The client sends the checksum calculated on the client side.
  • /site-to-site/output-ports/{portId}/transactions/{transactionId}/flow-files
    • GET: Transfers data from the target to the source NiFi. The transaction is held on the server side instead of being committed immediately, in order to provide a 2-phase style commit.
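The transaction lifecycle behind these endpoints (create with POST, keep alive with PUT, commit with DELETE) can be sketched with an in-memory registry. This is only an illustration: the `TransactionRegistry` class is hypothetical, the 30 second TTL mirrors the nifi.remote.input.http.transaction.ttl default, and the rule that a PUT resets the expiry to "now + TTL" is an assumption. Time is passed in explicitly so the behavior is deterministic.

```python
import uuid
from typing import Dict


class TransactionRegistry:
    """In-memory stand-in for server-side transaction bookkeeping (hypothetical)."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def create(self, now: float) -> str:
        """POST .../transactions: publish and return a new transaction id."""
        transaction_id = str(uuid.uuid4())
        self._expires_at[transaction_id] = now + self.ttl_seconds
        return transaction_id

    def is_active(self, transaction_id: str, now: float) -> bool:
        expires_at = self._expires_at.get(transaction_id)
        return expires_at is not None and now < expires_at

    def extend(self, transaction_id: str, now: float) -> bool:
        """PUT .../transactions/{transactionId}: extend the TTL if still active."""
        if not self.is_active(transaction_id, now):
            self._expires_at.pop(transaction_id, None)  # drop expired entries
            return False
        # Assumption: extending restarts the TTL from the current time.
        self._expires_at[transaction_id] = now + self.ttl_seconds
        return True

    def commit(self, transaction_id: str, now: float) -> bool:
        """DELETE .../transactions/{transactionId}: commit and forget the transaction."""
        active = self.is_active(transaction_id, now)
        self._expires_at.pop(transaction_id, None)
        return active


registry = TransactionRegistry(ttl_seconds=30.0)
tx = registry.create(now=0.0)
print(registry.extend(tx, now=29.0))  # prints True: still within the TTL
print(registry.commit(tx, now=58.0))  # prints True: the TTL was extended to 59.0
```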

...

 

PlantUML
actor A_component
actor HttpClient
actor HttpClientTransaction
actor SiteToSiteRestApiClient
actor SiteToSiteResource

' comment: initialize
A_component -> HttpClient: createTransaction
HttpClient -> SiteToSiteRestApiClient: initiateTransaction
SiteToSiteRestApiClient -> SiteToSiteResource: POST /site-to-site/input-ports/{portId}/transactions
SiteToSiteRestApiClient <-- SiteToSiteResource: transactionUrl, transactionProtocolVersion
HttpClient <-- SiteToSiteRestApiClient
HttpClient -> HttpClientTransaction: new
HttpClientTransaction -> HttpClientTransaction: state = TRANSACTION_STARTED
HttpClientTransaction -> SiteToSiteRestApiClient: openConnectionForSend
SiteToSiteRestApiClient -> SiteToSiteResource: POST /site-to-site/input-ports/{portId}/transactions/{transactionId}/flow-files
HttpClient <-- HttpClientTransaction
A_component <-- HttpClient: Transaction

' comment: send
loop while there is data packet to send
    A_component -> HttpClientTransaction: send
    HttpClientTransaction -> SiteToSiteResource: writes data to outputstream
    HttpClientTransaction -> HttpClientTransaction: state = DATA_EXCHANGED
    A_component <-- HttpClientTransaction
end

' comment: confirm
A_component -> HttpClientTransaction: confirm
HttpClientTransaction -> SiteToSiteRestApiClient: finishTransferFlowFiles
SiteToSiteRestApiClient <-- SiteToSiteResource: 202 Accepted: returns serverChecksum
HttpClientTransaction -> HttpClientTransaction: validate server Checksum
HttpClientTransaction -> HttpClientTransaction: state = TRANSACTION_CONFIRMED
A_component <-- HttpClientTransaction

' comment: complete
A_component -> HttpClientTransaction: complete
HttpClientTransaction -> SiteToSiteRestApiClient: commitTransferFlowFiles
SiteToSiteRestApiClient -> SiteToSiteResource: DELETE /site-to-site/input-ports/{portId}/transactions/{transactionId}
SiteToSiteRestApiClient <-- SiteToSiteResource: 200 OK
HttpClientTransaction <-- SiteToSiteRestApiClient
HttpClientTransaction -> HttpClientTransaction: state = TRANSACTION_COMPLETED
A_component <-- HttpClientTransaction
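
The client-side state transitions in the send sequence above can be sketched as a small state machine. This is an illustrative sketch, not the HttpClientTransaction implementation: the `SendTransaction` class is hypothetical, no HTTP calls are made, and CRC32 is assumed as the checksum algorithm purely for the example.

```python
import zlib
from enum import Enum
from typing import List


class State(Enum):
    TRANSACTION_STARTED = 1
    DATA_EXCHANGED = 2
    TRANSACTION_CONFIRMED = 3
    TRANSACTION_COMPLETED = 4


class SendTransaction:
    """Client-side state tracking for the send flow (illustrative only)."""

    def __init__(self) -> None:
        self.state = State.TRANSACTION_STARTED
        self.sent: List[bytes] = []
        self._checksum = 0  # running CRC32 over everything sent (assumed algorithm)

    def send(self, packet: bytes) -> None:
        if self.state not in (State.TRANSACTION_STARTED, State.DATA_EXCHANGED):
            raise RuntimeError(f"cannot send in state {self.state.name}")
        self.sent.append(packet)
        self._checksum = zlib.crc32(packet, self._checksum)
        self.state = State.DATA_EXCHANGED

    def confirm(self, server_checksum: int) -> None:
        """Compare the checksum returned by the server with the local one."""
        if self.state is not State.DATA_EXCHANGED:
            raise RuntimeError(f"cannot confirm in state {self.state.name}")
        if server_checksum != self._checksum:
            raise ValueError("checksum mismatch: server did not receive the data intact")
        self.state = State.TRANSACTION_CONFIRMED

    def complete(self) -> None:
        """Corresponds to the DELETE that commits the transaction on the server."""
        if self.state is not State.TRANSACTION_CONFIRMED:
            raise RuntimeError(f"cannot complete in state {self.state.name}")
        self.state = State.TRANSACTION_COMPLETED


tx = SendTransaction()
tx.send(b"flow-file-1")
tx.send(b"flow-file-2")
tx.confirm(zlib.crc32(b"flow-file-1flow-file-2"))  # checksum as the server would report it
tx.complete()
print(tx.state.name)  # prints TRANSACTION_COMPLETED
```

Rejecting out-of-order calls keeps the 2-phase style visible: data is only committed after the checksums have been compared.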
 

...

The complete() method doesn't do anything other than update state to TRANSACTION_COMPLETED.

PlantUML
actor A_component
actor HttpClient
actor HttpClientTransaction
actor SiteToSiteRestApiClient
actor SiteToSiteResource

' comment: initialize
A_component -> HttpClient: createTransaction
HttpClient -> SiteToSiteRestApiClient: initiateTransaction
SiteToSiteRestApiClient -> SiteToSiteResource: POST /site-to-site/output-ports/{portId}/transactions
SiteToSiteRestApiClient <-- SiteToSiteResource: transactionUrl, transactionProtocolVersion
HttpClient <-- SiteToSiteRestApiClient
HttpClient -> HttpClientTransaction: new
HttpClientTransaction -> HttpClientTransaction: state = TRANSACTION_STARTED
HttpClientTransaction -> SiteToSiteRestApiClient: openConnectionForReceive
SiteToSiteRestApiClient -> SiteToSiteResource: GET /site-to-site/output-ports/{portId}/transactions/{transactionId}/flow-files
SiteToSiteRestApiClient <-- SiteToSiteResource: 202 Accepted
HttpClientTransaction <-- SiteToSiteRestApiClient
HttpClient <-- HttpClientTransaction
A_component <-- HttpClient: Transaction

' comment: receive
loop while there is data packet to receive
    A_component -> HttpClientTransaction: receive
    HttpClientTransaction <-- SiteToSiteResource: read from inputstream
    HttpClientTransaction -> HttpClientTransaction: state = DATA_EXCHANGED
    A_component <-- HttpClientTransaction: data packet
end

' comment: confirm
A_component -> HttpClientTransaction: confirm
HttpClientTransaction -> SiteToSiteRestApiClient: commitReceivingFlowFiles(checksum)
SiteToSiteRestApiClient -> SiteToSiteResource: DELETE /site-to-site/output-ports/{portId}/transactions/{transactionId}
SiteToSiteResource -> SiteToSiteResource: validate client Checksum
SiteToSiteRestApiClient <-- SiteToSiteResource: 200 OK
HttpClientTransaction <-- SiteToSiteRestApiClient
HttpClientTransaction -> HttpClientTransaction: state = TRANSACTION_CONFIRMED
A_component <-- HttpClientTransaction

' comment: complete
A_component -> HttpClientTransaction: complete
HttpClientTransaction -> HttpClientTransaction: state = TRANSACTION_COMPLETED
A_component <-- HttpClientTransaction
 

...