Target release:
Epic: NIFI-1857
Document status: DRAFT
Document owner: Koji Kawamura
Designer:
Developers:
QA:

Goals

  • Exchange flow files between two NiFi environments using Site-to-Site via HTTP(S) 

Background and strategic fit

Some environments, typically multi-datacenter deployments, only allow network communication through the HTTP(S) port. To exchange data between NiFi environments using Site-to-Site in such restricted deployments, we should add HTTP(S) as a transport protocol for Site-to-Site.

Assumptions

Requirements

# | Title | User Story | Importance | Notes

1. Minimize required network ports to go through the firewall
User Story: The target NiFi server only allows access over HTTP/HTTPS. Raw Socket Site-to-Site requires an additional port (typically 9990).
Importance: Must Have
Notes: To minimize the number of open ports, the new HTTP endpoints are added under /nifi-api/site-to-site, using the same port as the existing NiFi API.

2. Selectable transport protocol
User Story: A DFM can select the transport protocol to use from the NiFi Web UI. Available protocols are 'RAW' and 'HTTP'.
Importance: Must Have

3. Support HTTPS and auth
User Story: Network communications can be secured by HTTPS. When HTTPS is used, the source NiFi sends its certificate and the target NiFi validates that it is registered in a trust store.
Importance: Must Have

4. Support HTTP Proxy
User Story: To reach the target NiFi, all communications have to go through an HTTP Proxy server.
Importance: Must Have
Notes: There is an existing JIRA issue, NIFI-304, to allow enabling security per port, but this proposal doesn't address it. Security is provided on a per-server basis, as Raw Socket does.

5. Same level of transaction characteristics as RAW Socket
User Story:
For flow files transferred from NiFi-A to NiFi-B, the transaction should be committed on NiFi-A and NiFi-B only if NiFi-A confirms that NiFi-B received all of the sent data intact.

The same applies to the flow file retrieval operation. Details are described below.
Importance: Must Have

6. Same level of port availability check as RAW Socket
User Story:
The availability checks for data transport should be the same as RAW Socket, covering cases such as the following:

  • If the target port doesn't exist
  • If the target port is not running
  • If the target port's destination is full

If the target port fails validation, the peer (the host owning the port) should be penalized for a while so that other peers are used.
Importance: Must Have

7. Load balancing
User Story: The same load balancing capability as RAW Socket should be provided. When sending flow files, a target port that has more data in its queue receives less than the others; when receiving, it is pulled from more often than the others.
Importance: Must Have

8. Follow target NiFi environment topology change
User Story: If nodes are added to or removed from the target NiFi cluster and its topology changes, the source NiFi environment should detect the change automatically. That is, it should start using newly added nodes and stop sending requests to removed nodes.
Importance: Must Have

9. Protocol version management
User Story:
To provide backward compatibility in the future, the client and server components should negotiate the protocol version, and downgrade their behavior when the counterpart only supports an older version.

The RAW Socket implementation already has protocol versions from 1 to 5 as of this writing. To let the HTTP transport protocol version evolve independently while reusing the existing logic of the Socket implementation, this proposal uses two protocol versions: a 'transport protocol version' and a 'transaction protocol version'.
Importance: Must Have
Notes:
Since this is the first time the HTTP Site-to-Site protocol is introduced:

transport protocol ver: 1

transaction protocol ver: 5

10. Batch up file transport
Importance: Must Have

11. Compression
Importance: Must Have

12. Cluster aware endpoints
Importance: Must Have
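
Requirement 9 asks the client and server to negotiate a protocol version and fall back when the counterpart only supports an older one. The page does not specify the negotiation algorithm, so the following is only a minimal sketch of one plausible rule: pick the highest version that both sides support. The function name and the example version sets are hypothetical; only the fact that RAW Socket has versions 1 to 5 comes from this page.

```python
from typing import Iterable, Optional


def negotiate_version(client_versions: Iterable[int],
                      server_versions: Iterable[int]) -> Optional[int]:
    """Return the highest version supported by both sides, or None if there is none.

    Assumption: "downgrade" means falling back to the newest common version.
    """
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


# Hypothetical example: a client that knows transaction protocol versions 1-5
# talking to an older server that only knows versions 1-3.
transaction_version = negotiate_version(range(1, 6), range(1, 4))

# The transport protocol version is negotiated separately; only version 1 exists so far.
transport_version = negotiate_version([1], [1])
```

Keeping the two negotiations separate mirrors the split between 'transport protocol version' and 'transaction protocol version' described above.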

User interaction and design

REST endpoints

The following REST endpoints are added by this proposal:

  • /site-to-site/
    • GET: Returns the Site-to-Site information that the source NiFi environment requires. This resource represents the Controller of the target NiFi environment.
  • /site-to-site/peers/
    • GET: Returns the available peers of this NiFi environment.
  • /site-to-site/input-ports/{portId}/transactions/
    • POST: Initiates a new transaction to send data from the source to the target NiFi. A new transaction ID is issued and returned.
  • /site-to-site/input-ports/{portId}/transactions/{transactionId}
    • POST: Transfers data from the source to the target NiFi. The transaction is held on the server side instead of being committed immediately, in order to provide a two-phase style commit. Returns the Checksum calculated on the server side.
    • DELETE: Commits the transaction held on the server side.
  • /site-to-site/output-ports/{portId}/transactions/
    • POST: Initiates a new transaction to receive data from the target to the source NiFi. A new transaction ID is issued and returned.
  • /site-to-site/output-ports/{portId}/transactions/{transactionId}
    • GET: Transfers data from the target to the source NiFi. The transaction is held on the server side instead of being committed immediately, in order to provide a two-phase style commit.
    • DELETE: Commits the transaction held on the server side. The client sends the Checksum calculated on the client side.
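
As an illustration of how a client might address the transaction endpoints listed above, here is a small sketch that builds the URLs. The helper name, the example host, and the IDs are hypothetical. The /nifi-api/site-to-site prefix is taken from the Requirements notes, and the sketch omits the trailing slash on the collection path, as the sequence diagrams do.

```python
from typing import Optional
from urllib.parse import quote

# Prefix taken from the Requirements notes on this page.
BASE_PATH = "/nifi-api/site-to-site"
DIRECTIONS = ("input-ports", "output-ports")


def transactions_url(base_url: str, direction: str, port_id: str,
                     transaction_id: Optional[str] = None) -> str:
    """Build the URL of the transactions collection, or of a single transaction."""
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {DIRECTIONS}")
    url = (f"{base_url.rstrip('/')}{BASE_PATH}/{direction}/"
           f"{quote(port_id, safe='')}/transactions")
    if transaction_id is not None:
        url += f"/{quote(transaction_id, safe='')}"
    return url


# Hypothetical host and IDs, for illustration only.
create_url = transactions_url("https://nifi.example.com:8443", "input-ports", "port-1")
commit_url = transactions_url("https://nifi.example.com:8443", "input-ports", "port-1", "tx-2")
```

In practice the client would follow the 'Location' header returned by the server rather than construct transaction URLs itself; the builder is mainly useful for the initial POST.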

 

Refer to the sequence diagrams and 'REST interactions and scenarios' below for details on how these REST endpoints are used.

REST interaction sequence diagrams

The sequence diagrams below describe the complete HTTP request/response sequence of the successful scenario for both input-ports and output-ports. Other semi-normal and error cases are described in 'REST interactions and scenarios'.

The A_component in the diagrams is a component that uses the SiteToSiteClient class, such as:

  • NiFiReceiver for Apache Spark
  • NiFiBolt and NiFiSpoutReceiver for Apache Storm
  • StandardRemoteGroupPort, which is the Remote Process Group processor in a NiFi data flow

 

input-ports/

The input-ports endpoint is used to send data from the source NiFi to the target NiFi. When a transaction is created, an HTTP POST request is opened towards transactionId-1. While there are more data packets to send, the same HTTP POST request is used to send them in a streaming manner. The response to the POST request also returns the final transaction location, which is transactionId-2. When the client has written all data packets to the output stream, it flushes the stream. Finally, the client sends a DELETE request to transactionId-2 to complete the transaction.

Participants: A_component, HttpClient, HttpClientTransaction, SiteToSiteRestApiUtil, SiteToSiteResource

Messages, in the order they appear in the diagram:

  • createTransaction
  • initiateTransaction
  • POST /site-to-site/input-ports/{portId}/transactions
  • transactionUrl-1, transactionProtocolVersion
  • new
  • state = TRANSACTION_STARTED
  • openConnectionForSend
  • POST /site-to-site/input-ports/{portId}/transactions/{transactionId-1}
  • Transaction
  • alt [while there is data packet to send]
    • send
    • writes data to outputstream
    • state = DATA_EXCHANGED
  • confirm
  • finishTransferFlowFiles
  • 201 Created: transactionId-2, serverChecksum
  • validate server Checksum
  • state = TRANSACTION_CONFIRMED
  • complete
  • commitTransferFlowFiles
  • DELETE /site-to-site/input-ports/{portId}/transactions/{transactionId-2}
  • 200 OK
  • state = TRANSACTION_COMPLETED
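
The client-side state changes in the input-ports sequence can be read as a small state machine. The sketch below tracks only those states and performs no HTTP calls. The class name and the transition table are assumptions derived from the diagram, not the actual HttpClientTransaction implementation.

```python
from enum import Enum


class State(Enum):
    TRANSACTION_STARTED = "TRANSACTION_STARTED"
    DATA_EXCHANGED = "DATA_EXCHANGED"
    TRANSACTION_CONFIRMED = "TRANSACTION_CONFIRMED"
    TRANSACTION_COMPLETED = "TRANSACTION_COMPLETED"


# Action name -> (states in which the action is allowed, resulting state).
# Assumption: this table is inferred from the sequence diagram above.
TRANSITIONS = {
    "send": ({State.TRANSACTION_STARTED, State.DATA_EXCHANGED}, State.DATA_EXCHANGED),
    "confirm": ({State.DATA_EXCHANGED}, State.TRANSACTION_CONFIRMED),
    "complete": ({State.TRANSACTION_CONFIRMED}, State.TRANSACTION_COMPLETED),
}


class ClientTransactionSketch:
    """Hypothetical stand-in that tracks state only."""

    def __init__(self) -> None:
        self.state = State.TRANSACTION_STARTED

    def _advance(self, action: str) -> None:
        allowed_from, next_state = TRANSITIONS[action]
        if self.state not in allowed_from:
            raise RuntimeError(f"{action}() is not allowed in state {self.state.name}")
        self.state = next_state

    def send(self) -> None:
        # Real client: write a data packet to the open POST output stream.
        self._advance("send")

    def confirm(self) -> None:
        # Real client: finish the POST and validate the server Checksum.
        self._advance("confirm")

    def complete(self) -> None:
        # Real client: send DELETE to the holding transaction.
        self._advance("complete")
```

A caller would invoke send() once per data packet, then confirm(), then complete(); calling them out of order raises an error instead of silently corrupting the transaction state.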

output-ports/

The output-ports endpoint is used to receive data from the target NiFi at the source NiFi. When a transaction is created, an HTTP GET request is opened towards transactionId-1. While there are more data packets to receive, the same HTTP GET request is used to receive them in a streaming manner. The GET request also returns the final transaction location, which is transactionId-2. When the client has consumed all data packets from the input stream and the confirm() method is called, the client sends a DELETE request with its Checksum to complete the transaction.

The complete() method doesn't do anything other than update the state to TRANSACTION_COMPLETED.

Participants: A_component, HttpClient, HttpClientTransaction, SiteToSiteRestApiUtil, SiteToSiteResource

Messages, in the order they appear in the diagram:

  • createTransaction
  • initiateTransaction
  • POST /site-to-site/output-ports/{portId}/transactions
  • transactionUrl-1, transactionProtocolVersion
  • new
  • state = TRANSACTION_STARTED
  • openConnectionForReceive
  • GET /site-to-site/output-ports/{portId}/transactions/{transactionId-1}
  • 201 Created: transactionUrl-2
  • Transaction
  • alt [while there is data packet to receive]
    • receive
    • read from inputstream
    • state = DATA_EXCHANGED
    • data packet
  • confirm
  • commitReceivingFlowFiles(checksum)
  • DELETE /site-to-site/output-ports/{portId}/transactions/{transactionId-2}
  • validate client Checksum
  • 200 OK
  • state = TRANSACTION_CONFIRMED
  • complete
  • state = TRANSACTION_COMPLETED
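
In the output-ports flow the client computes a Checksum over the data packets it has read and sends it with the DELETE request. This page does not name the checksum algorithm, so the sketch below assumes CRC32 purely for illustration. It shows how a running checksum can be folded over packets as they arrive, without buffering the whole stream. The function name and sample packets are hypothetical.

```python
import zlib
from typing import Iterable


def checksum_of_packets(packets: Iterable[bytes]) -> int:
    """Fold each packet into one running CRC32, as if the packets were a single stream.

    Assumption: CRC32 is used only as an example algorithm here.
    """
    crc = 0
    for packet in packets:
        # Passing the previous value continues the checksum across packet boundaries.
        crc = zlib.crc32(packet, crc)
    return crc


# Hypothetical packets read from the GET response stream.
received = [b"flow-file-1", b"flow-file-2"]
client_checksum = checksum_of_packets(received)
```

Because the checksum is computed incrementally, both sides arrive at the same value regardless of how the stream is split into packets, as long as the bytes are identical.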

 

REST interactions and scenarios

input-ports/

Scenario Type | {portId}/transactions | {portId}/transactions/{transactionId} (POST) | {portId}/transactions/{transactionId} (DELETE)

Transaction Initiation Failure
{portId}/transactions:
  • Client sends a POST with handshake parameters
  • Server responds:
    • 400 Bad Request: if the protocol version is not specified or not supported
    • 401 Unauthorized: if the request is not authorized for the port
    • 403 Forbidden: if it is a NiFi Cluster Manager
    • 404 Not Found: if no port is found with portId
    • 503 Service Unavailable: if the port is not running, is not in a valid state, or the port's destination is full
{portId}/transactions/{transactionId} (POST): N/A
{portId}/transactions/{transactionId} (DELETE): N/A

Normal Case
{portId}/transactions:
  • Client sends a POST with handshake parameters
  • Server creates a receiving transaction
  • Server responds:
    • 201 Created: The created transaction's URL is returned in the 'Location' header
{portId}/transactions/{transactionId} (POST):
  • Client sends a POST to a receiving transaction
  • Client streams data packets
  • Client state -> DATA_EXCHANGED
  • Server removes the receiving transaction
  • Server consumes the input stream to receive incoming data packets
  • Server creates a holding transaction
  • Server responds 201 Created: The created transaction's URL is returned in the 'Location' header, and the Checksum is returned in the response body
{portId}/transactions/{transactionId} (DELETE):
  • Client compares the client Checksum and the server Checksum
  • If the Checksums are identical
  • Client state -> TRANSACTION_CONFIRMED
  • Client sends a DELETE to a holding transaction with CONFIRM_TRANSACTION
  • Server commits its session
  • Server responds 200 OK with TRANSACTION_FINISHED
  • Client state -> TRANSACTION_COMPLETED

Normal Case - Destination becomes full
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST): (after above interactions)
{portId}/transactions/{transactionId} (DELETE): (branched from above interactions)
  • Server responds 200 OK with TRANSACTION_FINISHED_BUT_DESTINATION_FULL if there are no available relationships
  • Client state -> TRANSACTION_COMPLETED
  • Client penalizes the peer. TODO: only implemented in Socket, EndpointConnectionPool.java

BAD_CHECKSUM
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client compares the client Checksum and the server Checksum
  • If the Checksums are not identical
  • Client sends a DELETE to a holding transaction with BAD_CHECKSUM
  • Client rolls back its session
  • Server rolls back its session
  • Server responds 200 OK with CANCEL_TRANSACTION

Cancel transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client cancels the transaction for some reason (there is no component that cancels a transaction as of this writing, though)
  • TODO: not implemented yet
  • Client sends a DELETE to a holding transaction with CANCEL_TRANSACTION
  • Client state -> TRANSACTION_CANCELED

Defunct transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client stops working
  • After a certain amount of time has passed, the receiving transaction is removed from the server side
  • Transaction on the client side might be rolled back since it's not working properly
  • Transaction on the server should be rolled back

Expired transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client sends a DELETE to a holding transaction
  • Server responds 404 Not Found if the receiving transaction has already been removed because it expired
  • Transaction on the client should be rolled back
  • Transaction on the server has already been rolled back

Transaction Initiation Failure
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client sends a DELETE to a holding transaction
  • It's possible for the server to respond with 400, 401, 403, 404 or 503 if its state has changed since the previous request
  • Client rolls back its session
  • Transaction on the server should be rolled back

Defunct transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST):
  • Client stops working
  • After a certain amount of time has passed, the receiving transaction is removed from the server side
{portId}/transactions/{transactionId} (DELETE): N/A

Expired transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST):
  • Client sends a POST to a receiving transaction
  • Server responds 404 Not Found if the receiving transaction has already been removed because it expired
{portId}/transactions/{transactionId} (DELETE): N/A

Transaction Initiation Failure
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (POST):
  • Client sends a POST to a receiving transaction
  • It's possible for the server to respond with 400, 401, 403, 404 or 503 if its state has changed since the previous request
  • Client rolls back its session
  • Server session hasn't started yet
{portId}/transactions/{transactionId} (DELETE): N/A
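
The DELETE step of the input-ports scenarios amounts to two decisions on the client: which code to send after comparing Checksums, and what to do with the server's answer. The sketch below captures both using the response code names from the scenarios above. The function names and the ClientOutcome type are hypothetical, and the Cancel transaction scenario is left out because it is marked as not implemented yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientOutcome:
    completed: bool      # client state moves to TRANSACTION_COMPLETED
    rollback: bool       # client rolls back its session
    penalize_peer: bool  # client should avoid this peer for a while


def delete_request_code(client_checksum: int, server_checksum: int) -> str:
    """Choose the code the client sends with DELETE to the holding transaction."""
    if client_checksum == server_checksum:
        return "CONFIRM_TRANSACTION"
    return "BAD_CHECKSUM"


def outcome_for(server_response_code: str) -> ClientOutcome:
    """Map the server's 200 OK response code to the client-side action."""
    outcomes = {
        "TRANSACTION_FINISHED": ClientOutcome(True, False, False),
        "TRANSACTION_FINISHED_BUT_DESTINATION_FULL": ClientOutcome(True, False, True),
        "CANCEL_TRANSACTION": ClientOutcome(False, True, False),
    }
    try:
        return outcomes[server_response_code]
    except KeyError:
        raise ValueError(f"unexpected response code: {server_response_code}") from None
```

Non-200 responses (400, 401, 403, 404 or 503) are outside this mapping; per the scenarios above, the client rolls back its session in those cases.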

 

output-ports/

Scenario Type | {portId}/transactions | {portId}/transactions/{transactionId} (GET) | {portId}/transactions/{transactionId} (DELETE)

Transaction Initiation Failure
{portId}/transactions:
  • Same implementation as input-ports
{portId}/transactions/{transactionId} (GET): N/A
{portId}/transactions/{transactionId} (DELETE): N/A

Normal Case
{portId}/transactions:
  • Client sends a POST with handshake parameters
  • Server creates a receiving transaction
  • Server responds:
    • 201 Created: The created transaction's URL is returned in the 'Location' header
{portId}/transactions/{transactionId} (GET):
  • Client sends a GET to a transferring transaction
  • Client state -> DATA_EXCHANGED
  • Server removes the transferring transaction
  • Server creates a holding transaction
  • Server responds 201 Created: The created transaction's URL is returned in the 'Location' header
  • Client consumes the input stream to receive incoming data packets
  • Server writes data packets into the output stream
{portId}/transactions/{transactionId} (DELETE):

-- confirm()

  • Client calculates the Checksum of the received data
  • Client sends a DELETE to a holding transaction with the Checksum
  • Server compares the client Checksum and the server Checksum
  • If the Checksums are identical
  • Server commits its session
  • Server responds 200 OK with TRANSACTION_FINISHED
  • Client state -> TRANSACTION_CONFIRMED

-- complete()

  • Client state -> TRANSACTION_COMPLETED
  • TODO: The Destination Full case is not implemented yet. Is another round trip required? There is no client-side implementation that sends TRANSACTION_FINISHED_BUT_DESTINATION_FULL. Besides that, the server side doesn't have to penalize a peer, does it?

Normal Case - Destination becomes full
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET): (after above interactions)
{portId}/transactions/{transactionId} (DELETE): (branched from above interactions)
  • Server responds 200 OK with TRANSACTION_FINISHED_BUT_DESTINATION_FULL if there are no available relationships
  • Client state -> TRANSACTION_COMPLETED
  • Client penalizes the peer. TODO: only implemented in Socket, EndpointConnectionPool.java

BAD_CHECKSUM
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client compares the client Checksum and the server Checksum
  • If the Checksums are not identical
  • Client sends a DELETE to a holding transaction with BAD_CHECKSUM
  • Client rolls back its session
  • Server rolls back its session
  • Server responds 200 OK with CANCEL_TRANSACTION

Cancel transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client cancels the transaction for some reason (there is no component that cancels a transaction as of this writing, though)
  • TODO: not implemented yet
  • Client sends a DELETE to a holding transaction with CANCEL_TRANSACTION
  • Client state -> TRANSACTION_CANCELED

Defunct transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client stops working
  • After a certain amount of time has passed, the receiving transaction is removed from the server side
  • Transaction on the client side might be rolled back since it's not working properly
  • Transaction on the server should be rolled back

Expired transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client sends a DELETE to a holding transaction
  • Server responds 404 Not Found if the receiving transaction has already been removed because it expired
  • Transaction on the client should be rolled back
  • Transaction on the server has already been rolled back

Transaction Initiation Failure
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET): (after above interactions)
{portId}/transactions/{transactionId} (DELETE):
  • Client sends a DELETE to a holding transaction
  • It's possible for the server to respond with 400, 401, 403, 404 or 503 if its state has changed since the previous request
  • Client rolls back its session
  • Transaction on the server should be rolled back

Defunct transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET):
  • Client stops working
  • After a certain amount of time has passed, the receiving transaction is removed from the server side
{portId}/transactions/{transactionId} (DELETE): N/A

Expired transaction
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET):
  • Client sends a GET to a transferring transaction
  • Server responds 404 Not Found if the transferring transaction has already been removed because it expired
{portId}/transactions/{transactionId} (DELETE): N/A

Transaction Initiation Failure
{portId}/transactions: (after above interactions)
{portId}/transactions/{transactionId} (GET):
  • Client sends a GET to a transferring transaction
  • It's possible for the server to respond with 400, 401, 403, 404 or 503 if its state has changed since the previous request
  • Client rolls back its session
  • Server session hasn't started yet
{portId}/transactions/{transactionId} (DELETE): N/A
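
The Defunct and Expired scenarios imply that the server keeps track of open transactions and drops them after some time, after which requests to them get 404 Not Found. The following is a minimal sketch of such bookkeeping under stated assumptions: the class name, the time-to-live value, and the injectable clock are all hypothetical, and rolling back the server session on expiry is omitted.

```python
import time
from typing import Callable, Dict


class TransactionRegistrySketch:
    """Hypothetical server-side table of open transactions with a time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._created_at: Dict[str, float] = {}

    def register(self, transaction_id: str) -> None:
        self._created_at[transaction_id] = self._clock()

    def purge_expired(self) -> None:
        now = self._clock()
        expired = [tid for tid, created in self._created_at.items()
                   if now - created >= self._ttl]
        for tid in expired:
            # A real server would also roll back the session held for this transaction.
            del self._created_at[tid]

    def status_for(self, transaction_id: str) -> int:
        """Return 200 if the transaction is still open, else 404 Not Found."""
        self.purge_expired()
        return 200 if transaction_id in self._created_at else 404


# Usage with a fake clock; the 30-second TTL is an arbitrary example value.
fake_now = [0.0]
registry = TransactionRegistrySketch(ttl_seconds=30, clock=lambda: fake_now[0])
registry.register("tx-1")
status_before = registry.status_for("tx-1")
fake_now[0] = 31.0
status_after = registry.status_for("tx-1")
```

Purging lazily on lookup keeps the sketch small; a real server would more likely run the purge on a background schedule so that held sessions are released even when no further requests arrive.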

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question | Outcome

Not Doing
