Failure policies are a necessary but rarely codified component of deployed software systems. Command and control necessitates a strict, codified scheme for failure policies within agents.

The nomenclature below represents capabilities within Apache NiFi MiNiFi C++ agents.

| Policy Name | Policy Description |
| --- | --- |
| FAIL | Fails the corresponding component. May yield failure and shutdown of the agent. |
| RECOVER | Attempts to recover the affected component. If recovery isn't successful, the policy may further define routes that allow the component to transfer ownership to a new object/component. |
| RETRY | Attempts to retry from the failure. May leave the agent's component in a loop. |
| RETRY_RECOVER | Attempts to retry the failing operation, falling back to recovery if the retry isn't successful. |
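The policy semantics in the table can be sketched as a small dispatcher. This is a minimal illustration, not the MiNiFi C++ API: `FailurePolicy`, `Outcome`, `apply_policy`, and the `max_retries` bound are all hypothetical names, and the recovery-only policy is called `RECOVER` here. The retry bound is an assumption added so that RETRY cannot loop forever.

```cpp
#include <functional>

// Hypothetical names throughout; this is not the MiNiFi C++ API.
enum class FailurePolicy { FAIL, RECOVER, RETRY, RETRY_RECOVER };

enum class Outcome { Failed, Recovered, Retried };

// Applies a policy to one failure. `retry` and `recover` return true on success.
// `max_retries` (an assumption) bounds the retry loop so it cannot spin forever.
inline Outcome apply_policy(FailurePolicy policy,
                            const std::function<bool()>& retry,
                            const std::function<bool()>& recover,
                            int max_retries = 3) {
  const bool may_retry = policy == FailurePolicy::RETRY ||
                         policy == FailurePolicy::RETRY_RECOVER;
  const bool may_recover = policy == FailurePolicy::RECOVER ||
                           policy == FailurePolicy::RETRY_RECOVER;
  if (may_retry) {
    for (int attempt = 0; attempt < max_retries; ++attempt) {
      if (retry()) return Outcome::Retried;
    }
  }
  // RETRY_RECOVER falls through to recovery once its retries are exhausted.
  if (may_recover && recover()) return Outcome::Recovered;
  return Outcome::Failed;
}
```

With FAIL, neither callback is invoked and the component is reported as failed; the caller decides whether that also shuts down the agent.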


Agents can be configured through the flow. This allows the flow to define a set of update policies that control what can be changed at runtime. FailurePolicies are controller services that are inherently linked with command and control (but do not require it to be running), so that, if update policies allow, the way we recover from failure can be changed in real time. Failure recovery can be defined by specific controls within failure policies that allow us to recover by resetting state or instantiating new components to take over the operation.
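The interaction between update policies and runtime changes can be illustrated with a short sketch. It assumes an update policy is simply a set of property names that may be changed; `RuntimeUpdatePolicy`, `try_update`, and the property keys used in the usage note are hypothetical and do not reflect the actual MiNiFi C++ controller service interfaces.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical update policy: the property names that may change at runtime.
struct RuntimeUpdatePolicy {
  bool allow_all = false;
  std::set<std::string> allowed;

  bool permits(const std::string& property) const {
    return allow_all || allowed.count(property) > 0;
  }
};

// Applies a requested change only if the update policy permits it.
// Returns true if the setting was changed.
inline bool try_update(std::map<std::string, std::string>& settings,
                       const RuntimeUpdatePolicy& policy,
                       const std::string& property,
                       const std::string& value) {
  if (!policy.permits(property)) return false;
  settings[property] = value;
  return true;
}
```

For example, if only an assumed `failure.policy` key is in `allowed`, a request to switch it from `FAIL` to `RETRY_RECOVER` succeeds, while a request to change any other key is rejected and leaves the settings untouched.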

To simplify the configuration, the initial set of FailurePolicies will allow recovery from hardware failures affecting repositories and network connectivity. Network connectivity is typically controlled via backpressure; however, if a network failure occurs we may want some operations to continue, so a failure policy can recover by clearing queues or prioritizing connections.
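The two network recovery actions mentioned above, clearing queues and prioritizing connections, might look like the following. This is a sketch under stated assumptions: `FlowConnection`, `NetworkRecovery`, and `recover_network` are hypothetical, and the rule that network-independent connections are scheduled first is one possible reading of "prioritizing connections".

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of a flow connection; not the MiNiFi C++ Connection class.
struct FlowConnection {
  std::string name;
  bool needs_network;             // true if the downstream consumer is remote
  int priority;                   // higher values are scheduled first
  std::deque<std::string> queue;  // queued flow file identifiers
};

enum class NetworkRecovery { ClearQueues, Prioritize };

// Applies one recovery action after a network failure is detected.
// Returns the number of queued items that were dropped.
inline std::size_t recover_network(std::vector<FlowConnection>& connections,
                                   NetworkRecovery action) {
  std::size_t dropped = 0;
  if (action == NetworkRecovery::ClearQueues) {
    // Drop work that cannot be delivered while the network is down.
    for (auto& c : connections) {
      if (c.needs_network) {
        dropped += c.queue.size();
        c.queue.clear();
      }
    }
  } else {
    // Order network-independent connections first, then by descending priority.
    std::stable_sort(connections.begin(), connections.end(),
                     [](const FlowConnection& a, const FlowConnection& b) {
                       if (a.needs_network != b.needs_network) return !a.needs_network;
                       return a.priority > b.priority;
                     });
  }
  return dropped;
}
```

Clearing queues relieves backpressure at the cost of data loss, whereas prioritizing keeps the data and lets local operations continue, so which action a policy chooses is a deployment decision.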

Repositories can recover through flushing WALs, replacing WAL data, and/or converting to fully volatile repositories. 
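One way to combine these repository options is an escalation chain that tries each in turn. The sketch below assumes such an ordering, which the text above does not prescribe; `RecoveryStep` and `recover_repository` are hypothetical names, and the actions themselves are left to the caller.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical recovery step: a label plus an action that returns true on success.
struct RecoveryStep {
  std::string name;
  std::function<bool()> action;
};

// Runs the steps in order and returns the name of the first one that succeeds,
// or an empty string if every step fails.
inline std::string recover_repository(const std::vector<RecoveryStep>& steps) {
  for (const auto& step : steps) {
    if (step.action && step.action()) return step.name;
  }
  return {};
}
```

A caller might register steps for flushing the WAL, replacing WAL data, and converting to a volatile repository, from least to most disruptive, so that durability is given up only when the cheaper options fail.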

Work for this effort will be completed in MINIFICPP-545.
