This information is useful if a host running slave components (such as NodeManager or DataNode) fails. I performed these steps using Ambari 1.5.0.
I have an HDP 2.0 cluster with Ambari Server running on c6401.ambari.apache.org. In that cluster, I have a host (c6404.ambari.apache.org) that is hosting DataNode, NodeManager, RegionServer and Ganglia Monitor components. All is well: the Agent is alive and heartbeating, the components are running, and no alerts are present.
I lose my host (hardware failure)! Nagios starts alerting because components on that host are no longer running, and we lose the Agent heartbeat. Ambari Web shows the alerts and that the heartbeat is lost.
I repair and rebuild the machine and use the SAME hostname, c6404.ambari.apache.org. I install the Ambari Agent on the machine, point the Agent at the Ambari Server (c6401.ambari.apache.org) and start the Agent. Ambari picks up the heartbeat and starts reporting that the components are “not running”.
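Pointing the Agent at the server comes down to setting the hostname key in the [server] section of the Agent's ini file. Here is a minimal sketch of that edit in Python. The helper name point_agent_at_server is my own, and the file location (/etc/ambari-agent/conf/ambari-agent.ini) is the usual default, so treat both as assumptions and check your install. Note that rewriting the file this way drops any comments it contained.

```python
import configparser


def point_agent_at_server(ini_path, server_host):
    """Set [server] hostname in an ambari-agent.ini style file.

    Hypothetical helper for illustration. Rewrites the whole file,
    so comments in the original file are not preserved.
    """
    parser = configparser.RawConfigParser()
    # Keep option names exactly as written instead of lowercasing them.
    parser.optionxform = str
    parser.read(ini_path)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", server_host)
    with open(ini_path, "w") as fh:
        parser.write(fh)
```

Run as root, a call such as point_agent_at_server("/etc/ambari-agent/conf/ambari-agent.ini", "c6401.ambari.apache.org") followed by "ambari-agent start" should be enough for the heartbeat to show up again. Editing the file by hand works just as well.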
I need to get Ambari to re-install the component packages on the machine so I can restart the components and get back into a good state.
How to Re-install a Component via Ambari Web UI
Note: for the Ganglia Monitor component, the delete/add options are not available in the Web UI. Therefore, you must use the Ambari REST API to re-install (see below).
- From the “actions” menu next to the component, select “Delete”.
- Click the “+ Add” button and you’ll see the component is listed. Select the component to install.
- The component will be installed. Once complete, select “Start” and you are back in business.
How to Re-install a Component via Ambari REST API
Using your favorite REST Tool (curl, RESTClient, GraphicalHttpClient), perform the following steps.
Note: in the example URLs below, we are using component “GANGLIA_MONITOR”. Replace with the appropriate host component (DATANODE, NODEMANAGER, HBASE_REGIONSERVER, etc).
- Delete the host component.
- Add the host component.
- Install the host component by putting the host role state into “INSTALLED”.
- Start the host component by putting the host role state into “STARTED”. Once complete, you are back in business.
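To make the four steps concrete, here is a rough Python sketch that builds the corresponding requests against the host_components resource. Several values are my assumptions, not something stated above: the default Ambari port 8080, a cluster named "MyCluster", admin/admin credentials, and the helper names build_request and reinstall_steps. The script only builds and prints the requests. Nothing is sent unless you uncomment the urlopen call.

```python
import base64
import json
import urllib.request

# Assumptions: default port 8080, a cluster called "MyCluster".
# Replace these with the values for your own cluster.
AMBARI = "http://c6401.ambari.apache.org:8080"
CLUSTER = "MyCluster"
HOST = "c6404.ambari.apache.org"


def build_request(method, component, state=None, user="admin", password="admin"):
    """Build (but do not send) one host component request."""
    url = (f"{AMBARI}/api/v1/clusters/{CLUSTER}/hosts/{HOST}"
           f"/host_components/{component}")
    body = None
    if state is not None:
        # State changes are expressed as a HostRoles/state document.
        body = json.dumps({"HostRoles": {"state": state}}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on calls that modify resources.
    req.add_header("X-Requested-By", "ambari")
    return req


def reinstall_steps(component):
    """Return the four requests in order: delete, add, install, start."""
    return [
        build_request("DELETE", component),
        build_request("POST", component),
        build_request("PUT", component, state="INSTALLED"),
        build_request("PUT", component, state="STARTED"),
    ]


if __name__ == "__main__":
    for req in reinstall_steps("GANGLIA_MONITOR"):
        print(req.get_method(), req.full_url, req.data)
        # To actually send the request:
        # urllib.request.urlopen(req)
```

If you do send these, keep in mind that the install and start calls are queued by Ambari and run in the background, so give the install time to finish (watch the background operations in Ambari Web) before issuing the start.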