Replace + Reinstall a Failed Host

This information is useful when a host that runs slave components (such as NodeManager or DataNode) fails. I performed these steps using Ambari 1.5.0.

I have an HDP 2.0 cluster with Ambari Server running on c6401.ambari.apache.org. In that cluster, I have a host (c6404.ambari.apache.org) that is hosting the DataNode, NodeManager, RegionServer and Ganglia Monitor components. All is well: the Agent is alive and heartbeating, the components are running, and no alerts are present.

I lose my host (hardware failure)! Nagios starts alerting because components on that host are no longer running, and we lose the Agent heartbeat. Ambari Web shows the alerts and that the heartbeat is lost.

I repair + rebuild the machine and use the SAME hostname, c6404.ambari.apache.org. I install the Ambari Agent on the machine, configure the agent to point to the Ambari Server at c6401.ambari.apache.org, and start the agent. Ambari picks up the heartbeat and starts reporting that the components are “not running”.
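
Pointing the rebuilt agent at the server can be sketched as below. This is a minimal sketch, not the official procedure: it assumes the agent reads the server address from a `hostname` key in the `[server]` section of its ini file (commonly /etc/ambari-agent/conf/ambari-agent.ini). The function name `point_agent_at_server` and the sample file contents are hypothetical, and the sketch works on the file's text so it can be tried without touching a real host.

```python
import configparser
import io


def point_agent_at_server(ini_text, server_hostname):
    """Return ini_text with [server] hostname set to server_hostname.

    Assumption: the agent config uses a [server] section with a
    'hostname' key. Comments in the original text are not preserved,
    because configparser drops them when writing.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", server_hostname)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample contents, for illustration only.
    sample = "[server]\nhostname=localhost\nurl_port=8440\n"
    print(point_agent_at_server(sample, "c6401.ambari.apache.org"))
```

To apply it for real, you would read the ini file on c6404, pass its text through the function, write the result back, and then start the agent. Editing the one line by hand works just as well and keeps the file's comments.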

I need to get Ambari to re-install the component packages on the machine so I can restart the components and get back into a good state.

How to Re-install a Component via Ambari Web UI

Note: for the Ganglia Monitor component, the delete/add options are not available in the Web UI. Therefore, you must use the Ambari REST API to re-install (see below). 

  1. From the “actions” menu next to the component, select “Delete”.
  2. Click the “+ Add” button and you’ll see the component is listed. Select the component to install.
  3. The component will be installed. Once complete, select “Start” and you are back in business. 

How to Re-install a Component via Ambari REST API

Using your favorite REST tool (curl, RESTClient, GraphicalHttpClient), perform the following steps.

Note: the example URLs below use the component “GANGLIA_MONITOR”. Replace it with the appropriate host component (DATANODE, NODEMANAGER, HBASE_REGIONSERVER, etc.).

  1. Delete the host component.

    DELETE
    http://c6401.ambari.apache.org:8080/api/v1/clusters/MyCluster/hosts/c6404.ambari.apache.org/host_components/GANGLIA_MONITOR


  2. Add the host component.

    POST
    http://c6401.ambari.apache.org:8080/api/v1/clusters/MyCluster/hosts/c6404.ambari.apache.org/host_components/GANGLIA_MONITOR


  3. Install the host component by putting the host role state into “INSTALLED”.

    {"RequestInfo":{"context":"Install"},"Body":{"HostRoles":{"state":"INSTALLED"}}}
    
    PUT
    http://c6401.ambari.apache.org:8080/api/v1/clusters/MyCluster/hosts/c6404.ambari.apache.org/host_components/GANGLIA_MONITOR


  4. Start the host component by putting the host role state into “STARTED”. Once complete, you are back in business.

    {"RequestInfo":{"context":"Start"},"Body":{"HostRoles":{"state":"STARTED"}}}
    
    PUT
    http://c6401.ambari.apache.org:8080/api/v1/clusters/MyCluster/hosts/c6404.ambari.apache.org/host_components/GANGLIA_MONITOR
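
The four steps above can be scripted. The following is a minimal sketch using only the Python standard library, not a definitive client. The function names `reinstall_requests` and `send` are hypothetical. The sketch also assumes HTTP basic authentication and an `X-Requested-By` header on modifying requests, neither of which is shown in the steps above, so check what your Ambari version expects.

```python
import base64
import json
import urllib.request


def reinstall_requests(server, cluster, host, component):
    """Return the four steps as (method, url, body) tuples:
    delete, add, install, start. Body is None when no payload is sent."""
    url = (f"http://{server}:8080/api/v1/clusters/{cluster}"
           f"/hosts/{host}/host_components/{component}")

    def state_body(context, state):
        # Same JSON payload shape as shown in steps 3 and 4.
        return json.dumps({"RequestInfo": {"context": context},
                           "Body": {"HostRoles": {"state": state}}})

    return [
        ("DELETE", url, None),
        ("POST", url, None),
        ("PUT", url, state_body("Install", "INSTALLED")),
        ("PUT", url, state_body("Start", "STARTED")),
    ]


def send(method, url, body, user, password):
    """Send one request and return (HTTP status, response text).

    Assumptions: basic auth credentials are supplied by the caller,
    and the server wants an X-Requested-By header on modifying calls.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=body.encode() if body is not None else None,
        method=method,
        headers={"Authorization": f"Basic {token}",
                 "X-Requested-By": "ambari"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode()


if __name__ == "__main__":
    # Dry run: print the requests instead of sending them.
    steps = reinstall_requests("c6401.ambari.apache.org", "MyCluster",
                               "c6404.ambari.apache.org", "GANGLIA_MONITOR")
    for method, url, body in steps:
        print(method, url, body or "")
```

Run as a script, it only prints the requests. To execute them, call `send` on each step in order with your own credentials, and wait for the install to complete before sending the start request, just as in the Web UI procedure.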

