Dynamic HA Provider Configuration

...

  • Move cluster-specific service HA configuration from the HaProvider configuration to the service declaration.
    • Cluster-specific HA details logically belong with the declaration of the service they describe.
    • Moving cluster-specific details out of the provider configuration makes shared provider configurations reusable across more topologies and clusters.
    • The cluster-specific configuration could be discovered along with the service URL details, rather than having to be hard-coded in a provider configuration.
    • This will be treated as override configuration, so that the existing approach (i.e., complete configuration as HaProvider param values) will continue to work, to satisfy backward-compatibility requirements.

  • Support a new enabled value, "auto", which lets service-specific configuration determine whether HA treatment is enabled for that service.

    Example HaProvider Configuration with enabled=auto
      <provider>
        <role>ha</role>
        <name>HaProvider</name>
        <enabled>true</enabled>
        <param>
          <name>WEBHDFS</name>
          <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=auto</value>
        </param>
      </provider>

    In this case, topology generation will consult the dfs.ha.automatic-failover.enabled property in the HDFS configuration to determine whether HA should be enabled for WEBHDFS.
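
The two proposals above (service-level overrides and enabled=auto) can be sketched together. This is only an illustration of the intended semantics, not Knox code: the function names, the overrides dict, and the hdfs_site dict (standing in for discovered HDFS configuration) are all hypothetical.

```python
def parse_ha_param(value):
    """Parse a HaProvider param value such as 'a=1;b=2' into a dict."""
    result = {}
    for item in value.split(";"):
        if "=" in item:
            key, val = item.split("=", 1)
            result[key.strip()] = val.strip()
    return result


def effective_webhdfs_ha_config(provider_value, service_params, hdfs_site):
    """Overlay service-level params on provider defaults, then resolve 'auto'.

    hdfs_site is a hypothetical dict of discovered HDFS properties.
    """
    config = parse_ha_param(provider_value)
    # Service-level params act as overrides of the provider-level values.
    config.update({k: str(v) for k, v in service_params.items()})
    if config.get("enabled", "false").lower() == "auto":
        discovered = hdfs_site.get("dfs.ha.automatic-failover.enabled", "false")
        config["enabled"] = "true" if discovered.strip().lower() == "true" else "false"
    return config


provider_value = ("maxFailoverAttempts=3;failoverSleep=1000;"
                  "maxRetryAttempts=300;retrySleep=1000;enabled=auto")
overrides = {"maxFailoverAttempts": 5, "failoverSleep": 1001}
hdfs_site = {"dfs.ha.automatic-failover.enabled": "true"}

config = effective_webhdfs_ha_config(provider_value, overrides, hdfs_site)
print(config)
```

Provider-level values that the service does not override (retrySleep here) pass through unchanged, which is what keeps the existing all-in-the-provider approach working.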

Open Items

  • Choose a name for the service-level override param that corresponds to the 'enabled' component of the service-specific HaProvider param. Candidates:
    • 'enabled'
    • 'haEnabled'
    • 'ha.enabled'

  • Identify the nature of all the supported services
    • Which services are currently purely topology-based (DefaultURLManager)?
      • WEBHCAT
      • OOZIE
      • WEBHDFS
      • ?

    • Which services are currently ZooKeeper-based (BaseZooKeeperURLManager)?
      • HIVE (HS2ZooKeeperURLManager)
      • HBASE (HBaseZooKeeperURLManager)
      • KAFKA (KafkaZooKeeperURLManager)
      • SOLR (SOLRZooKeeperURLManager)
      • ATLAS (AtlasZookeeperURLManager)

    • Could ZooKeeper support be added for any services which do not currently support it?
      • These have ZooKeeper-related configuration (see table):
        • WEBHDFS
        • OOZIE
        • YARN
        • WEBHCAT
        • RESOURCEMANAGER


  • For the ZooKeeper-based-HA services, determine if the ZooKeeper details are available from the service's configuration via Ambari.

  • Can "HA mode" be determined for every service type from the cluster configuration details? Can Knox dynamically identify HA-configured services, and generate the topology accordingly?

  • Determine how to leverage the cluster discovery data to generate the ZooKeeper HA configuration for the relevant declared topology services.
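
As one possible answer to the last two items for HIVE, the sketch below derives the service-level HA params from discovered hive-site properties. It assumes the discovery data is available as a plain dict (the hive_site argument and the function name are hypothetical), and that the three Hive properties shown are the relevant ones. Whether Ambari exposes them this way is exactly what the open item asks.

```python
def hive_ha_params(hive_site):
    """Derive proposed HIVE service params from discovered hive-site values.

    hive_site is a hypothetical dict of property name -> value.
    """
    dynamic = hive_site.get("hive.server2.support.dynamic.service.discovery", "false")
    if dynamic.strip().lower() != "true":
        # HiveServer2 is not registered in ZooKeeper; no HA params to generate.
        return {"haEnabled": "false"}

    params = {"haEnabled": "true"}
    ensemble = hive_site.get("hive.zookeeper.quorum")
    namespace = hive_site.get("hive.server2.zookeeper.namespace")
    if ensemble:
        params["zookeeperEnsemble"] = ensemble
    if namespace:
        params["zookeeperNamespace"] = namespace
    return params


discovered = {
    "hive.server2.support.dynamic.service.discovery": "true",
    "hive.zookeeper.quorum": "host1:2181,host2:2181,host3:2181",
    "hive.server2.zookeeper.namespace": "hiveserver2",
}
print(hive_ha_params(discovered))
```

Params declared explicitly in a descriptor would presumably take precedence over these discovered values, in keeping with the override approach described earlier.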

...

Proposed Simple Descriptor Service Params - YAML
---
discovery-address: http://localhost:8080
discovery-user: maria_dev
provider-config-ref: sandbox-providers
cluster: Sandbox

services:
    - name: NAMENODE
    - name: JOBTRACKER
    - name: WEBHDFS
      params:
        maxFailoverAttempts: 5
        maxRetryAttempts: 5
        failoverSleep: 1001
    - name: WEBHBASE
    - name: HIVE
      params:
        haEnabled: true
        maxFailoverAttempts: 5
        maxRetryAttempts: 5
        failoverSleep: 1001
        zookeeperNamespace: hiveserver2                       # Optional; if omitted, Knox could discover it
        zookeeperEnsemble: host1:2181,host2:2181,host3:2181   # Optional; if omitted, Knox could discover it
    - name: RESOURCEMANAGER
 

...

Proposed Simple Descriptor Service Params - JSON
{
  "discovery-address":"http://localhost:8080",
  "discovery-user":"maria_dev",
  "provider-config-ref":"sandbox-providers",
  "cluster":"Sandbox",
  "services":[
    {"name":"NAMENODE"},
    {"name":"JOBTRACKER"},
    {"name":"WEBHDFS",
       "params": {
        "maxFailoverAttempts": "5",
        "maxRetryAttempts": "5",
        "failoverSleep": "1001"
       }
    },
    {"name":"WEBHBASE"},
    {"name":"HIVE",
       "params": {
        "haEnabled": "true",
        "maxFailoverAttempts": "4",
        "maxRetryAttempts": "6",
        "failoverSleep": "5000",
        "zookeeperNamespace": "hiveserver2",
        "zookeeperEnsemble": "host1:2181,host2:2181,host3:2181"
       }
    },
    {"name":"RESOURCEMANAGER"}
  ]
}
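
To make the descriptor shape concrete, here is a minimal sketch of reading the per-service params from a JSON descriptor like the one above. The descriptor text is a trimmed copy for illustration, and the helper name service_params is hypothetical. It is not the Knox descriptor parser.

```python
import json

# Trimmed copy of the proposed descriptor, for illustration only.
DESCRIPTOR = """
{
  "discovery-address": "http://localhost:8080",
  "provider-config-ref": "sandbox-providers",
  "cluster": "Sandbox",
  "services": [
    {"name": "NAMENODE"},
    {"name": "WEBHDFS",
     "params": {"maxFailoverAttempts": "5", "failoverSleep": "1001"}},
    {"name": "HIVE",
     "params": {"haEnabled": "true", "zookeeperNamespace": "hiveserver2"}}
  ]
}
"""


def service_params(descriptor_text):
    """Return a dict of service name -> declared params (empty if none)."""
    descriptor = json.loads(descriptor_text)
    return {
        service["name"]: service.get("params", {})
        for service in descriptor.get("services", [])
    }


params_by_service = service_params(DESCRIPTOR)
for name, params in params_by_service.items():
    print(name, params)
```

Services declared without params, such as NAMENODE, map to an empty dict, so they would simply inherit whatever the referenced provider configuration specifies.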

Proposed Generated Topology XML
...
  <provider>
    <role>ha</role>
    <name>HaProvider</name>
    <enabled>true</enabled>
    <param>
      <name>WEBHDFS</name>
      <!-- No cluster-specific details here -->
      <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
    </param>
    <param>
      <name>HIVE</name>
      <!-- No cluster-specific details here -->
      <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
    </param>
  </provider>
</gateway>

<service>
  <role>WEBHDFS</role>
  <url>http://host1:50070/webhdfs</url>
  <url>http://host2:50070/webhdfs</url>
</service>

<service>
  <role>HIVE</role>
  <!-- Cluster-specific details here -->
  <param>
    <name>zookeeperEnsemble</name>
    <value>host1:2181,host2:2181,host3:2181</value>
  </param>
  <param>
    <name>zookeeperNamespace</name>
    <value>hiveserver2</value>
  </param>
  <param>
    <name>haEnabled</name>
    <value>true</value>
  </param>



</service>
...
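
The service portion of the generated topology could be produced along the following lines. This is a sketch using Python's standard xml.etree.ElementTree, with a hypothetical build_service helper. It only illustrates how service-level params map to param elements, and is not how Knox generates topologies.

```python
import xml.etree.ElementTree as ET


def build_service(role, urls=(), params=None):
    """Build a <service> element with optional <url> and <param> children."""
    service = ET.Element("service")
    ET.SubElement(service, "role").text = role
    for url in urls:
        ET.SubElement(service, "url").text = url
    for name, value in (params or {}).items():
        param = ET.SubElement(service, "param")
        ET.SubElement(param, "name").text = name
        ET.SubElement(param, "value").text = str(value)
    return service


hive = build_service(
    "HIVE",
    params={
        "zookeeperEnsemble": "host1:2181,host2:2181,host3:2181",
        "zookeeperNamespace": "hiveserver2",
        "haEnabled": "true",
    },
)
webhdfs = build_service(
    "WEBHDFS",
    urls=["http://host1:50070/webhdfs", "http://host2:50070/webhdfs"],
)

hive_xml = ET.tostring(hive, encoding="unicode")
webhdfs_xml = ET.tostring(webhdfs, encoding="unicode")
print(hive_xml)
print(webhdfs_xml)
```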