Goal

With the introduction of cluster detail discovery and topology generation in Apache Knox 0.14.0, it has become possible to make the configuration for proxying HA-enabled Hadoop services more dynamic/automatic.

Furthermore, it may even be possible for Knox to recognize the HA-enabled configuration for a service, and automatically configure a topology to interact with that service in an HA manner.

Knox could also leverage the Ambari cluster monitoring capability to respond to cluster changes, whereby service configurations are modified to enable HA, by regenerating and redeploying the affected topologies.

Current HA Configuration

Currently, topology services can accommodate proxying the corresponding HA-enabled cluster services using topology configuration or by leveraging ZooKeeper; the means depends on the cluster service.
For example, the WEBHDFS service is configured with URLs in the service declaration while the HIVE service employs the list of hosts in the configured ZooKeeper ensemble:

...
  <provider>
    <role>ha</role>
    <name>HaProvider</name>
    <enabled>true</enabled>
    <param>
      <name>WEBHDFS</name>
      <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
    </param>
    <param>
      <name>HIVE</name>
      <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=host1:2181,host2:2181,host3:2181;zookeeperNamespace=hiveserver2</value>
    </param>
  </provider>
</gateway>

<service>
  <role>WEBHDFS</role>
  <url>http://host1:50070/webhdfs</url>
  <url>http://host2:50070/webhdfs</url>
</service>
 
<service>
  <role>HIVE</role>
</service>
...

 

Topology-Based HA

Knox already handles the topology-based HA configuration to a degree. At topology generation time, if there are multiple hosts in the cluster for the associated service, Knox will add the URLs to the <service/> element.
If there is also an entry for that service in the HaProvider configuration, then the service will be proxied in an HA manner.

 

ZooKeeper-Based HA

Using Ambari, Knox can actually determine the ZooKeeper configuration for each service dynamically, relieving the administrator from having to explicitly configure this in each topology.

These are some examples of service configurations for which there is ZooKeeper-related information:

Service ConfigPropertyExample Value
HIVE:hive-sitehive.zookeeper.quorumhost1:2181,host2:2181,host3:2181
 hive.server2.zookeeper.namespacehiveserver2
 hive.server2.support.dynamic.service.discoverytrue
HBASE:hbase-sitehbase.zookeeper.quorumhost1,host2,host3
 hbase.zookeeper.property.clientPort2181
 zookeeper.znode.parent/hbase-unsecure
KAFKA:kafka-brokerzookeeper.connectsandbox.hortonworks.com:2181
HDFS:hdfs-siteha.zookeeper.quorumhost1:2181,host2:2181,host3:2181
 dfs.ha.automatic-failover.enabledtrue (only for "Auto HA")
OOZIE:oozie-siteoozie.zookeeper.connection.stringlocalhost:2181
 oozie.zookeeper.namespaceoozie
YARN:yarn-site
yarn.resourcemanager.ha.enabled
true
 
yarn.resourcemanager.zk-address
host1:2181,host2:2181,host3:2181
WEBHCAT:webhcat-sitetempleton.zookeeper.hostshost1:2181,host2:2181,host3:2181
ATLAS:application-propertiesatlas.kafka.zookeeper.connecthost1:2181,host2:2181,host3:2181
 atlas.server.ha.zookeeper.connecthost1:2181,host2:2181,host3:2181
 atlas.server.ha.zookeeper.zkroot/apache_atlas
 atlas.server.ha.enabledtrue

 

Required Changes

Open Items

 

Proposed Alternative Provider Configuration File Formats

JSON

{
  "providers": [
    {
      "role":"authentication",
      "name":"ShiroProvider",
      "enabled":"true",
      "params":{
        "sessionTimeout":"30",
        "main.ldapRealm":"org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm",
        "main.ldapContextFactory":"org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory",
        "main.ldapRealm.contextFactory":"$ldapContextFactory",
        "main.ldapRealm.userDnTemplate":"uid={0},ou=people,dc=hadoop,dc=apache,dc=org",
    	"main.ldapRealm.contextFactory.url":"ldap://localhost:33389",
	    "main.ldapRealm.contextFactory.authenticationMechanism":"simple",
	    "urls./**":"authcBasic"
	  }
    },
	{
	  "role":"hostmap",
	  "name":"static",
	  "enabled":"true",
	  "params":{
	    "localhost":"sandbox,sandbox.hortonworks.com"
	  }
	},
	{
	  "role":"ha",
	  "name":"HaProvider",
	  "enabled":"true",
	  "params":{
	    "WEBHDFS":"maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true",
		"HIVE":"maxFailoverAttempts=3;failoverSleep=1000;enabled=true"
	  }
	}
  ]
}


YAML 

---
providers:
  - role: authentication
    name: ShiroProvider
    enabled: true
    params:
      sessionTimeout: 30
      main.ldapRealm: org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm
      main.ldapContextFactory: org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory
      main.ldapRealm.contextFactory: $ldapContextFactory
      main.ldapRealm.userDnTemplate: uid={0},ou=people,dc=hadoop,dc=apache,dc=org
      main.ldapRealm.contextFactory.url: ldap://localhost:33389
      main.ldapRealm.contextFactory.authenticationMechanism: simple
      urls./**: authcBasic
  - role: hostmap
    name: static
    enabled: true
    params:
      localhost: sandbox,sandbox.hortonworks.com
  - role: ha
    name: HaProvider
    enabled: true
    params:           # No cluster-specific details here
      WEBHDFS: maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true
      HIVE: maxFailoverAttempts=3;failoverSleep=1000;enabled=true

---
discovery-address: http://localhost:8080
discovery-user: maria_dev
provider-config-ref: sandbox-providers
cluster: Sandbox

services:
    - name: NAMENODE
    - name: JOBTRACKER
    - name: WEBHDFS
      params:
        maxFailoverAttempts: 5
        maxRetryAttempts: 5
        failoverSleep: 1001
    - name: WEBHBASE
    - name: HIVE
      params:
        haEnabled: true
        maxFailoverAttempts: 5
        maxRetryAttempts: 5
        failoverSleep: 1001
        zookeeperNamespace: hiveserver2                                           # Optionally, omit this, and Knox could discover it
        zookeeperEnsemble: http://host1:2181,http://host2:2181,http://host3:2181  # Optionally, omit this, and Knox could discover it
    - name: RESOURCEMANAGER
 



{
  "discovery-address":"http://localhost:8080",
  "discovery-user":"maria_dev",
  "provider-config-ref":"sandbox-providers",
  "cluster":"Sandbox",
  "services":[
    {"name":"NAMENODE"},
    {"name":"JOBTRACKER"},
    {"name":"WEBHDFS",
       "params": {
        "maxFailoverAttempts": "5",
        "maxRetryAttempts": "5",
        "failoverSleep": "1001"
       }
    },
    {"name":"WEBHBASE"},
    {"name":"HIVE",
       "params": {
        "haEnabled": "true",
        "maxFailoverAttempts": "4",
        "maxRetryAttempts": "6",
        "failoverSleep": "5000",
        "zookeeperNamespace": "hiveserver2",
        "zookeeperEnsemble": "http://host1:2181,http://host2:2181,http://host3:2181"
       }
    },
    {"name":"RESOURCEMANAGER"}
  ]
}
...
  <provider>
    <role>ha</role>
    <name>HaProvider</name>
    <enabled>true</enabled>
    <param>
      <name>WEBHDFS</name>
      <!-- No cluster-specific details here -->
      <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
    </param>
    <param>
      <name>HIVE</name>
      <!-- No cluster-specific details here -->
      <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true</value>
    </param>
  </provider>
</gateway>

<service>
  <role>WEBHDFS</role>
  <url>http://host1:50070/webhdfs</url>
  <url>http://host2:50070/webhdfs</url>
</service>

<service>
  <role>HIVE</role>
  <!-- Cluster-specific details here -->
  <param>
    <name>zookeeperEnsemble</name>
    <value>host1:2181,host2:2181,host3:2181</value>
  </param>
  <param>
    <name>zookeeperNamespace</name>
    <value>hiveserver2</value>
  </param>
  <param>
    <name>haEnabled</name>
    <value>true</value>
  </param>



</service>
...