KNOX-88: Support HDFS HA

Design details and discussion for KNOX-88 


Knox HA is a set of routines that let Knox work transparently with a Hadoop service running in HA (High Availability) mode.

Purpose of the Knox HA service

  1. Automatic failover (for example, redirecting a request from an unresponsive name-node to the active name-node).
  2. Pluggable failover strategies.
  3. A daemon service that regularly pings the Hadoop service to track its state (a performance optimization: the current state of the service is already known when a request arrives).
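
The third point can be sketched as follows. This is a minimal illustration, not the Knox implementation: the class name ServiceStateMonitor, the probe predicate, and the thread name are all assumptions made for the example. It shows a background task that refreshes a cached "active" URL so that request threads only read the cached value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Hypothetical class; none of these names come from Knox.
class ServiceStateMonitor {
    private final List<String> nodeUrls;       // all name-node URLs of one HA service
    private final Predicate<String> isActive;  // probe: true if the node at this URL is active
    private final AtomicReference<String> activeUrl = new AtomicReference<>();

    ServiceStateMonitor(List<String> nodeUrls, Predicate<String> isActive) {
        this.nodeUrls = new ArrayList<>(nodeUrls);
        this.isActive = isActive;
    }

    // One ping round: cache the first node that reports itself active, or null if none does.
    void refresh() {
        String found = null;
        for (String url : nodeUrls) {
            try {
                if (isActive.test(url)) {
                    found = url;
                    break;
                }
            } catch (RuntimeException e) {
                // A failing probe counts as "not active"; it must not stop the scheduled task.
            }
        }
        activeUrl.set(found);
    }

    // Read by request threads; never touches the network.
    String getActiveUrl() {
        return activeUrl.get();
    }

    // Start a daemon thread that calls refresh() with a fixed delay between rounds.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ha-state-monitor");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::refresh, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Keeping the cached value in an AtomicReference is one simple way to make reads thread safe; a real implementation might need to track more state per endpoint.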



A new provider will be added (a descendant of the ProviderDeploymentContributorBase class) with a set of filters. See Pic. #1 for the overall architecture.

Pic. #1 – Providers architecture



Alias – the set of Hadoop name-nodes configured for High Availability mode.



High Availability Strategy – a plan for determining the active name-node and for switching between active and standby name-nodes. A strategy may take parameters such as retryCount and timeoutInterval. See Pic. #2 for the HA mode class diagram.
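
A pluggable strategy with these two parameters might look like the sketch below. The interface and class names are hypothetical, and the semantics are an assumption: retryCount is read as the number of passes over the alias, and timeoutInterval as the pause in milliseconds between passes. The design itself does not define either.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical interface; the design only states that strategies are pluggable.
interface FailoverStrategy {
    // Returns the URL of the node to use, or empty if no node in the alias is active.
    Optional<String> selectActive(List<String> aliasUrls, Predicate<String> isActive);
}

// Assumed semantics: retryCount = passes over the alias,
// timeoutInterval = pause in milliseconds between passes.
class BaseFailoverStrategy implements FailoverStrategy {
    private final int retryCount;
    private final long timeoutInterval;

    BaseFailoverStrategy(int retryCount, long timeoutInterval) {
        this.retryCount = retryCount;
        this.timeoutInterval = timeoutInterval;
    }

    @Override
    public Optional<String> selectActive(List<String> aliasUrls, Predicate<String> isActive) {
        for (int attempt = 0; attempt < retryCount; attempt++) {
            for (String url : aliasUrls) {
                if (isActive.test(url)) {
                    return Optional.of(url);
                }
            }
            if (attempt < retryCount - 1) {
                try {
                    Thread.sleep(timeoutInterval); // wait before the next pass
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

A different strategy (for example, one that remembers the last active node) would implement the same interface and could be swapped in through configuration.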


Pic. #2 – Class diagram for HA mode



See Table #1 for class description.

Table #1. – HA mode new classes description.

#  | Class name                      | Description
1  | HaUrlRewriteFunctionDescriptor  | Describes the function that resolves URLs in HA mode.
2  | HaUrlRewriteFunctionProcessor   | Implements the main logic for determining the active or standby URL.
3  | HaBaseStrategyHostMapper        | Implements the base strategy for HA mode. Parameters: retryCount, timeoutInterval.
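
To illustrate the kind of resolution step the function processor performs, the sketch below swaps the scheme, host, and port of a request URL for those of the active node. HaUrlResolver and its method are hypothetical and do not implement any Knox interface; the example URLs are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
class HaUrlResolver {
    // Replace the scheme, host, and port of requestUrl with those of activeNodeUrl,
    // keeping the original path, query, and fragment.
    static String resolve(String requestUrl, String activeNodeUrl) {
        try {
            URI request = new URI(requestUrl);
            URI active = new URI(activeNodeUrl);
            return new URI(active.getScheme(), null, active.getHost(), active.getPort(),
                    request.getPath(), request.getQuery(), request.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL: " + e.getInput(), e);
        }
    }
}
```

For example, HaUrlResolver.resolve("http://webhdfs-alias/webhdfs/v1/tmp?op=LISTSTATUS", "http://nn2:50070") yields "http://nn2:50070/webhdfs/v1/tmp?op=LISTSTATUS".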


See Pic. #3 for the UML sequence diagram for UrlRewriteProcessor.

Pic. #3 – UML sequence diagram for UrlRewriteProcessor

Provider configuration example

Please see the WebHDFS HA section.




    1. How would the param name webhdfs.ha be used by the HAProvider at deployment or runtime?
    2. How would two different services use totally different HA providers? For the first iteration it is probably OK if that isn't possible, but we should consider this use case as part of KNOX-177.
  1. Where are you planning on detecting the result of a given request in order to determine if that request has failed?

  2. How will you be managing the state of a set of service endpoints and how will that be kept thread safe?

  3. Minor point.  Should failover_strategy be failoverStrategy for consistency?