 Pseudo Federation Provider

This article will walk through the process of adding a new provider for establishing the identity of a user. The simple example of the Pseudo authentication mechanism in Hadoop will be used to communicate the general ideas for extending the preauthenticated federation provider that is available out of the box in Apache Knox. This provider should not be used in a production environment and has at least one major limitation. It will, however, illustrate the general programming model for adding preauthenticated federation providers.

 Provider Types

Apache Knox has two types of providers for establishing the identity of the source of an incoming REST request. One is an Authentication Provider and the other is a Federation Provider.

 Authentication Providers

Authentication providers are responsible for actually collecting credentials of some sort from the end user. Examples of authentication providers would be things like HTTP BASIC authentication with username and password that gets authenticated against LDAP or RDBMS, etc. Apache Knox ships with HTTP BASIC authentication against LDAP using Apache Shiro. The Shiro provider can actually be configured in multiple ways.

Authentication providers are sometimes less than ideal since many organizations only want their users to provide credentials to the enterprise trusted/preferred solutions and to use some sort of SSO or federation of that authentication event across all other applications.

 Federation Providers

Federation providers, on the other hand, never see the users' actual credentials but instead federate a previous authentication event through the processing and validation of some sort of token. This allows for greater isolation and protection of user credentials while still providing a means to verify the trustworthiness of the incoming identity assertions. Examples of federation providers would be things like OAuth 2, SAML Assertions, JWT/SWT tokens, Header based identity propagation, etc. Out of the box, Apache Knox enables the use of custom headers for propagating things like the user principal and group membership through the HeaderPreAuth federation provider.

This is generally useful for solutions such as CA SiteMinder and IBM Tivoli Access Manager. In these sorts of deployments, all traffic to Hadoop has to go through the solution provider's gateway, which authenticates the user and can inject identity propagation headers into the request. Because the network security does not allow requests to bypass the solution gateway, there is a level of trust for accepting the header-based identity assertions. We also provide for additional validation through a pluggable mechanism, and IP address validation can be used out of the box.

 Let's add a Federation Provider

This article will discuss what is involved in adding a new federation provider that extends the abstract base classes introduced in the PreAuth provider module. It will be a very minimal provider that accepts a request parameter from the incoming request as the user's principal.

 The module and dependencies

The Apache Knox project uses Apache Maven for build and dependency management. We will need to create a new module for the Pseudo federation provider and include our own pom.xml.

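<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <!-- example version only: it must match the version in the Knox root pom.xml -->
    <parent><groupId>org.apache.knox</groupId><artifactId>gateway</artifactId><version>0.7.0-SNAPSHOT</version></parent>
    <!-- hypothetical module name -->
    <artifactId>gateway-provider-security-pseudo</artifactId>
    <!-- dependency versions are assumed to be managed by the parent pom -->
    <dependencies>
        <dependency><groupId>org.apache.knox</groupId><artifactId>gateway-provider-security-preauth</artifactId></dependency>
        <dependency><groupId>org.apache.knox</groupId><artifactId>gateway-spi</artifactId></dependency>
        <dependency><groupId>org.apache.knox</groupId><artifactId>gateway-util-common</artifactId></dependency>
        <dependency><groupId>org.eclipse.jetty.orbit</groupId><artifactId>javax.servlet</artifactId></dependency>
        <dependency><groupId>junit</groupId><artifactId>junit</artifactId><scope>test</scope></dependency>
        <dependency><groupId>org.easymock</groupId><artifactId>easymock</artifactId><scope>test</scope></dependency>
        <dependency><groupId>org.apache.knox</groupId><artifactId>gateway-test-utils</artifactId><scope>test</scope></dependency>
    </dependencies>
</project>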
NOTE: The "version" element must match the version indicated in the pom.xml of the Knox project. Otherwise, the build will fail.

 gateway-provider-security-preauth

This particular federation provider is going to extend the existing PreAuth module with the capability to accept a request parameter as an assertion of identity by a trusted party. Therefore, we will depend on the preauth module in order to leverage the facilities of its base classes for things like IP address validation.

 gateway-spi

The gateway-spi dependency above pulls in the general interfaces, base classes and utilities that are expected for extending the Apache Knox gateway. The core GatewayServices are available through the gateway-spi module, as are a number of other foundational elements of gateway development.

 gateway-util-common

The gateway-util-common module, as the name suggests, provides common utility facilities for developing the gateway product. This is where you find the auditing, JSON and URL utility classes for gateway development.

 javax.servlet from org.eclipse.jetty.orbit

This module provides the servlet filter specific classes that are needed for the provider filter implementation.

 junit, easymock and gateway-test-utils

JUnit, easymock and gateway-test-utils provide the basis for writing REST based unit tests for the Apache Knox Gateway project and can be found in all of the existing unit tests for the various modules that make up the gateway offering.

 Apache Knox Topologies

In Apache Knox, individual Hadoop clusters are represented by descriptors called topologies. Each topology results in the deployment of specific endpoints that expose and protect access to the services of the associated Hadoop cluster. The topology descriptor describes the available services and their respective URLs within the actual Hadoop cluster, as well as the policy for protecting access to those services. The policy is defined through the description of various Providers. Each provider and service within a Knox topology has a role, and provider roles include authentication, federation, authorization and identity assertion, among others. In this article we are concerned with a provider of type federation.

Since the Pseudo provider is assuming that authentication has happened at the OS level or from within another piece of middleware and that credentials were exchanged with some party other than Knox, we will be making this a federation provider. The typical provider configuration will look something like this:
<provider>
    <role>federation</role>
    <name>Pseudo</name>
    <enabled>true</enabled>
</provider>

Ultimately, an Apache Knox topology manifests as a web application deployed within the gateway process that exposes and protects the URLs associated with the services of the underlying Hadoop components in each cluster. Providers generally interject a ServletFilter into the processing path of the REST API requests that enter the gateway and are dispatched to the Hadoop cluster. The mechanism used to interject the filters, their related configuration and integration into the gateway is the ProviderDeploymentContributor.


package org.apache.hadoop.gateway.preauth.deploy;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.hadoop.gateway.deploy.DeploymentContext;
import org.apache.hadoop.gateway.deploy.ProviderDeploymentContributorBase;
import org.apache.hadoop.gateway.descriptor.FilterParamDescriptor;
import org.apache.hadoop.gateway.descriptor.ResourceDescriptor;
import org.apache.hadoop.gateway.topology.Provider;
import org.apache.hadoop.gateway.topology.Service;

public class PseudoAuthContributor extends
    ProviderDeploymentContributorBase {
  // ROLE and NAME must match the <role> and <name> of the provider in the topology descriptor
  private static final String ROLE = "federation";
  private static final String NAME = "Pseudo";
  private static final String PREAUTH_FILTER_CLASSNAME = "org.apache.hadoop.gateway.preauth.filter.PseudoAuthFederationFilter";

  public String getRole() {
    return ROLE;
  }

  public String getName() {
    return NAME;
  }

  public void contributeFilter(DeploymentContext context, Provider provider, Service service,
      ResourceDescriptor resource, List<FilterParamDescriptor> params) {
    // blindly add all the provider params as filter init params
    if (params == null) {
      params = new ArrayList<FilterParamDescriptor>();
    }
    Map<String, String> providerParams = provider.getParams();
    for (Entry<String, String> entry : providerParams.entrySet()) {
      // param names are lowercased before being handed to the filter
      params.add( resource.createFilterParam().name( entry.getKey().toLowerCase() ).value( entry.getValue() ) );
    }
    // add PseudoAuthFederationFilter to the topology's filter chain under this role and name
    resource.addFilter().name( getName() ).role( getRole() ).impl( PREAUTH_FILTER_CLASSNAME ).params( params );
  }
}

The DeploymentContributors required for a given topology are located based on the role and the name of each provider, as indicated within the topology descriptor. The topology deployment machinery within Knox first looks up the required DeploymentContributor by role. In this case, it identifies the provider as having the federation role. It then looks for the federation provider with the name Pseudo.
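To make this lookup concrete, here is a minimal sketch of a registry keyed first by role and then by name. It is an illustrative model only and not Knox's deployment code. The `Contributor` interface and the `ContributorRegistry` class are hypothetical stand-ins for the real ProviderDeploymentContributor machinery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ProviderDeploymentContributor: only role and name matter here.
interface Contributor {
  String getRole();
  String getName();
}

// Hypothetical registry illustrating lookup by role first, then by name.
class ContributorRegistry {
  private final Map<String, Map<String, Contributor>> byRole = new HashMap<>();

  void register(Contributor contributor) {
    byRole.computeIfAbsent(contributor.getRole(), r -> new HashMap<>())
          .put(contributor.getName(), contributor);
  }

  // Resolve the contributor for a provider declared in a topology descriptor.
  Contributor resolve(String role, String name) {
    Map<String, Contributor> candidates = byRole.get(role);
    if (candidates == null) {
      throw new IllegalArgumentException("No contributors for role: " + role);
    }
    Contributor contributor = candidates.get(name);
    if (contributor == null) {
      throw new IllegalArgumentException("No " + role + " contributor named: " + name);
    }
    return contributor;
  }

  // Convenience factory so the example needs no extra classes.
  static Contributor of(String role, String name) {
    return new Contributor() {
      public String getRole() { return role; }
      public String getName() { return name; }
    };
  }
}
```

With `federation`/`Pseudo` registered, `registry.resolve("federation", "Pseudo")` returns the Pseudo contributor. In this sketch, an unknown role or name fails fast with an exception.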

Once the providers have been resolved into the required set of DeploymentContributors each contributor is given the opportunity to contribute to the construction of the topology web application that exposes and protects the service APIs within the Hadoop cluster.

This particular DeploymentContributor needs to add the PseudoAuthFederationFilter servlet Filter implementation to the topology-specific filter chain. In addition to adding the filter to the chain, this provider also adds each of the provider params from the topology descriptor as filterConfig parameters. This allows the resulting servlet filters to be configured from within the topology descriptor while hiding the specific implementation details of the provider from the end user.


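package org.apache.hadoop.gateway.preauth.filter;

import java.security.Principal;
import java.util.Set;

import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;

public class PseudoAuthFederationFilter
  extends AbstractPreAuthFederationFilter {

  @Override
  public void init(FilterConfig filterConfig) throws ServletException {
    // let the base class perform its own initialization
    super.init(filterConfig);
  }

  /**
   * Returns the user name asserted by the "user.name" request parameter.
   * @param httpRequest the incoming request
   */
  @Override
  protected String getPrimaryPrincipal(HttpServletRequest httpRequest) {
    return httpRequest.getParameter("user.name");
  }

  /**
   * @param request the incoming request
   * @param principals the principals that will populate the Subject
   */
  @Override
  protected void addGroupPrincipals(HttpServletRequest request,
      Set<Principal> principals) {
    // pseudo auth currently has no assertion of group membership
  }
}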

The PseudoAuthFederationFilter above extends AbstractPreAuthFederationFilter. This base class takes care of a number of boilerplate aspects of preauthenticated providers that would otherwise have to be implemented redundantly across providers. The two abstract methods that are specific to each provider are getPrimaryPrincipal and addGroupPrincipals. The base class calls these methods to determine which principals should be created and added to the Java Subject that becomes the effective user identity while the incoming request is processed.


Implementing the abstract method getPrimaryPrincipal allows the new provider to extract the established identity from the incoming request (or however is appropriate for the given provider) and communicate it back to the AbstractPreAuthFederationFilter. The base class in turn adds it to the Java Subject being created to represent the user's identity. For this particular provider, all we have to do is return the request parameter named "user.name", which is the parameter used by the Pseudo authentication mechanism in Hadoop.


Given a set of Principals, addGroupPrincipals is an opportunity to add group principals to the resulting Java Subject that will be used to represent the user's identity. This is done by adding new group principals to the set. For the Pseudo authentication mechanism in Hadoop, there is really no way to communicate group membership through the request parameters. One could easily envision adding an additional request parameter for this, though, something like "user.groups".
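The split between the base class and the provider is an instance of the template method pattern. The following is a servlet-free sketch that rests on a few assumptions. Request parameters are modeled as a Map, and `PreAuthTemplate`, `PseudoTemplate` and `NamedPrincipal` are hypothetical classes rather than Knox APIs. The sketch also shows how the hypothetical "user.groups" parameter mentioned above could populate group principals.

```java
import java.security.Principal;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal principal type; Knox has its own principal classes.
class NamedPrincipal implements Principal {
  private final String name;
  private final boolean group;

  NamedPrincipal(String name, boolean group) { this.name = name; this.group = group; }

  @Override public String getName() { return name; }
  boolean isGroup() { return group; }
  @Override public String toString() { return (group ? "group:" : "user:") + name; }
}

// Template-method sketch of the role a base class like AbstractPreAuthFederationFilter plays.
abstract class PreAuthTemplate {
  // Provider hook: extract the primary principal name, or null if absent.
  protected abstract String getPrimaryPrincipal(Map<String, String> params);

  // Provider hook: optionally add group principals.
  protected abstract void addGroupPrincipals(Map<String, String> params, Set<Principal> principals);

  // Fixed algorithm: primary principal first, then groups; reject requests with no identity.
  final Set<Principal> buildPrincipals(Map<String, String> params) {
    String user = getPrimaryPrincipal(params);
    if (user == null || user.isEmpty()) {
      throw new SecurityException("No identity asserted");
    }
    Set<Principal> principals = new LinkedHashSet<>();
    principals.add(new NamedPrincipal(user, false));
    addGroupPrincipals(params, principals);
    return principals;
  }
}

// Pseudo-style provider; "user.groups" is a hypothetical, comma-separated extension.
class PseudoTemplate extends PreAuthTemplate {
  @Override
  protected String getPrimaryPrincipal(Map<String, String> params) {
    return params.get("user.name");
  }

  @Override
  protected void addGroupPrincipals(Map<String, String> params, Set<Principal> principals) {
    String groups = params.get("user.groups");
    if (groups == null) {
      return;
    }
    for (String g : groups.split(",")) {
      if (!g.trim().isEmpty()) {
        principals.add(new NamedPrincipal(g.trim(), true));
      }
    }
  }
}
```

Keeping `buildPrincipals` final mirrors the idea that providers only fill in the two hooks, while the surrounding algorithm stays in one place.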

 Configure as an Available Provider

In order for the deployment machinery to be able to discover the availability of your new provider implementation, you will need to make sure that the org.apache.hadoop.gateway.deploy.ProviderDeploymentContributor file is in the resources/META-INF/services directory and that it contains the classname of the new provider's DeploymentContributor - in this case PseudoAuthContributor.

org.apache.hadoop.gateway.preauth.deploy.PseudoAuthContributor
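META-INF/services is the convention read by Java's standard java.util.ServiceLoader. The sketch below illustrates that file format only. It is not Knox code, and `ServicesFile.parse` is a hypothetical helper. The format has one fully qualified class name per line. Anything after `#` is a comment, surrounding whitespace and blank lines are ignored, and duplicate names are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for the provider-configuration file format used by java.util.ServiceLoader.
class ServicesFile {
  static List<String> parse(String content) {
    List<String> classNames = new ArrayList<>();
    for (String raw : content.split("\\R")) { // split on any line terminator
      String line = raw;
      int hash = line.indexOf('#');
      if (hash >= 0) {
        line = line.substring(0, hash); // drop comments
      }
      line = line.trim();
      if (!line.isEmpty() && !classNames.contains(line)) {
        classNames.add(line); // duplicate names are ignored
      }
    }
    return classNames;
  }
}
```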

 Add to Knox as a Gateway Module

At this point, the module can be built as a standalone module with:

mvn clean install

However, we want to extend the Apache Knox Gateway build to include the new module in its build and release processes. To do this, we need to add it to two common pom.xml files: the root level pom.xml and the pom.xml of the gateway-release module.

 Root Level Pom.xml

At the root of the project source tree there is a pom.xml file that defines all of the modules that are official components of the gateway server release. You can find each of these modules in the "modules" element. We need to add our new module declaration there:


Then later in the same file we have to add a fuller definition of our module to the dependencyManagement/dependencies element:


 Gateway Release Module Pom.xml

Now, our Pseudo federation provider is building with the gateway project but it isn't quite included in the gateway server release artifacts. In order to get it included in the release archives and available to the runtime, we need to add it as a dependency to the appropriate release module. In this case, we are adding it to the pom.xml file within the gateway-release module:


Note that this is basically the same definition that was added to the root level pom.xml minus the "version" element.

 Build, Test and Deploy

At this point, we should have an integrated custom component that can be described for use within the Apache Knox topology descriptor file and engaged in the authentication of incoming requests for resources of the protected Hadoop cluster.


You can use the same Maven command to build the project:

mvn clean install

This will build and run the gateway unit tests.

You may also use the following command to build, run the tests and package up the release artifacts. This is a great way to quickly set up a test instance for manually testing your new Knox functionality.

ant package


To install the newly packaged release archive in a GATEWAY_HOME environment:

ant install-test-home

This will unzip the release bits into a local ./install directory and do some initial setup tasks to ensure that it is actually runnable.

We can now start a test LDAP server that is seeded with a couple of test users:

ant start-test-ldap

The sample topology files are set up to authenticate against this LDAP server for convenience and can be used as is to quickly sanity test the install.

At this point, we can choose to run a test Knox instance or a debug Knox instance. If you want to run a test instance without the ability to connect a debugger then:

ant start-test-gateway

If you would like to connect a debugger and step through the code to debug or ensure that your functionality is running as expected then you need a debug instance:

ant start-debug-gateway


You may now test the out of the box authentication against LDAP using HTTP BASIC by using curl and one of the simpler APIs exposed by Apache Knox:

curl -ivk --user guest:guest-password "https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS"

 Change Topology Descriptor

Once the server is up and running and you are able to authenticate with HTTP BASIC against the test LDAP server, you can now change the topology descriptor to leverage your new federation provider.

Find the sandbox.xml file in the install/conf/topologies directory and edit it to reflect your provider type, name and any provider-specific parameters.


Once your federation provider is configured, just save the topology descriptor. Apache Knox will notice that the file has changed and automatically redeploy that particular topology. Any provider params described in the provider element will be added to the PseudoAuthFederationFilter as servlet filter init params and can be used to configure aspects of the filter's behavior.

 curl again

We are now ready to use curl again to test the new federation provider and ensure that it is working as expected:

curl -ivk "https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS&user.name=guest"

 More Resources

Apache Knox Developers Guide:

Apache Knox Users Guide:

Github project for this article:


This article has illustrated a simplified example of implementing a federation provider for establishing the identity of a previous authentication event and propagating that into the request processing for Hadoop REST APIs inside of Apache Knox.

The process to extend the preauthenticated federation provider is a quick and simple way to extend certain SSO capabilities into providing authenticated access to Hadoop resources through Apache Knox.

The Knox community is a growing community that welcomes contributions from interested users in order to grow the capabilities to include truly useful and impactful features.

NOTE: It is important to understand that the provider illustrated in this example has limitations that preclude it from being used in production. Most notably, it has no means to follow redirects, because the identity request parameter is missing from the Location header of the redirect. To handle this, we would need to set a cookie that determines the user identity on the redirected request.
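Here is a rough sketch of the cookie-based fix mentioned in the note. It assumes a hypothetical cookie named `pseudo.user.name` and models parameters and cookies as Maps. Identity resolution falls back from the request parameter to the cookie. The cookie is unsigned, so anyone could forge it; a real implementation would need to sign or encrypt it.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: resolves the asserted user from the request parameter, falling back
// to a cookie set on an earlier response so that redirected requests keep their identity.
class PseudoIdentityResolver {
  static final String PARAM = "user.name";
  static final String COOKIE = "pseudo.user.name"; // hypothetical cookie name

  static Optional<String> resolve(Map<String, String> params, Map<String, String> cookies) {
    String fromParam = params.get(PARAM);
    if (fromParam != null && !fromParam.isEmpty()) {
      return Optional.of(fromParam);
    }
    // Redirected requests arrive without the parameter; try the cookie instead.
    return Optional.ofNullable(cookies.get(COOKIE)).filter(v -> !v.isEmpty());
  }

  // Value for a Set-Cookie header. WARNING: unsigned and therefore forgeable.
  static String setCookieHeader(String user) {
    return COOKIE + "=" + user + "; Path=/; Secure; HttpOnly";
  }
}
```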

 Adding a Service to Apache Knox

This article covers adding a service to Apache Knox.

The idea here is to provide an intentionally simple example, avoiding complexity wherever possible. The goal is to get something working as a starting point on which more complicated scenarios can build. You may want to review the Apache Knox User’s Guide and Developer’s Guide before reading this.

If you are new to Knox you may also want to check out ‘Setting up Apache Knox in three easy steps’.

The API used here is an OpenWeatherMap API that returns the current weather information for a given zip code. This is the cURL command to access this API directly. Give it a try.

curl 'http://api.openweathermap.org/data/2.5/weather?zip=95054,us&appid=2de143494c0b295cca9337e1e96b00e0'

This should return a JSON document similar to the output shown below. Your results probably won’t be nicely formatted. Note that I’m not giving anything away here with the appid; this is the one they use in all of their examples.

   "weather":[{"id":800,"main":"Clear","description":"Sky is Clear","icon":"01d"}],
   "base":"cmc stations",

This is the cURL command showing how we will expose that service via the gateway. Don’t try this now, it won’t work until later!

curl -ku guest:guest-password 'https://localhost:8443/gateway/sandbox/weather/data/2.5/weather?zip=95054,us&appid=2de143494c0b295cca9337e1e96b00e0'

So the fundamental job of the gateway is to translate the effective request URL it receives into the target URL and then transfer the request and response bodies. In this example we will ignore the request and response bodies and focus on the request URL. Let’s take a look at how these two request URLs are related.

We can start by breaking down the Gateway URL and understanding where each of the URL parts comes from.

  • https: The gateway has SSL/TLS enabled. See ssl.enabled in gateway-site.xml.
  • localhost: The host the gateway is listening on. See gateway-site.xml.
  • 8443: The gateway is listening on port 8443. See gateway.port in gateway-site.xml.
  • gateway: The gateway context path is ‘gateway’. See gateway.path in gateway-site.xml.
  • sandbox: The topology file that includes the WEATHER service is named sandbox.xml.
  • weather: The unique root of all WEATHER service URLs, identified in the service’s service.xml.
  • data/2.5/weather: This portion of the URL is handled by the service’s rewrite.xml rules.

In contrast, we really only care about two parts of the service’s direct URL.

  • http://api.openweathermap.org: The network address of the service itself.
  • data/2.5/weather: The path for the weather API of the service.
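The gateway URL breakdown above can be reproduced with java.net.URI. This sketch assumes the path always has the shape /<gateway path>/<topology>/<service root>/<rest>. `GatewayUrlParts` is a hypothetical helper, not part of Knox.

```java
import java.net.URI;

// Hypothetical helper that splits a gateway URL into the parts listed above.
class GatewayUrlParts {
  final String scheme, host, gatewayPath, topology, serviceRoot, remainder, query;
  final int port;

  GatewayUrlParts(String url) {
    URI uri = URI.create(url);
    scheme = uri.getScheme();
    host = uri.getHost();
    port = uri.getPort();
    query = uri.getRawQuery();
    // Drop the leading "/" and split into at most 4 pieces so the remainder stays intact.
    String[] parts = uri.getRawPath().substring(1).split("/", 4);
    if (parts.length < 4) {
      throw new IllegalArgumentException("Unexpected gateway URL: " + url);
    }
    gatewayPath = parts[0]; // gateway.path
    topology = parts[1];    // topology file name without .xml
    serviceRoot = parts[2]; // route root from service.xml
    remainder = parts[3];   // handled by the rewrite.xml rules
  }
}
```

For the weather URL, `remainder` is `data/2.5/weather` and `query` is `zip=95054,us&appid=...`. These are exactly the pieces that reappear in the direct URL.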


Now we need to get down to the business of actually making the gateway proxy this service. To do that we will be using the new configuration based extension model introduced in Knox 0.6.0. That will involve adding two new files under the <GATEWAY_HOME>/data/services directory and then modifying a topology file.

Note: The <GATEWAY_HOME> here represents the directory where Apache Knox is installed.

First you need to create a directory to hold your new service definition files. There are two conventions at work here that ultimately (but only loosely) relate to the content of the service.xml it will contain. Below the <GATEWAY_HOME>/data/services directory you will need to create a parent and child directory weather/0.0.1. As a convention the names of these directories duplicate the values in the attributes of the root element of the contained service.xml.

Create the two files with the content shown below and place them in the directories indicated.

<GATEWAY_HOME>/data/services/weather/0.0.1/service.xml

<service role="WEATHER" name="weather" version="0.0.1">
    <routes>
        <route path="/weather/**"/>
    </routes>
</service>

<GATEWAY_HOME>/data/services/weather/0.0.1/rewrite.xml

<rules>
  <rule dir="IN" name="WEATHER/weather/inbound" pattern="*://*:*/**/weather/{path=**}?{**}">
    <rewrite template="{$serviceUrl[WEATHER]}/{path=**}?{**}"/>
  </rule>
</rules>

Once that is complete, the topology file must be updated to activate this new service in the runtime. In this case the sandbox.xml topology file is used, but you may have another topology file such as default.xml. Edit whichever topology file you prefer and add the <service>…</service> markup shown below. If you aren’t using sandbox.xml, be careful to replace sandbox with the name of your topology file throughout these examples.

<service>
    <role>WEATHER</role>
    <url>http://api.openweathermap.org</url>
</service>

With all of these changes made, you must restart your Knox gateway server. Often this isn’t necessary, but adding a new service definition under <GATEWAY_HOME>/data/services requires a restart.

You should now be able to execute the curl command from way back at the top that accesses the OpenWeatherMap API via the gateway.

curl -ku guest:guest-password 'https://localhost:8443/gateway/sandbox/weather/data/2.5/weather?zip=95054,us&appid=2de143494c0b295cca9337e1e96b00e0'

Now that the new service definition is working, let’s go back and connect all the dots. This should help take some of the mystery out of the configuration above. The most important and confusing aspect is how values in different files are interrelated, so I will focus on that.


The service.xml file defines the high level URL patterns that will be exposed by the gateway for a service. If you are getting HTTP 404 errors there is probably a problem with this configuration.

<service role="WEATHER"

  • The role/implementation/version triad is used throughout Knox for integration plugins.
  • Think of the role as an interface in Java.
  • This attribute declares what role this service “implements”.
  • This will need to match the topology file’s <topology><service><role> for this service.

<service name="weather"

  • In the role/implementation/version triad this is the implementation.
  • Think of this as a Java implementation class name relative to an interface.
  • As a matter of convention this should match the directory beneath <GATEWAY_HOME>/data/services
  • The topology file can optionally contain <topology><service><name> but usually doesn’t. This would be used to select a specific implementation of a role if there were multiple.

<service version="0.0.1"

  • As a matter of convention this should match the directory beneath the service implementation name.
  • The topology file can optionally contain <topology><service><version> but usually doesn’t. This would be used to select a specific version of an implementation if there were multiple. This can be important if the protocols for a service evolve over time.

<service><routes><route path="/weather/**"

  • This tells the gateway that all requests starting with /weather/ are handled by this service.
  • Due to a limitation this will not include requests to /weather (i.e. no trailing /).
  • The ** means zero or more path segments, similar to Ant.
  • The scheme, host, port, gateway and topology components are not included (e.g. https://localhost:8443/gateway/sandbox)
  • Routes can, but typically don’t, take query parameters into account.
  • In this simple form there is no direct relationship between the route path and the rewrite rules!
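The following toy version of this route check shows the trailing-slash behavior described above. It was written for illustration and is not taken from Knox. It only handles a literal prefix followed by `/**`, and `RouteMatcher` is a hypothetical name.

```java
// Toy route matcher (not Knox's implementation) for patterns like "/weather/**".
class RouteMatcher {
  static boolean matches(String route, String path) {
    if (route.endsWith("/**")) {
      // Keep the trailing "/", so "/weather/**" requires the prefix "/weather/".
      String prefix = route.substring(0, route.length() - 2);
      return path.startsWith(prefix);
    }
    return route.equals(path);
  }
}
```

Because the prefix keeps its trailing slash, `/weather/data/2.5/weather` matches while `/weather` and `/weatherman/x` do not. This mirrors the limitation noted in the list.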


The rewrite.xml is configuration that drives the rewrite provider within Knox. It is important to understand that at runtime for a given topology, all of the rewrite.xml files for all active services are combined into a single file. This explains some of the seemingly complex patterns and naming conventions.

<rules><rule dir="IN"

  • Here dir means direction and IN means it should apply to a request.
  • This rule is a global rule meaning that any other service can request that a URL be rewritten as they process URLs. The rewrite provider keeps distinct trees of URL patterns for IN and OUT rules so that services can be specific about which to apply.
  • If it were not global it would not have a direction and probably not a pattern in the element.

<rules><rule name="WEATHER/weather/inbound"

  • Rules can be explicitly invoked in various ways. In order to allow that they are named.
  • The convention is role/name/<service specific hierarchy>.
  • Remember that all rules share a single namespace.

<rules><rule pattern="*://*:*/**/weather/{path=**}?{**}"

  • Defines the URL pattern for which this rule will apply.
  • The * matches exactly one segment of the URL.
  • The ** matches zero or more segments of the URL.
  • The {path=**} matches zero or more path segments and provides access to them as a parameter named ‘path’.
  • The {**} matches zero or more query parameters and provides access to them by name.
  • The values from matched {…} segments are “consumed” by the rewrite template below.

<rules><rule><rewrite template="{$serviceUrl[WEATHER]}/{path=**}?{**}"

  • Defines how the URL matched by the rule will be rewritten.
  • The {$serviceUrl[WEATHER]} looks up the <service><url> for the <service><role>WEATHER. This is implemented as a rewrite function and is another custom extension point.
  • The {path=**} extracts zero or more values for the ‘path’ parameter from the matched URL.
  • The {**} extracts any “unused” parameters and uses them as query parameters.
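To connect the pattern and the template, here is a hand-written Java equivalent of the WEATHER/weather/inbound rule. It is a sketch, not the rewrite provider's implementation. `WeatherInboundRewrite` is hypothetical, and a Map stands in for the {$serviceUrl[WEATHER]} lookup. Note that `**/weather/` is ambiguous for this URL because the API path also ends in `weather`. The sketch binds to the first `weather` segment, which yields the intended target URL.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Map;

// Hypothetical, hand-written equivalent of the WEATHER/weather/inbound rewrite rule.
class WeatherInboundRewrite {
  static String rewrite(String requestUrl, Map<String, String> serviceUrls) {
    // Plays the role of {$serviceUrl[WEATHER]}, i.e. the <url> from the topology.
    String serviceUrl = serviceUrls.get("WEATHER");
    if (serviceUrl == null) {
      throw new IllegalStateException("No <url> for role WEATHER");
    }
    URI uri = URI.create(requestUrl);
    String[] segments = uri.getRawPath().split("/");
    int root = -1;
    for (int i = 1; i < segments.length; i++) { // segments[0] is "" (leading slash)
      if (segments[i].equals("weather")) {
        root = i; // first match; see the ambiguity note above
        break;
      }
    }
    if (root < 0) {
      throw new IllegalArgumentException("Pattern does not match: " + requestUrl);
    }
    // {path=**}: every segment after the matched "weather" segment.
    String path = String.join("/", Arrays.copyOfRange(segments, root + 1, segments.length));
    // {**}: pass the query parameters through unchanged.
    String query = uri.getRawQuery();
    return serviceUrl + "/" + path + (query == null ? "" : "?" + query);
  }
}
```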


<service>
    <role>WEATHER</role>
    <url>http://api.openweathermap.org</url>
</service>

  • <role> causes the service definition with role WEATHER to be loaded into the runtime.
  • Since <name> and <version> are not present, a default is selected if there are multiple options.
  • <url> populates the data used by {$serviceUrl[WEATHER]} in the rules with the correct target URL.

Hopefully all of this provides a more gentle introduction to adding a service to Apache Knox than might be offered in the Apache Knox Developer’s Guide. If you have more questions, comments or suggestions please join the Apache Knox community. In particular you might be interested in one of the mailing lists.

 Using Apache Knox with ActiveDirectory

This article covers using Apache Knox with ActiveDirectory.

Currently Apache Knox comes “out of the box” set up with a demo LDAP server based on ApacheDS. This was a conscious decision made to simplify the initial user experience with Knox. Unfortunately, it can make the transition to popular enterprise identity stores such as ActiveDirectory confusing. This article is intended to remedy some of that confusion.

If you are new to Knox you may want to check out ‘Setting up Apache Knox in three easy steps’.

Part 1

Let’s go back to basics and build up an example from first principles. To do this we will start with the simplest topology file that will work. We will iteratively transform that topology file until it integrates with ActiveDirectory for both authentication and authorization.

Sample 1

The initial topology file we will start with doesn’t integrate with ActiveDirectory at all. Instead it uses a capability of Shiro to embed users directly within its configuration. This approach is largely taken to “shake out” the process of editing topology files for various purposes. At the same time it minimizes external dependencies to help ensure a successful starting point. Now, create this topology file, naming it sample1.xml so that it matches the URLs used below.

<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param name="users.admin" value="admin-secret"/>
            <param name="urls./**" value="authcBasic"/>
        </provider>
    </gateway>
    <service>
        <role>KNOX</role>
    </service>
</topology>

If you are a seasoned Knox veteran, you may notice the alternative <param name="" value=""/> style syntax. Both this and the <param><name></name><value></value></param> style are supported. I’ve used the attribute style here for compactness.

Once this topology file is created you will be able to access the Knox Admin API, which is what the KNOX service in the topology file provides. The cURL command shown below retrieves the version information from the Knox server. Notice that -u admin:admin-secret in the command below matches <param name="users.admin" value="admin-secret"/> in the topology file above.

curl -u admin:admin-secret -ik 'https://localhost:8443/gateway/sample1/api/v1/version'

Below is an example response body output from the command above.
Note: The -i causes the return of the full response including status line and headers which aren’t shown below for brevity.

<?xml version="1.0" encoding="UTF-8"?>

As an aside, if you prefer JSON you can request that using the HTTP Accept header via the cURL -H flag.

curl -u admin:admin-secret -H 'Accept: application/json' -ik 'https://localhost:8443/gateway/sample1/api/v1/version'

Below is an example response JSON body for this command.

{
   "ServerVersion" : {
      "version" : "0.7.0-SNAPSHOT",
      "hash" : "9632b697060bfeffa2e03425451a3e9b3980c45e"
   }
}

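For reference, the -u flag used in the commands above makes cURL send an HTTP BASIC Authorization header built from user:password and Base64-encoded. This small sketch reproduces that header using the JDK's java.util.Base64. The `BasicAuthHeader` class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the Authorization header value that "curl -u user:password" sends for HTTP BASIC.
class BasicAuthHeader {
  static String of(String user, String password) {
    String token = user + ":" + password;
    return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
  }
}
```

For example, `BasicAuthHeader.of("admin", "admin-secret")` returns `Basic YWRtaW46YWRtaW4tc2VjcmV0`. Base64 is an encoding, not encryption, which is one reason the gateway is accessed over HTTPS.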
Sample 2

With authentication working, now add authorization since the real goal is an example with ActiveDirectory including both. The second sample topology file below adds a second user (guest) and an authorization provider. The <param name="knox.acl" value="admin;*;*"/> dictates that only the admin user can access the knox service. Go ahead and create this topology file. Notice the examples use a different name for each topology file so you can always refer back to the previous ones.

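<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param name="users.admin" value="admin-secret"/>
            <param name="users.guest" value="guest-secret"/>
            <param name="urls./**" value="authcBasic"/>
        </provider>
        <provider>
            <role>authorization</role>
            <name>AclsAuthz</name>
            <enabled>true</enabled>
            <param name="knox.acl" value="admin;*;*"/>
        </provider>
    </gateway>
    <service>
        <role>KNOX</role>
    </service>
</topology>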

Once this is created, test it with the cURL commands below and see that the admin user can access the API but the guest user can’t.

curl -u admin:admin-secret -ik 'https://localhost:8443/gateway/sample2/api/v1/version'
curl -u guest:guest-secret -ik 'https://localhost:8443/gateway/sample2/api/v1/version'

The first command will succeed. The second command will return an HTTP/1.1 403 Forbidden status along with an error response body.
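The knox.acl value is a semicolon-separated triple of users, groups and IP addresses, where `*` matches anything. The following simplified evaluator is a sketch, not the AclsAuthz implementation. It assumes that all three fields must match (AND semantics) and that each field is a comma-separated list. `SimpleAcl` is a hypothetical class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Simplified "users;groups;ipaddrs" ACL check, assuming every field must match.
class SimpleAcl {
  private final List<String> users, groups, ips;

  SimpleAcl(String acl) {
    String[] fields = acl.split(";", -1);
    if (fields.length != 3) {
      throw new IllegalArgumentException("Expected users;groups;ipaddrs: " + acl);
    }
    users = Arrays.asList(fields[0].split(","));
    groups = Arrays.asList(fields[1].split(","));
    ips = Arrays.asList(fields[2].split(","));
  }

  // True if "*" is allowed or any of the caller's groups is listed.
  private static boolean anyMatch(List<String> allowed, Set<String> actual) {
    if (allowed.contains("*")) {
      return true;
    }
    for (String a : actual) {
      if (allowed.contains(a)) {
        return true;
      }
    }
    return false;
  }

  boolean permits(String user, Set<String> userGroups, String ip) {
    return (users.contains("*") || users.contains(user))
        && anyMatch(groups, userGroups)
        && (ips.contains("*") || ips.contains(ip));
  }
}
```

Under these assumptions, `new SimpleAcl("admin;*;*")` permits admin from any address and rejects guest. This matches the two cURL results above.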

Part 2

These embedded examples are all well and good, but this article is supposed to be about ActiveDirectory. This takes us from examples that “just work” to examples that need to be customized for the environment in which they run. Specifically, they require a basic network address and a bunch of LDAP information. The table below describes the initial information you will need from your environment and shows what is being used in the samples here. You will need to adjust these values when used in the samples to match your environment.

A word of caution is warranted here. There are as many ways to setup LDAP and ActiveDirectory as there are IT departments. This variability requires flexibility which in turn often causes confusion, especially given poor documentation (guilty). The examples here focus on a single specific pattern that is seen frequently, but your mileage may vary.

  • Server Host: The hostname where ActiveDirectory is running.
  • Server Port: The port on which ActiveDirectory is listening. Sample: 389
  • System Username: The distinguished name for a user with search permissions. Sample: CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com
  • System Password: The password for the system user (see Note1). Sample: ********
  • Search Base: The subset of users to search for authentication (see Note2). Sample: CN=Users,DC=hwqe,DC=hortonworks,DC=com
  • Search Attribute: The attribute containing the username to search for authentication. Sample: sAMAccountName
  • Search Class: The object class for LDAP entities to search for authentication. Sample: person

Note1: In these samples the password will be embedded with the topology files. This is for simplicity. The password can be stored in a protected credential store as described here.
Note2: This search base should constrain the search as much as possible to limit the amount of data returned by the query.
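When DNs are copied by hand it is easy to pick a Search Base that does not actually contain the entries you expect. The Python below is a minimal sketch of a hypothetical helper (not part of Knox or OpenLDAP) that checks whether a DN sits at or below a base by comparing RDNs from the right, case-insensitively as AD does. It assumes plain RDNs; multi-valued RDNs (joined with +) and hex escapes are not handled.

```python
from __future__ import annotations


def split_dn(dn: str) -> list[str]:
    """Split a DN into normalized "attr=value" RDNs, honoring backslash escapes."""
    rdns, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            rdns.append("".join(current))
            current = []
        else:
            current.append(ch)
    rdns.append("".join(current))
    normalized = []
    for rdn in rdns:
        attr, _, value = rdn.partition("=")
        # AD compares DNs case-insensitively; spaces around separators are ignored.
        normalized.append(f"{attr.strip().lower()}={value.strip().lower()}")
    return normalized


def is_under_base(dn: str, base: str) -> bool:
    """True if dn equals base or lies somewhere in the subtree below it."""
    dn_rdns, base_rdns = split_dn(dn), split_dn(base)
    return len(dn_rdns) >= len(base_rdns) and dn_rdns[-len(base_rdns):] == base_rdns


base = "CN=Users,DC=hwqe,DC=hortonworks,DC=com"
print(is_under_base("CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com", base))
print(is_under_base("CN=scientist,OU=groups,DC=hwqe,DC=hortonworks,DC=com", base))
```

The second check is expected to fail: group entries live under a different branch, which is why groups get their own search base later on.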

To start things off on the right foot, let’s execute an LDAP bind against ActiveDirectory. For this you will need your values for Server Host, Server Port, System Username and System Password described in the table above. This initial testing will be done using the OpenLDAP command line tools. If you don’t have these tools available, don’t despair: Knox provides some alternatives, which I’ll show you later.

The command below will help ensure that the values for Server Host, Server Port, System Username and System Password are correct. In this case I’m using my own test account as the system user because it happens to have search privileges.

ldapwhoami -h -p 389 -x -D 'CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com' -w '********'

This is a brief description of each command line parameter used above.

  • -h: Provide your Server Host
  • -p: Provide your Server Port
  • -x: Use simple authentication vs SASL
  • -D: Provide your System Username
  • -w: Provide your System Password
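The DN used for -D contains a space, so it must be quoted when typed into a shell. As a sketch, the Python below maps each setting from the environment table onto the flags listed above and lets shlex.join produce a correctly quoted command line. The host name is a hypothetical placeholder.

```python
import shlex

# Hypothetical values: replace them with the ones from your environment table.
settings = {
    "server_host": "ad.example.com",  # placeholder host name
    "server_port": 389,
    "system_username": "CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com",
    "system_password": "********",
}


def ldapwhoami_argv(s):
    """Map each environment setting onto the ldapwhoami flag described above."""
    return [
        "ldapwhoami",
        "-h", s["server_host"],
        "-p", str(s["server_port"]),
        "-x",  # simple bind instead of SASL
        "-D", s["system_username"],
        "-w", s["system_password"],
    ]


argv = ldapwhoami_argv(settings)
# shlex.join quotes the DN (it contains a space) and the asterisks for a POSIX shell.
print(shlex.join(argv))
```

If you script this, passing argv straight to subprocess.run avoids shell quoting entirely. The OpenLDAP tools also accept -W instead of -w, which prompts for the password rather than leaving it in your shell history.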

For me this command returns the output below.


Now let’s make sure that the system user can actually search. Note that in this case the system user is searching for itself because -D and -b use the same value. You could change -b to search for other users.

ldapsearch -h -p 389 -x -D 'CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com' -w '********' -b 'CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com'
  • -b: Provide the search base; here it is the System Username DN, so only the system user’s own entry is searched

This returns all of the LDAP attributes for the system user. Take note of a few key attributes, like objectClass, which here includes ‘person’, and sAMAccountName, which here is ‘kminder’.

# extended LDIF
# LDAPv3
# base <CN=Users,DC=hwqe,DC=hortonworks,DC=com> with scope subtree
# filter: CN=Kevin Minder
# requesting: ALL

# Kevin Minder, Users,
dn: CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
cn: Kevin Minder
sn: Minder
givenName: Kevin
distinguishedName: CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com
instanceType: 4
whenCreated: 20151117175833.0Z
whenChanged: 20151117175919.0Z
displayName: Kevin Minder
uSNCreated: 26688014
uSNChanged: 26688531
name: Kevin Minder
objectGUID:: Eedvw9dqoUK/ERLNEFrQ5w==
userAccountControl: 66048
badPwdCount: 0
codePage: 0
countryCode: 0
badPasswordTime: 130922583862610479
lastLogoff: 0
lastLogon: 130922584014955481
pwdLastSet: 130922567133848037
primaryGroupID: 513
objectSid:: AQUAAAAAAAUVAAAA7TkHmDQ43l1xd4O/MigBAA==
accountExpires: 9223372036854775807
logonCount: 0
sAMAccountName: kminder
sAMAccountType: 805306368
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=hwqe,DC=hortonworks,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 130922567243691894

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
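Picking objectClass and sAMAccountName out of a long dump by eye works, but a small parser makes it repeatable. The Python below is a minimal sketch of an LDIF reader for ldapsearch output like the one above. It is not a full RFC 2849 parser: it skips comments, joins folded lines (ldapsearch wraps long values onto lines starting with a single space), base64-decodes "attr::" values such as objectGUID, and ignores blocks without a dn, such as the trailing search summary.

```python
import base64


def parse_ldif(text):
    """Parse ldapsearch-style LDIF into a list of {attribute: [values]} dicts."""
    unfolded = []
    for raw in text.splitlines():
        if raw.startswith(" ") and unfolded:
            unfolded[-1] += raw[1:]  # continuation of the previous line
        else:
            unfolded.append(raw)

    entries, current = [], None
    for line in unfolded:
        if line.startswith("#"):
            continue
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        attr, sep, value = line.partition(":")
        if not sep:
            continue
        if value.startswith(":"):
            value = base64.b64decode(value[1:].strip())  # binary, e.g. objectGUID
        else:
            value = value.strip()
        if attr == "dn":
            current = {}
            entries.append(current)
        if current is not None:
            current.setdefault(attr, []).append(value)
    return entries


# A trimmed copy of the output above.
sample = """# extended LDIF
dn: CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com
objectClass: top
objectClass: person
sAMAccountName: kminder
objectGUID:: Eedvw9dqoUK/ERLNEFrQ5w==

# search result
search: 2
result: 0 Success
"""

entry = parse_ldif(sample)[0]
print(entry["objectClass"], entry["sAMAccountName"])
```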

Next, let’s check the values for Search Base, Search Attribute and Search Class with a command like the one below.

ldapsearch -h -p 389 -x -D 'CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com' -w '********' -b 'CN=Users,DC=hwqe,DC=hortonworks,DC=com' -z 5 '(objectClass=person)' sAMAccountName
  • -z 5: Limit the search results to 5 entries. Note that by default AD will only return a maximum of 1000 entries.
  • '(objectClass=person)': Limit the search results to entries where objectClass=person. This value was taken from the search result above.
  • sAMAccountName: Return only the sAMAccountName attribute
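To authenticate one user, the Search Class and Search Attribute are typically combined into a single AND filter. The Python below sketches such a filter; the exact filter KnoxLdapRealm builds internally may differ, so treat this as an illustration for use with ldapsearch. Because the username comes from the client, special characters are escaped per RFC 4515, otherwise a username of "*" would match every entry.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    special = {"*", "(", ")", "\\", "\x00"}
    return "".join("\\%02x" % ord(ch) if ch in special else ch for ch in value)


def user_search_filter(object_class: str, attribute: str, username: str) -> str:
    """AND the Search Class and Search Attribute conditions into one filter."""
    return "(&(objectClass={})({}={}))".format(
        escape_filter_value(object_class), attribute, escape_filter_value(username))


print(user_search_filter("person", "sAMAccountName", "kminder"))
print(user_search_filter("person", "sAMAccountName", "*"))  # wildcard neutralized
```

The first result can be pasted, in single quotes, in place of '(objectClass=person)' in the ldapsearch command above to look up exactly one user.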

If no results are returned, go back and check the output from the previous search for the correct settings. The results for this command should look something like what is shown below. Take note of the various values returned for sAMAccountName. These are the usernames that will ultimately be used for login.

# extended LDIF
# LDAPv3
# base <CN=Users,DC=hwqe,DC=hortonworks,DC=com> with scope subtree
# filter: (objectClass=person)
# requesting: sAMAccountName

# Administrator, Users,
dn: CN=Administrator,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: Administrator

# guest, Users,
dn: CN=guest,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: guest

# cloudbase-init, Users,
dn: CN=cloudbase-init,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: cloudbase-init

# krbtgt, Users,
dn: CN=krbtgt,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: krbtgt

# ambari-server, Users,
dn: CN=ambari-server,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: ambari-server

# search result
search: 2
result: 4 Size limit exceeded

# numResponses: 6
# numEntries: 5
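Note the "result: 4 Size limit exceeded" line. With -z 5 this is expected: it only means more entries matched than were returned. The Python below sketches a check that extracts the usernames and treats result code 4 as acceptable when a size limit was requested. The code names come from RFC 4511; among them, noSuchObject (32) usually points to a wrong search base and invalidCredentials (49) to a failed bind.

```python
import re

# LDAP result codes (RFC 4511) that commonly appear in these searches.
RESULT_CODES = {
    0: "success",
    4: "sizeLimitExceeded",
    32: "noSuchObject",
    49: "invalidCredentials",
}


def summarize_search(output, size_limited):
    """Return (usernames, result name, ok) for an ldapsearch requesting sAMAccountName."""
    names = re.findall(r"^sAMAccountName: (.+)$", output, flags=re.MULTILINE)
    match = re.search(r"^result: (\d+)", output, flags=re.MULTILINE)
    code = int(match.group(1)) if match else None
    name = RESULT_CODES.get(code, "unknown")
    # With -z N, sizeLimitExceeded only means the limit was reached.
    ok = code == 0 or (size_limited and code == 4)
    return names, name, ok


# A trimmed copy of the output above.
output = """# Administrator, Users,
dn: CN=Administrator,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: Administrator

# guest, Users,
dn: CN=guest,CN=Users,DC=hwqe,DC=hortonworks,DC=com
sAMAccountName: guest

# search result
search: 2
result: 4 Size limit exceeded
"""

print(summarize_search(output, size_limited=True))
```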

Sample 3

At this point you have verified all of the environmental information required for authentication, so you are ready to create your third topology file. Just as with the first example, this topology file will only include authentication. We will tackle authorization later.

The table below highlights the important settings in the topology file.

Parameter | Description | Example value
main.ldapRealm | The class name for Knox’s Shiro Realm. | org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm
main.ldapContextFactory | The class name for Knox’s Shiro LdapContextFactory. | org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory
main.ldapRealm.contextFactory | Sets the context factory on the realm. | $ldapContextFactory
main.ldapRealm.contextFactory.url | Sets the AD URL on the context factory. | ldap://
main.ldapRealm.contextFactory.systemUsername | Sets the system user’s DN on the context factory. | CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com
main.ldapRealm.contextFactory.systemPassword | Sets the system user’s password on the context factory. | ********
main.ldapRealm.searchBase | The subset of users to search for authentication. | CN=Users,DC=hwqe,DC=hortonworks,DC=com
main.ldapRealm.userSearchAttributeName | The attribute whose value is used for username comparison. | sAMAccountName
main.ldapRealm.userObjectClass | The objectClass used to limit the search scope. | person
urls./** | Applies authentication to all URLs. | authcBasic
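The "main." and "urls." prefixes name sections of a Shiro INI configuration. As we understand it, Knox turns these provider params into a shiro.ini for the deployed topology, roughly as sketched below. This is an illustration of the mapping, not Knox's actual code. Order matters because Shiro builds [main] objects top to bottom, so an object such as ldapContextFactory must be defined before another line refers to it as $ldapContextFactory.

```python
def params_to_shiro_ini(params):
    """Turn ordered ("section.key", value) pairs into Shiro INI text.

    The text before the first "." names the INI section ("main", "urls") and the
    remainder is the key. Insertion order is preserved within each section.
    """
    sections = {}
    for name, value in params:
        section, _, key = name.partition(".")
        sections.setdefault(section, []).append(f"{key} = {value}")
    lines = []
    for section, entries in sections.items():
        lines.append(f"[{section}]")
        lines.extend(entries)
        lines.append("")
    return "\n".join(lines)


params = [
    ("main.ldapRealm", "org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"),
    ("main.ldapContextFactory", "org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory"),
    ("main.ldapRealm.contextFactory", "$ldapContextFactory"),
    ("main.ldapRealm.userSearchAttributeName", "sAMAccountName"),
    ("main.ldapRealm.userObjectClass", "person"),
    ("urls./**", "authcBasic"),
]
print(params_to_shiro_ini(params))
```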


Create this sample3 topology file. Take care to replace all of the example environment values with the values for your environment that you discovered and verified above.



      <param name="main.ldapRealm" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"/>
      <param name="main.ldapContextFactory" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory"/>
      <param name="main.ldapRealm.contextFactory" value="$ldapContextFactory"/>

      <param name="main.ldapRealm.contextFactory.url" value="ldap://"/>
      <param name="main.ldapRealm.contextFactory.systemUsername" value="CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
      <param name="main.ldapRealm.contextFactory.systemPassword" value="********"/>

      <param name="main.ldapRealm.searchBase" value="CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
      <param name="main.ldapRealm.userSearchAttributeName" value="sAMAccountName"/>
      <param name="main.ldapRealm.userObjectClass" value="person"/>

      <param name="urls./**" value="authcBasic"/>
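The listing above shows only the ShiroProvider param entries. As a sketch, assuming the usual Knox topology layout (an authentication provider named ShiroProvider inside gateway, plus a service with role KNOX for the Admin API), the Python below wraps such params into a complete document with ElementTree, which also escapes attribute values. The element and role names are our assumption of that layout; compare them against a topology shipped with your Knox version, such as conf/topologies/sandbox.xml.

```python
import xml.etree.ElementTree as ET


def build_topology(shiro_params, service_roles=("KNOX",)):
    """Wrap ShiroProvider params in a minimal Knox topology document."""
    topology = ET.Element("topology")
    gateway = ET.SubElement(topology, "gateway")
    provider = ET.SubElement(gateway, "provider")
    ET.SubElement(provider, "role").text = "authentication"
    ET.SubElement(provider, "name").text = "ShiroProvider"
    ET.SubElement(provider, "enabled").text = "true"
    for name, value in shiro_params:
        # Same name/value attribute form as the param lines above.
        ET.SubElement(provider, "param", name=name, value=value)
    for role in service_roles:
        service = ET.SubElement(topology, "service")
        ET.SubElement(service, "role").text = role
    return ET.tostring(topology, encoding="unicode")


xml_text = build_topology([
    ("main.ldapRealm", "org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"),
    ("main.ldapRealm.userObjectClass", "person"),
    ("urls./**", "authcBasic"),
])
print(xml_text)
```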


We could go straight to trying to access the Knox Admin API with cURL as we did before. However, let’s take this opportunity to explore the new LDAP diagnostic tools introduced in Apache Knox 0.7.0.

This first command helps diagnose basic connectivity and system user issues.

bin/ system-user-auth-test --cluster sample3
System LDAP Bind successful.

If the command above works, you can move on to testing the LDAP search configuration settings of the topology. If you don’t provide the username and password via the command line switches, you will be prompted to enter them.

bin/ user-auth-test --cluster sample3 --u kminder --p '********'
LDAP authentication successful!

Once all of that is working go ahead and try the cURL command.

curl -u kminder:******** -ik 'https://localhost:8443/gateway/sample3/api/v1/version'
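The authcBasic filter expects HTTP Basic authentication, which is what curl -u produces. The Python below shows how that header value is formed: the credentials are only base64-encoded, not encrypted, which is why these requests go over HTTPS. (The -k flag merely skips certificate verification, which is acceptable only for local testing with a self-signed certificate.) The password is a hypothetical placeholder.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value that `curl -u user:pass` sends."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("kminder", "example-password")  # hypothetical password
print(header)
# Decoding is trivial, so anyone who can read the request can read the password.
print(base64.b64decode(header.split(" ", 1)[1]).decode("utf-8"))
```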

Sample 4

The next step is to enable authorization. To accomplish this, a bit more environmental information is needed. The OpenLDAP command line tools are useful here again to ensure that we have the correct values. Authorization requires determining group membership, which we will do with LDAP searches. The way ActiveDirectory is set up for this example, this requires knowing four additional pieces of information: groupSearchBase, groupObjectClass, groupIdAttribute and memberAttribute.

The first, ‘groupSearchBase’, is something that you will need to get from your ActiveDirectory administrator. In my example, this value is ‘OU=groups,DC=hwqe,DC=hortonworks,DC=com’. This value is a distinguished name that constrains the search to the groups of which a given user might be a member. Once you have this, you can use ldapsearch to look at the attributes of some groups and determine the other three settings.

Here is an example of an ldapsearch using the groupSearchBase from my environment.

ldapsearch -h -p 389 -x -D 'CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com' -w '********' -b 'OU=groups,DC=hwqe,DC=hortonworks,DC=com' -z 2

This is the output.

# extended LDIF
# LDAPv3
# base <OU=groups,DC=hwqe,DC=hortonworks,DC=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL

# groups,
dn: OU=groups,DC=hwqe,DC=hortonworks,DC=com
objectClass: top
objectClass: organizationalUnit
ou: groups
distinguishedName: OU=groups,DC=hwqe,DC=hortonworks,DC=com
instanceType: 4
whenCreated: 20150812202242.0Z
whenChanged: 20150812202242.0Z
uSNCreated: 42340
uSNChanged: 42341
name: groups
objectGUID:: RYIcbNyVWki5HmeANfzAbA==
objectCategory: CN=Organizational-Unit,CN=Schema,CN=Configuration,DC=hwqe,DC=hortonworks,DC=com
dSCorePropagationData: 20150827225949.0Z
dSCorePropagationData: 20150812202242.0Z
dSCorePropagationData: 16010101000001.0Z

# scientist, groups,
dn: CN=scientist,OU=groups,DC=hwqe,DC=hortonworks,DC=com
objectClass: top
objectClass: group
cn: scientist
member: CN=sam repl2,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl1,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=bob,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam,CN=Users,DC=hwqe,DC=hortonworks,DC=com
distinguishedName: CN=scientist,OU=groups,DC=hwqe,DC=hortonworks,DC=com
instanceType: 4
whenCreated: 20150812213414.0Z
whenChanged: 20150828231624.0Z
uSNCreated: 42355
uSNChanged: 751045
name: scientist
objectGUID:: iXhbVo7kJUGkiQ+Sjlm0Qw==
sAMAccountName: scientist
sAMAccountType: 536870912
groupType: -2147483644
objectCategory: CN=Group,CN=Schema,CN=Configuration,DC=hwqe,DC=hortonworks,DC=com
dSCorePropagationData: 20150827225949.0Z
dSCorePropagationData: 16010101000001.0Z

# search result
search: 2
result: 4 Size limit exceeded

# numResponses: 3
# numEntries: 2

From the output, take note of:

  • the relevant objectClass: ‘group’
  • the attribute used to enumerate members: ‘member’
  • the attributes that most uniquely name the group: ‘cn’ or ‘sAMAccountName’

These correspond to the groupObjectClass, memberAttribute and groupIdAttribute settings, respectively. We will use groupObjectClass=group, memberAttribute=member and groupIdAttribute=sAMAccountName.

The command below repeats the search above but returns just the member attribute for up to 5 groups.

ldapsearch -h -p 389 -x -D 'CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com' -w '********' -b 'OU=groups,DC=hwqe,DC=hortonworks,DC=com' -z 5 member
# extended LDIF
# LDAPv3
# base <OU=groups,DC=hwqe,DC=hortonworks,DC=com> with scope subtree
# filter: (objectclass=*)
# requesting: member

# groups,
dn: OU=groups,DC=hwqe,DC=hortonworks,DC=com

# scientist, groups,
dn: CN=scientist,OU=groups,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl2,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl1,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=bob,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam,CN=Users,DC=hwqe,DC=hortonworks,DC=com

# analyst, groups,
dn: CN=analyst,OU=groups,DC=hwqe,DC=hortonworks,DC=com
member: CN=testLdap1,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl2,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl1,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=bob,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=tom,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam,CN=Users,DC=hwqe,DC=hortonworks,DC=com

# knox_hdp_users, groups,
dn: CN=knox_hdp_users,OU=groups,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl2,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl1,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam repl,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam,CN=Users,DC=hwqe,DC=hortonworks,DC=com

# knox_no_users, groups,
dn: CN=knox_no_users,OU=groups,DC=hwqe,DC=hortonworks,DC=com

# test grp, groups,
dn: CN=test grp,OU=groups,DC=hwqe,DC=hortonworks,DC=com
member: CN=testLdap1,CN=Users,DC=hwqe,DC=hortonworks,DC=com
member: CN=sam,CN=Users,DC=hwqe,DC=hortonworks,DC=com

# search result
search: 2
result: 0 Success

# numResponses: 7
# numEntries: 6
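The group lookup boils down to one question: which entries under the group search base have the right objectClass and list the user's DN in their member attribute? The Python below is an in-memory sketch of that logic using the membership from the output above, roughly what an LDAP filter such as (&(objectClass=group)(member=<user DN>)) would return. The helper group() is hypothetical and assumes each group's sAMAccountName equals its cn, as it does for scientist; DNs are compared as plain case-insensitive strings, which is a simplification.

```python
def groups_for_user(user_dn, group_entries, group_object_class="group",
                    group_id_attribute="sAMAccountName", member_attribute="member"):
    """Return the ids of groups whose member attribute lists user_dn."""
    wanted = user_dn.lower()
    found = []
    for entry in group_entries:
        classes = {c.lower() for c in entry.get("objectClass", [])}
        if group_object_class.lower() not in classes:
            continue  # e.g. the OU=groups container itself
        members = {m.lower() for m in entry.get(member_attribute, [])}
        if wanted in members:
            found.extend(entry.get(group_id_attribute, []))
    return found


def group(name, *member_cns):
    """Hypothetical helper building a group entry; assumes sAMAccountName == cn."""
    users = "CN=Users,DC=hwqe,DC=hortonworks,DC=com"
    return {
        "objectClass": ["top", "group"],
        "sAMAccountName": [name],
        "member": [f"CN={cn},{users}" for cn in member_cns],
    }


# Membership copied from the member search output above.
groups = [
    {"objectClass": ["top", "organizationalUnit"]},
    group("scientist", "sam repl2", "sam repl1", "sam repl", "bob", "sam"),
    group("analyst", "testLdap1", "sam repl2", "sam repl1", "sam repl", "bob", "tom", "sam"),
    group("knox_hdp_users", "sam repl2", "sam repl1", "sam repl", "sam"),
    group("knox_no_users"),
    group("test grp", "testLdap1", "sam"),
]

sam_dn = "CN=sam,CN=Users,DC=hwqe,DC=hortonworks,DC=com"
print(groups_for_user(sam_dn, groups))
```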

Armed with this group information, you can now create a topology file that causes the Shiro authentication provider to retrieve group information. Keep in mind that we haven’t made it all the way to authorization yet. This step is just to prove that you can get the group information back from ActiveDirectory. Once the group lookup is working, we will enable authorization in the next step.

The table below highlights the changes that you will be making in this topology file.

Parameter | Description | Example value
main.ldapRealm.userSearchBase | Replaces main.ldapRealm.searchBase. | CN=Users,DC=hwqe,DC=hortonworks,DC=com
main.ldapRealm.authorizationEnabled | Enables the group lookup functionality. | true
main.ldapRealm.groupSearchBase | The subset of groups to search for user membership. | OU=groups,DC=hwqe,DC=hortonworks,DC=com
main.ldapRealm.groupObjectClass | The objectClass used to limit the search scope. | group
main.ldapRealm.groupIdAttribute | The attribute used to provide the group name. | sAMAccountName
main.ldapRealm.memberAttribute | The attribute used to provide the group’s members. | member


Create this sample4 topology file now.



            <param name="main.ldapRealm" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"/>
            <param name="main.ldapContextFactory" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory"/>
            <param name="main.ldapRealm.contextFactory" value="$ldapContextFactory"/>

            <param name="main.ldapRealm.contextFactory.url" value="ldap://"/>
            <param name="main.ldapRealm.contextFactory.systemUsername" value="CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.contextFactory.systemPassword" value="********"/>

            <param name="main.ldapRealm.userSearchBase" value="CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.userSearchAttributeName" value="sAMAccountName"/>
            <param name="main.ldapRealm.userObjectClass" value="person"/>

            <param name="main.ldapRealm.authorizationEnabled" value="true"/>
            <param name="main.ldapRealm.groupSearchBase" value="OU=groups,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.groupObjectClass" value="group"/>
            <param name="main.ldapRealm.groupIdAttribute" value="sAMAccountName"/>
            <param name="main.ldapRealm.memberAttribute" value="member"/>

            <param name="urls./**" value="authcBasic"/>


Once again the Knox tooling can be used to test this configuration. This time the --g flag will be added to retrieve group information.

bin/ user-auth-test --cluster sample4 --u sam --p '********' --g
LDAP authentication successful!
sam is a member of: analyst
sam is a member of: knox_hdp_users
sam is a member of: test grp
sam is a member of: scientist

Sample 5

The next sample adds an authorization provider to act upon the groups. This is the same provider that was added back in the second sample. The parameter <param name="knox.acl" value="*;knox_hdp_users;*"/> in this case dictates that only members of the knox_hdp_users group can access the Knox Admin API via the sample5 topology. Create the topology shown below. Don’t forget to tailor it to your environment.



            <param name="main.ldapRealm" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"/>
            <param name="main.ldapContextFactory" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory"/>
            <param name="main.ldapRealm.contextFactory" value="$ldapContextFactory"/>
            <param name="main.ldapRealm.contextFactory.url" value="ldap://"/>
            <param name="main.ldapRealm.contextFactory.systemUsername" value="CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.contextFactory.systemPassword" value="********"/>
            <param name="main.ldapRealm.userSearchBase" value="CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.userSearchAttributeName" value="sAMAccountName"/>
            <param name="main.ldapRealm.userObjectClass" value="person"/>
            <param name="main.ldapRealm.authorizationEnabled" value="true"/>
            <param name="main.ldapRealm.groupSearchBase" value="OU=groups,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.groupObjectClass" value="group"/>
            <param name="main.ldapRealm.groupIdAttribute" value="sAMAccountName"/>
            <param name="main.ldapRealm.memberAttribute" value="member"/>
            <param name="urls./**" value="authcBasic"/>

            <param name="knox.acl" value="*;knox_hdp_users;*"/>

curl -u kminder:'********' -ik 'https://localhost:8443/gateway/sample5/api/v1/version'
curl -u sam:'********' -ik 'https://localhost:8443/gateway/sample5/api/v1/version'
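The knox.acl value has three semicolon-separated parts: users, groups and IP addresses, where "*" matches anything. The Python below is a minimal sketch of how such an ACL is evaluated in the default AND mode, where all three parts must match. It is a simplification: the AclsAuthz provider also offers an OR mode and wildcard IP patterns, neither of which is modeled here. The kminder example assumes kminder belongs to none of the groups found earlier.

```python
def acl_allows(acl, user, groups, ip):
    """Evaluate a "users;groups;ipaddrs" ACL in AND mode; "*" matches anything."""
    parts = acl.split(";")
    if len(parts) != 3:
        raise ValueError("expected users;groups;ipaddrs, got %r" % acl)

    def matches(part, candidates):
        allowed = {entry.strip() for entry in part.split(",")}
        return "*" in allowed or any(c in allowed for c in candidates)

    users_part, groups_part, ips_part = parts
    return (matches(users_part, [user])
            and matches(groups_part, groups)
            and matches(ips_part, [ip]))


acl = "*;knox_hdp_users;*"
print(acl_allows(acl, "sam", ["analyst", "knox_hdp_users", "test grp", "scientist"], "127.0.0.1"))
print(acl_allows(acl, "kminder", [], "127.0.0.1"))
```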

Sample 6

Next, let’s enable caching, because this important performance enhancement isn’t enabled out of the box. The table below highlights the changes that will be made to the authentication provider settings.

Parameter | Description | Example value
main.cacheManager | The name of the class implementing the cache manager. | org.apache.shiro.cache.ehcache.EhCacheManager
main.securityManager.cacheManager | Sets the cache manager on the security manager. | $cacheManager
main.ldapRealm.authenticationCachingEnabled | Enables the use of caching during authentication. | true


Create the sample6 topology file now.



            <param name="main.ldapRealm" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"/>
            <param name="main.ldapContextFactory" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory"/>
            <param name="main.ldapRealm.contextFactory" value="$ldapContextFactory"/>
            <param name="main.ldapRealm.contextFactory.url" value="ldap://"/>
            <param name="main.ldapRealm.contextFactory.systemUsername" value="CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.contextFactory.systemPassword" value="********"/>
            <param name="main.ldapRealm.userSearchBase" value="CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.userSearchAttributeName" value="sAMAccountName"/>
            <param name="main.ldapRealm.userObjectClass" value="person"/>
            <param name="main.ldapRealm.authorizationEnabled" value="true"/>
            <param name="main.ldapRealm.groupSearchBase" value="OU=groups,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.groupObjectClass" value="group"/>
            <param name="main.ldapRealm.groupIdAttribute" value="sAMAccountName"/>
            <param name="main.ldapRealm.memberAttribute" value="member"/>

            <param name="main.cacheManager" value="org.apache.shiro.cache.ehcache.EhCacheManager"/>
            <param name="main.securityManager.cacheManager" value="$cacheManager"/>
            <param name="main.ldapRealm.authenticationCachingEnabled" value="true"/>

            <param name="urls./**" value="authcBasic"/>

            <param name="knox.acl" value="*;knox_hdp_users;*"/>


With this topology file you can execute a sequence of cURL commands to demonstrate that the authentication is indeed cached.

curl -u sam:'********' -ik 'https://localhost:8443/gateway/sample6/api/v1/version'

Now unplug your network cable, turn off Wi-Fi or disconnect from the VPN to temporarily prevent access to the ActiveDirectory server. The command below will continue to work even though no cookies are used and the ActiveDirectory server cannot be contacted. This is because the invocation above caused the user’s authentication and authorization information to be cached.

curl -u sam:'********' -ik 'https://localhost:8443/gateway/sample6/api/v1/version'

The command below uses an invalid password and is intended to prove that the previously authenticated credentials are re-verified. It is important to note that Knox does not store the actual password in the cache for this verification but rather a one-way hash of the password.

curl -u sam:'invalid-password' -ik 'https://localhost:8443/gateway/sample6/api/v1/version'
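The idea behind caching a one-way hash instead of the password can be sketched in a few lines. The class below is a hypothetical stand-in, not Shiro's implementation (Shiro uses its own cache and credentials matcher): after a successful directory login it keeps only a salted PBKDF2 digest, so a later request with the right password is answered from the cache while a wrong one is rejected without contacting ActiveDirectory.

```python
import hashlib
import hmac
import os


class CredentialCache:
    """Remember successful logins as salted one-way hashes instead of passwords."""

    def __init__(self, iterations: int = 100_000):
        self._iterations = iterations
        self._entries = {}  # username -> (salt, derived key)

    def _derive(self, salt: bytes, password: str) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self._iterations)

    def remember(self, username: str, password: str) -> None:
        """Call after the directory has accepted username/password."""
        salt = os.urandom(16)
        self._entries[username] = (salt, self._derive(salt, password))

    def verify(self, username: str, password: str):
        """True/False for cached users; None means ask the directory server."""
        entry = self._entries.get(username)
        if entry is None:
            return None
        salt, expected = entry
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, self._derive(salt, password))


cache = CredentialCache()
cache.remember("sam", "correct-password")       # first request, directory reachable
print(cache.verify("sam", "correct-password"))  # answered from the cache
print(cache.verify("sam", "invalid-password"))  # rejected without contacting AD
```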

Sample 7

Finally, let’s put it all together in a real topology file that doesn’t use the Knox Admin API. The important things to observe here are:

  1. the host and ports for the Hadoop services will need to be changed to match your environment
  2. the inclusion of the Hadoop services instead of the Knox Admin API
  3. the inclusion of the identity-assertion provider
  4. the exclusion of the hostmap provider as this is rarely required unless running Hadoop on local VMs with port mapping

Create the final sample7 topology file.



            <param name="main.ldapRealm" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm"/>
            <param name="main.ldapContextFactory" value="org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory"/>
            <param name="main.ldapRealm.contextFactory" value="$ldapContextFactory"/>

            <param name="main.ldapRealm.contextFactory.url" value="ldap://"/>
            <param name="main.ldapRealm.contextFactory.systemUsername" value="CN=Kevin Minder,CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.contextFactory.systemPassword" value="********"/>

            <param name="main.ldapRealm.userSearchBase" value="CN=Users,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.userSearchAttributeName" value="sAMAccountName"/>
            <param name="main.ldapRealm.userObjectClass" value="person"/>

            <param name="main.ldapRealm.authorizationEnabled" value="true"/>
            <param name="main.ldapRealm.groupSearchBase" value="OU=groups,DC=hwqe,DC=hortonworks,DC=com"/>
            <param name="main.ldapRealm.groupObjectClass" value="group"/>
            <param name="main.ldapRealm.groupIdAttribute" value="sAMAccountName"/>
            <param name="main.ldapRealm.memberAttribute" value="member"/>

            <param name="main.cacheManager" value="org.apache.shiro.cache.ehcache.EhCacheManager"/>
            <param name="main.securityManager.cacheManager" value="$cacheManager"/>
            <param name="main.ldapRealm.authenticationCachingEnabled" value="true"/>

            <param name="urls./**" value="authcBasic"/>

            <param name="knox.acl" value="*;knox_hdp_users;*"/>












To verify topology files we frequently use the WebHDFS GETHOMEDIRECTORY command.

curl -ku guest:guest-password 'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=GETHOMEDIRECTORY' 

This should return a response body similar to what is shown below.

{"Path": "/user/guest"}
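Gateway URLs follow a fixed pattern: the gateway path (gateway by default), then the topology name (the topology file name without .xml), then the service path. The Python below sketches that pattern and the check on the JSON response. It assumes the default gateway path has not been changed.

```python
import json
from urllib.parse import urlencode


def gateway_url(host, port, topology, service_path, **query):
    """Build https://<host>:<port>/gateway/<topology>/<service path>[?query]."""
    url = f"https://{host}:{port}/gateway/{topology}/{service_path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    return url


def home_directory(response_body):
    """Extract the path from a GETHOMEDIRECTORY JSON response."""
    return json.loads(response_body)["Path"]


print(gateway_url("localhost", 8443, "sandbox", "webhdfs/v1/", op="GETHOMEDIRECTORY"))
print(home_directory('{"Path": "/user/guest"}'))
```

Swapping the topology argument is all it takes to aim the same request at another topology file.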

Hopefully this provides a more targeted and useful example of using Apache Knox with ActiveDirectory than can be provided in the Apache Knox User’s Guide. If you have more questions, comments or suggestions, please join the Apache Knox community. In particular, you might be interested in one of the mailing lists.

This article covers setting up Apache Knox for development or just to play around with.

Step 1 - Clone the git repository

~/Projects> git clone

Step 2 - Build, install and start the servers

~/Projects> cd knox
~/Projects/knox> ant package install-test-home start-test-servers

This will generate a great deal of output. At the end, though, you should see something like this. If not, see the debugging tips below.

     [exec] Starting LDAP succeeded with PID 18226.

     [exec] Starting Gateway succeeded with PID 18277.

Assuming that the servers started successfully, you can access the Knox Admin API via cURL.

~/Projects/knox> curl -ku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/version'

This will return an XML response with some version information.

<?xml version="1.0" encoding="UTF-8"?>

If the servers failed to start, here are some debugging tips and tricks.

The first thing to check is whether other gateway or LDAP servers are already running. The Java jps command is convenient for this. If you find other gateway.jar or ldap.jar processes running, they are likely causing the issue and will need to be stopped before you can proceed.

~/Projects/knox> jps
431 Launcher
18277 gateway.jar
18346 Jps
18226 ldap.jar

The next likely culprit is some other process using a port required by the gateway (8443) or the demo LDAP server (33389). On macOS the lsof command is the tool of choice. If you find other processes already listening on these ports, they will need to be stopped before you can proceed.

~/Projects/knox> lsof -n -i4TCP:8443 | grep LISTEN
java    18277 kevin.minder  167u  IPv6 0x2d785ee90129816b      0t0  TCP *:pcsync-https (LISTEN)

~/Projects/knox> lsof -n -i4TCP:33389 | grep LISTEN
java    18226 kevin.minder  226u  IPv6 0x2d785ee91fcce56b      0t0  TCP *:33389 (LISTEN)
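Where lsof is not available, a quick connection attempt tells you whether anything is listening. The Python below is a minimal sketch that checks the gateway and demo LDAP ports on the IPv4 loopback address. It only reports whether a port is taken, not which process holds it, so jps or lsof is still needed to find the culprit.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


for label, port in (("gateway", 8443), ("demo LDAP", 33389)):
    print(f"{label} port {port}: {'in use' if port_in_use(port) else 'free'}")
```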

Step 3 - Customize the topology for your cluster

Once the Knox servers are up and running, you need to create or customize topology files to match an existing Hadoop cluster. Please note that your directories may differ from what is shown below, depending on which version of Knox you are using; the version shown here is 0.7.0-SNAPSHOT. Also note that open is a macOS-specific command that will likely launch the XML file in Xcode for editing. Any text editor is fine.

~/Projects/knox> cd install/knox-0.7.0-SNAPSHOT
~/Projects/knox/install/knox-0.7.0-SNAPSHOT> open conf/topologies/sandbox.xml

Right now all you need to worry about are the <service> sections in the topology file, in particular the <url> values. If you are running a local HDP Sandbox these values will be correct, otherwise they will need to be changed.
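Before editing, it can help to list which backend URLs each service currently points at. The Python below is a small sketch that reads a topology and maps each service role to its url values. The example topology and host name are hypothetical placeholders; 50070 is shown only as a typical Hadoop 2 NameNode HTTP port.

```python
import xml.etree.ElementTree as ET


def service_urls(topology_xml):
    """Map each <service> role in a Knox topology to its list of <url> values."""
    root = ET.fromstring(topology_xml)
    urls = {}
    for service in root.findall("service"):
        role = (service.findtext("role") or "").strip()
        urls.setdefault(role, []).extend(
            u.text.strip() for u in service.findall("url") if u.text)
    return urls


# Hypothetical topology snippet; the host name is a placeholder.
example = """<topology>
  <gateway/>
  <service>
    <role>WEBHDFS</role>
    <url>http://namenode.example.com:50070/webhdfs</url>
  </service>
  <service>
    <role>KNOX</role>
  </service>
</topology>"""

print(service_urls(example))
```

From the install directory, service_urls(open("conf/topologies/sandbox.xml").read()) would show the values you are about to change.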









Once you have made the required changes to the <service> elements save the file. Within a few seconds the Knox gateway server will detect the change and reload the file. Then you can access the Hadoop cluster via the gateway with the sample cURL command below.

curl -ku guest:guest-password 'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=GETHOMEDIRECTORY' 

This should return a response body similar to what is shown below.

{"Path": "/user/guest"}

Hopefully this provides the shortest possible path to getting started with Apache Knox. Most of this information can also be found in the Apache Knox User’s Guide. If you have more questions, comments or suggestions please join the Apache Knox community. In particular you might be interested in one of the mailing lists.