Hadoop Auth is a Java library that enables Kerberos SPNEGO authentication for HTTP requests. It enforces authentication on protected resources. After a successful authentication, Hadoop Auth creates a signed HTTP cookie that contains an authentication token with the username, user principal, authentication type and expiration time. This cookie is used for all subsequent HTTP client requests to protected resources until it expires.
Given Apache Knox's pluggable authentication providers, it is easy to set up Hadoop Auth with Apache Knox with only a few configuration changes. The purpose of this article is to describe this process in detail, with examples.
Here we assume a working, Kerberized Hadoop cluster with Apache Knox (version 0.7.0 or later). Kerberizing the cluster is beyond the scope of this article.
To use Hadoop Auth in Apache Knox we need to update the Knox topology. Hadoop Auth is configured as a provider, so it is set up through the provider params. Apache Knox uses the same configuration parameters as Apache Hadoop, and they can be expected to behave in a similar fashion. To update the Knox topology using Ambari, go to Knox -> Configs -> Advanced topology.
The following is an example of the HadoopAuth provider snippet in the Apache Knox topology file.
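As a minimal sketch of what such a provider block can look like, the shell snippet below writes an example fragment to a scratch file so that it can be reviewed and then pasted into the topology. The parameter names follow the Apache Knox documentation as we understand it. Every value (secret, domain, principal, keytab path, validity) and the file name `hadoopauth-provider.xml` are placeholders and assumptions, not values from a real cluster.

```shell
# Hypothetical scratch file; the fragment is meant to be pasted into the topology.
PROVIDER_FILE="${PROVIDER_FILE:-hadoopauth-provider.xml}"

# The quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$PROVIDER_FILE" <<'EOF'
<provider>
  <role>authentication</role>
  <name>HadoopAuth</name>
  <enabled>true</enabled>
  <param>
    <name>config.prefix</name>
    <value>hadoop.auth.config</value>
  </param>
  <param>
    <!-- placeholder: use the same secret on every Knox instance -->
    <name>hadoop.auth.config.signature.secret</name>
    <value>CHANGE_ME</value>
  </param>
  <param>
    <name>hadoop.auth.config.type</name>
    <value>kerberos</value>
  </param>
  <param>
    <name>hadoop.auth.config.simple.anonymous.allowed</name>
    <value>false</value>
  </param>
  <param>
    <!-- placeholder: token lifetime in seconds -->
    <name>hadoop.auth.config.token.validity</name>
    <value>1800</value>
  </param>
  <param>
    <!-- placeholder domain -->
    <name>hadoop.auth.config.cookie.domain</name>
    <value>mycompany.com</value>
  </param>
  <param>
    <name>hadoop.auth.config.cookie.path</name>
    <value>gateway/default</value>
  </param>
  <param>
    <!-- placeholder principal; it must start with HTTP/ -->
    <name>hadoop.auth.config.kerberos.principal</name>
    <value>HTTP/knox.mycompany.com@EXAMPLE.COM</value>
  </param>
  <param>
    <!-- placeholder keytab path -->
    <name>hadoop.auth.config.kerberos.keytab</name>
    <value>/etc/security/keytabs/spnego.service.keytab</value>
  </param>
  <param>
    <name>hadoop.auth.config.kerberos.name.rules</name>
    <value>DEFAULT</value>
  </param>
</provider>
EOF
```

The `config.prefix` param tells the provider which prefix marks the Hadoop Auth settings, which is why the remaining params all start with `hadoop.auth.config`.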
The following parameters need to be updated at a minimum:
- hadoop.auth.config.signature.secret - The secret used to sign the delegation token in the hadoop.auth cookie. The same secret must be used across all instances of the Knox gateway in a given cluster. Otherwise, the delegation token will fail validation and authentication will be repeated on each request.
- hadoop.auth.config.cookie.domain - The domain to use for the HTTP cookie that stores the authentication token (e.g. mycompany.com).
- hadoop.auth.config.kerberos.principal - The web-application Kerberos principal name. The Kerberos principal name must start with HTTP/...
- hadoop.auth.config.kerberos.keytab - The path to the keytab file containing the credentials for the Kerberos principal specified above.
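Before applying the configuration it can help to confirm that the keytab really contains an HTTP/ principal. The sketch below assumes that the MIT Kerberos `klist` tool is installed and uses a hypothetical keytab path; `check_keytab` is a helper name introduced here purely for illustration.

```shell
# Hypothetical path; use the value of hadoop.auth.config.kerberos.keytab.
KEYTAB="${KEYTAB:-/etc/security/keytabs/spnego.service.keytab}"

# List the keytab entries (-k) with timestamps (-t) and keep only
# HTTP/ service principals. Returns non-zero if none is found.
check_keytab() {
  klist -kt "$KEYTAB" | grep 'HTTP/'
}
```

Running `check_keytab` as the user that runs the Knox gateway also shows whether that user can read the keytab file at all.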
For details on the other properties please refer to the Apache Knox documentation.
If you are using Ambari you will have to restart Knox; this is an Ambari requirement. No restart is required if the topology is updated outside of Ambari, because Apache Knox reloads a topology whenever the time-stamp of the topology file changes.
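Outside of Ambari, the reload can be triggered by hand simply by bumping the file's modification time. The following is a small sketch; the install location and topology file name are assumptions and will differ between installations, and `reload_topology` is a hypothetical helper name.

```shell
# Hypothetical locations; adjust to your Knox installation.
GATEWAY_HOME="${GATEWAY_HOME:-/usr/hdp/current/knox-server}"
TOPOLOGY_FILE="${TOPOLOGY_FILE:-$GATEWAY_HOME/conf/topologies/default.xml}"

# Update the modification time of the topology file so that Knox
# notices the change and redeploys the topology.
reload_topology() {
  touch "$TOPOLOGY_FILE"
}
```

Saving the topology file from an editor has the same effect, since that also updates the time-stamp.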
To test Hadoop Auth we will use the user 'guest'; we assume that no such user exists on the system yet.
Let's create a user 'guest' with group 'users'. Note that the group 'users' was chosen because of the property 'hadoop.proxyuser.knox.groups=users', which lets Knox act on behalf of members of that group.
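A sketch of the user creation step, assuming a Linux host with the standard `useradd` tool and root privileges. The helper name `create_test_user` is hypothetical, and the existence check is only a safety net in case the user is already there.

```shell
TEST_USER="${TEST_USER:-guest}"
TEST_GROUP="${TEST_GROUP:-users}"   # matches hadoop.proxyuser.knox.groups=users

# Create the user with the given primary group unless it already exists.
# Needs root privileges.
create_test_user() {
  if id "$TEST_USER" >/dev/null 2>&1; then
    echo "user $TEST_USER already exists"
  else
    useradd -g "$TEST_GROUP" "$TEST_USER"
  fi
}
```

Afterwards, `id guest` should list 'users' among the groups of the new user.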
Add a Kerberos principal for the user using 'kadmin.local'.
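A sketch of this step, assuming an MIT Kerberos KDC and a root shell on the KDC host. The principal name is a placeholder that falls back to the KDC's default realm, and `add_test_principal` is a hypothetical helper name.

```shell
PRINCIPAL="${PRINCIPAL:-guest}"       # placeholder; the default realm is appended by the KDC
KADMIN="${KADMIN:-kadmin.local}"      # override if the admin tool lives elsewhere

# -q passes a single query to kadmin; addprinc creates the principal
# and prompts interactively for its password.
add_test_principal() {
  "$KADMIN" -q "addprinc $PRINCIPAL"
}
```

The password chosen here is the one that will be asked for when logging in with kinit.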
Log in using kinit.
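A sketch of the login step, assuming the MIT Kerberos client tools `kinit` and `klist` are installed and that the principal name below (a placeholder) exists in the default realm. `login_test_user` is a hypothetical helper name.

```shell
PRINCIPAL="${PRINCIPAL:-guest}"   # placeholder principal in the default realm

# Obtain a ticket-granting ticket (prompts for the password) and,
# if that succeeds, list the tickets in the credential cache.
login_test_user() {
  kinit "$PRINCIPAL" && klist
}
```

If `klist` shows a krbtgt entry for the principal, the ticket is in place for the SPNEGO request in the next step.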
Test by sending a curl request through Knox.
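A sketch of such a request, assuming a curl build with SPNEGO support, a Knox gateway on the default port 8443, a topology named 'default' and WebHDFS exposed through it. The host name is a placeholder and `knox_liststatus` is a hypothetical helper name.

```shell
# Placeholder host, port and topology; replace with your own.
KNOX_HOST="${KNOX_HOST:-knox.example.com}"
KNOX_PORT="${KNOX_PORT:-8443}"
TOPOLOGY="${TOPOLOGY:-default}"
URL="https://${KNOX_HOST}:${KNOX_PORT}/gateway/${TOPOLOGY}/webhdfs/v1/tmp?op=LISTSTATUS"

# --negotiate together with "-u :" makes curl authenticate via SPNEGO
# using the Kerberos ticket obtained with kinit.
# -i prints the response headers; -k skips TLS certificate verification
# and should only be used with self-signed test certificates.
knox_liststatus() {
  curl -k -i --negotiate -u : "$URL"
}
```

Adding `-c cookies.txt` to the first call and `-b cookies.txt` to later ones is one way to reuse the cookie instead of negotiating again on every request.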
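If authentication succeeds and response headers are printed, the output should include a Set-Cookie header carrying the signed hadoop.auth cookie.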