Hadoop Auth is a Java library that enables Kerberos SPNEGO authentication for HTTP requests. It enforces authentication on protected resources. After a successful authentication, Hadoop Auth creates a signed HTTP cookie containing an authentication token with the username, user principal, authentication type and expiration time. This cookie is used for all subsequent HTTP client requests to protected resources until it expires.
Because Apache Knox has pluggable authentication providers, it is easy to set up Hadoop Auth with Apache Knox with only a few configuration changes. The purpose of this article is to describe this process in detail, with examples.
Assumptions
We assume a working Hadoop cluster with Apache Knox (version 0.7.0 or later) and that the cluster is Kerberized. Kerberizing the cluster is beyond the scope of this article.
Setup
To use Hadoop Auth in Apache Knox we need to update the Knox topology. Hadoop Auth is configured as a provider, so we configure it through the provider params. Apache Knox uses the same configuration parameters as Apache Hadoop, and they can be expected to behave in a similar fashion. To update the Knox topology using Ambari, go to Knox -> Configs -> Advanced topology.
The following is an example of the HadoopAuth provider snippet in the Apache Knox topology file:
<provider>
    <role>authentication</role>
    <name>HadoopAuth</name>
    <enabled>true</enabled>
    <param>
        <name>config.prefix</name>
        <value>hadoop.auth.config</value>
    </param>
    <param>
        <name>hadoop.auth.config.signature.secret</name>
        <value>my-seceret-key</value>
    </param>
    <param>
        <name>hadoop.auth.config.type</name>
        <value>kerberos</value>
    </param>
    <param>
        <name>hadoop.auth.config.simple.anonymous.allowed</name>
        <value>false</value>
    </param>
    <param>
        <name>hadoop.auth.config.token.validity</name>
        <value>1800</value>
    </param>
    <param>
        <name>hadoop.auth.config.cookie.domain</name>
        <value>ambari.apache.org</value>
    </param>
    <param>
        <name>hadoop.auth.config.cookie.path</name>
        <value>gateway/default</value>
    </param>
    <param>
        <name>hadoop.auth.config.kerberos.principal</name>
        <value>HTTP/c6401.ambari.apache.org@EXAMPLE.COM</value>
    </param>
    <param>
        <name>hadoop.auth.config.kerberos.keytab</name>
        <value>/etc/security/keytabs/spnego.service.keytab</value>
    </param>
    <param>
        <name>hadoop.auth.config.kerberos.name.rules</name>
        <value>DEFAULT</value>
    </param>
</provider>
At a minimum, the following parameters need to be updated:
- hadoop.auth.config.signature.secret - The secret used to sign the delegation token in the hadoop.auth cookie. The same secret must be used across all instances of the Knox gateway in a given cluster. Otherwise, the delegation token will fail validation and authentication will be repeated on each request.
- hadoop.auth.config.cookie.domain - The domain to use for the HTTP cookie that stores the authentication token (e.g. mycompany.com).
- hadoop.auth.config.kerberos.principal - The web-application Kerberos principal name. The Kerberos principal name must start with HTTP/...
- hadoop.auth.config.kerberos.keytab - The path to the keytab file containing the credentials for the Kerberos principal specified above.
For details on the other properties please refer to the Apache Knox documentation.
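The signature secret is simply a shared string, so any sufficiently random value should work as long as every Knox instance uses the same one. As a minimal sketch, assuming a Linux host where 'head' and 'base64' are available, the hypothetical helper generate_signature_secret below (it is not part of Knox or Hadoop Auth) produces one such value:

```shell
# Hypothetical helper, not part of Knox or Hadoop Auth: prints a random
# secret made of 32 bytes from /dev/urandom, base64-encoded (44 characters).
generate_signature_secret() {
    head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Generate the value once, then reuse it in every Knox topology.
secret=$(generate_signature_secret)
echo "hadoop.auth.config.signature.secret = $secret"
```

Generate the value once and paste the same string into the topology of every Knox instance in the cluster. The base64 alphabet contains no characters that need escaping inside an XML value.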
If you are using Ambari, you will have to restart Knox; this is an Ambari requirement. No restart is required if the topology is updated outside of Ambari, because Apache Knox reloads the topology every time the time-stamp of the topology file is updated.
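Because the reload is driven by the time-stamp of the topology file, updating that time-stamp should be enough to trigger a reload outside of Ambari. A minimal sketch, assuming a hypothetical helper named refresh_topology; the path in the usage comment is an assumed example and depends on your installation:

```shell
# Hypothetical helper: bump the modification time of a topology file so
# that Knox notices the change and reloads the topology.
refresh_topology() {
    topology_file=$1
    if [ ! -f "$topology_file" ]; then
        echo "topology file not found: $topology_file" >&2
        return 1
    fi
    touch "$topology_file"
}

# Usage (assumed example path; adjust it to your Knox installation):
#   refresh_topology /usr/hdp/current/knox-server/conf/topologies/default.xml
```

The helper refuses to run on a missing file so that a mistyped path does not silently create an empty topology.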
Testing
We will test Hadoop Auth with the user 'guest', assuming that no such user exists on the system yet.
Let's create the user 'guest' with group 'users'. Note that the group 'users' was chosen because of the property 'hadoop.proxyuser.knox.groups=users'.
useradd guest -u 1590 -g users
Add a principal using 'kadmin.local':
kadmin.local -q "addprinc guest/c6401.ambari.apache.org"
Log in using kinit:
kinit guest/c6401.ambari.apache.org@EXAMPLE.COM
Test by sending a curl request through Knox:
curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
You should see output similar to the following:
# curl -k -i --negotiate -u : "https://c6401.ambari.apache.org:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
HTTP/1.1 401 Authentication required
Date: Fri, 24 Feb 2017 14:19:25 GMT
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 320
Server: Jetty(9.2.15.v20160210)

HTTP/1.1 200 OK
Date: Fri, 24 Feb 2017 14:19:25 GMT
Set-Cookie: hadoop.auth="u=guest&p=guest/c6401.ambari.apache.org@EXAMPLE.COM&t=kerberos&e=1487947765114&s=fNpq9FYy2DA19Rah7586rgsAieI="; Path=gateway/default; Domain=ambari.apache.org; Secure; HttpOnly
Cache-Control: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Expires: Fri, 24 Feb 2017 14:19:25 GMT
Date: Fri, 24 Feb 2017 14:19:25 GMT
Pragma: no-cache
Content-Type: application/json; charset=UTF-8
X-FRAME-OPTIONS: SAMEORIGIN
Server: Jetty(6.1.26.hwx)
Content-Length: 276

{"FileStatuses":{"FileStatus":[{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16398,"group":"hdfs","length":0,"modificationTime":1487855904191,"owner":"hdfs","pathSuffix":"entity-file-history","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}}
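The hadoop.auth cookie in the 200 response carries the fields described at the start of this article. As a rough sketch, the hypothetical helper auth_cookie_field below splits the cookie value from the sample output into its fields. Reading the keys as u (username), p (principal), t (authentication type), e (expiration time in milliseconds since the epoch) and s (signature) is inferred from the sample values, not taken from a format specification.

```shell
# Hypothetical helper: print one field of a hadoop.auth cookie value.
#   $1 = field key (u, p, t, e or s)
#   $2 = cookie value without the surrounding double quotes
auth_cookie_field() {
    printf '%s\n' "$2" | tr '&' '\n' | sed -n "s/^$1=//p"
}

# Cookie value copied from the sample output above.
cookie='u=guest&p=guest/c6401.ambari.apache.org@EXAMPLE.COM&t=kerberos&e=1487947765114&s=fNpq9FYy2DA19Rah7586rgsAieI='

echo "user:      $(auth_cookie_field u "$cookie")"
echo "principal: $(auth_cookie_field p "$cookie")"
echo "type:      $(auth_cookie_field t "$cookie")"

# The expiration is given in milliseconds; convert it to whole seconds.
expiry_ms=$(auth_cookie_field e "$cookie")
echo "expires:   $((expiry_ms / 1000)) (seconds since the epoch)"
```

For the sample cookie the expiration works out to 1487947765 seconds since the epoch, which is 14:49:25 GMT on 24 Feb 2017. That is 1800 seconds after the Date header of the response, in line with the configured hadoop.auth.config.token.validity.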