Blog from November, 2016


(This article is a work in progress.)

Apache Knox has always had LDAP-based authentication through the Apache Shiro authentication provider, which makes the configuration fairly easy and flexible. However, there are a number of limitations with the KnoxLdapRealm (KNOX-536); for instance, only a single Organizational Unit (OU) is currently supported. Group lookup will not return the groups that are defined within the tree structure below that single OU, and group memberships that are defined indirectly, through membership in a group that is itself a member of another group, are not resolved. Apache Knox 0.10.0 introduced the ability to leverage the Linux PAM authentication mechanism: KNOX-537 added a KnoxPAMRealm to the Shiro provider for PAM support. This blog post discusses how to set up LDAP authentication using the new PAM support in Knox together with the Linux SSSD daemon, and covers some of the advantages and key features of SSSD.

Some of the advantages of this approach are:

  • Support for nested OUs and nested groups

  • Faster lookups

  • Support for more complex LDAP queries

  • Reduced load on the LDAP/AD server (thanks to caching by SSSD)

Scenarios

Three scenarios were tested:

  • Nested groups
  • Nested OUs
  • Using Multiple Search Bases

Nested Groups

The following diagram represents the nested group structure used for testing.


In the above diagram we have OU=data, which contains multiple nested groups (2 levels deep), and a user 'jerry' who belongs to the innermost group datascientist-b explicitly but implicitly belongs to all the groups that nest it (i.e. datascientist-a and datascientist).

When SSSD is properly configured (as explained later in this post) we get the following result:

# id -a jerry
uid=4001(jerry) gid=4000(engineer) groups=4000(engineer),5000(datascientist),6000(datascientist-a),7000(datascientist-b)

When we access a resource secured by Knox as the user jerry, all the groups that jerry belongs to are logged in gateway-audit.log (part of Knox logging):

Groups: [datascientist-a, datascientist-b, engineer, datascientist]

Nested OUs

The following diagram shows the nested OU structure used for testing.


In this example we can see that the user kim is part of the group 'processors', which is part of OU processing, which is part of OU data, which in turn is part of OU groups.

The following is the output of the 'id' command; here we can see that our user kim and the group that the user belongs to are retrieved correctly:

# id -a kim
uid=8001(kim) gid=8000(processors) groups=8000(processors)

Similarly, when we access a resource secured by Knox as the user kim, we get the following entry in gateway-audit.log:

Groups: [processors]

This demonstrates that Knox can authenticate and retrieve groups against nested OUs.

Using Multiple Search Bases

The following diagram shows parallel nested OUs (processing and processing-2).


In this test we configure two different search bases:

  • ou=processing,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
  • ou=processing-2,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org

The relevant sssd.conf settings for this test are as follows:

[sssd]
....
domains = default, processing2
....

[domain/default]
....
ldap_search_base = ou=processing,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
....

[domain/processing2]
....
ldap_search_base = ou=processing-2,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
....

To check whether SSSD correctly picks up our users, we use the id command:

# id kim
uid=8001(kim) gid=8000(processors) groups=8000(processors)

# id jon
uid=9001(jon) gid=9000(processors-2) groups=9000(processors-2)

Similarly, when we access resources secured by Knox as the users kim and jon, we get the following entries in gateway-audit.log:

for kim
success|Groups: [processors]

for jon
success|Groups: [processors-2]

Also, if you remove the 'processing2' domain from the sssd.conf file and restart SSSD, the user 'jon' will no longer be found, but 'kim' still will be:

# id jon
id: 'jon': no such user
# id kim
uid=8001(kim) gid=8000(processors) groups=8000(processors)

Thanks to Eric Yang for pointing out this scenario.

Setup Overview

The following diagram shows a high-level setup of the components involved.


The following component versions were used for this test:

  • OpenLDAP - 2.4.40
  • SSSD - 1.14.1
  • Apache Knox - 0.10.0

LDAP

In order to support nested groups, LDAP needs to support the RFC 2307bis schema. For SSSD to talk to LDAP, the connection has to be secure. Acquire a copy of the public CA certificate for the certificate authority used to sign the LDAP server certificate; you can test the certificate using the following openssl command:

openssl s_client -connect <ldap_host>:<ldap_port> -showcerts -state -CAfile <path_to_ca_directory>/cacert.pem
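
If the certificate chain is trusted, the end of the openssl output should include a line similar to the one below (the exact wording can vary by OpenSSL version):

Verify return code: 0 (ok)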

SSSD

SSSD is stricter than pam_ldap. In order to perform authentication, SSSD requires that the communication channel be encrypted. This means that if sssd.conf has ldap_uri = ldap://<server>, SSSD will attempt to encrypt the communication channel with TLS (Transport Layer Security). If sssd.conf has ldap_uri = ldaps://<server>, then SSL will be used instead of TLS (see the configuration sketch after the list below). This requires that the LDAP server:

  1. Supports TLS or SSL
  2. Has TLS access enabled on the standard LDAP port (389), or an alternate port if specified in ldap_uri, or has SSL access enabled on the standard LDAPS port (636), or an alternate port.
  3. Has a valid certificate trust (this can be relaxed by using ldap_tls_reqcert = never, but that is a security risk and should ONLY be done for development and demos).
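
To illustrate the two connection styles, here is a minimal sssd.conf sketch showing only the connection-related options (the hostname is a placeholder):

# STARTTLS over the standard LDAP port (389)
ldap_uri = ldap://ldap.example.com
ldap_id_use_start_tls = True
ldap_tls_cacertdir = /etc/openldap/certs

# or, LDAPS on port 636
# ldap_uri = ldaps://ldap.example.com
# ldap_tls_cacertdir = /etc/openldap/certs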

Copy the public CA certs needed to talk to LDAP to /etc/openldap/certs.

To configure SSSD you can use the following 'authconfig' command:

authconfig --enablesssd --enablesssdauth --enablelocauthorize --enableldap --enableldapauth --ldapserver=<ldap_host> --enableldaptls --ldapbasedn=dc=my-company,dc=my-org --enableshadow --enablerfc2307bis --enablemkhomedir --enablecachecreds --update

After the command executes you can see that the sssd.conf file has been updated.

An example sssd.conf file:

[sssd]
config_file_version = 2
reconnection_retries = 3
sbus_timeout = 30
services = nss, pam, autofs
domains = default

[nss]
reconnection_retries = 3
homedir_substring = /home

[pam]
reconnection_retries = 3

[domain/default]
access_provider = ldap
autofs_provider = ldap
chpass_provider = ldap
cache_credentials = True
ldap_schema = rfc2307bis

id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://<ldap_host>/

ldap_tls_cacertdir = /etc/openldap/certs
ldap_id_use_start_tls = True

# default bind dn
ldap_default_bind_dn = cn=admin,dc=apache,dc=org
ldap_default_authtok_type = password
ldap_default_authtok = my_password
ldap_search_base = dc=apache,dc=org

# For group lookup
ldap_group_member = member

# Enable nesting 
ldap_group_nesting_level = 5

[sudo]

[autofs]

[ssh]

[pac]

[ifp]

The important settings to note are:

  • ldap_schema = rfc2307bis - needed if all groups are to be returned when using nested groups or primary/secondary groups.
  • ldap_tls_cacertdir = /etc/openldap/certs - CA certs used to talk to the LDAP server.
  • ldap_id_use_start_tls = True - secure communication with LDAP.
  • ldap_group_nesting_level = 5 - enables group nesting up to 5 levels deep.

NOTE: You might need to add or change some options in the sssd.conf file to suit your needs, such as the debug level. After updating the file, just restart the service and the changes should take effect.
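
For example, to get more verbose SSSD logs while troubleshooting, you could raise the debug level in the domain section and restart the service (an illustrative sketch; the debug_level value and the service command may vary by distribution):

[domain/default]
debug_level = 7

# restart SSSD to pick up the change
systemctl restart sssd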

Some additional settings that can be used to control SSSD caching behavior are:

   
  • cache_credentials (Boolean) - Optional. Specifies whether to store user credentials in the local SSSD domain database cache. The default value for this parameter is false. Set this value to true for domains other than the LOCAL domain to enable offline authentication.
  • entry_cache_timeout (integer) - Optional. Specifies how long, in seconds, SSSD should cache positive cache hits. A positive cache hit is a successful query.
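
In sssd.conf these options go in the domain section. A minimal sketch (the timeout value here is illustrative only):

[domain/default]
cache_credentials = True
entry_cache_timeout = 5400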

Testing the SSSD Configuration

To check whether SSSD is configured correctly you can use the standard 'getent' or 'id' commands:

$ getent passwd <ldap_user>
$ id -a <ldap_user>

Using the above commands you should be able to see all the groups that <ldap_user> belongs to. If you do not see the secondary groups, check the 'ldap_group_nesting_level' option and adjust it accordingly.

Knox

Setting up Knox is relatively easy: install Knox on the same machine as SSSD and update the topology to use PAM-based authentication:

<param>
    <name>main.pamRealm</name>
    <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
</param>
<param>
    <name>main.pamRealm.service</name>
    <value>login</value>
</param>

For more information and explanation on setting up Knox, see the PAM Based Authentication section in the Knox User Guide.
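
For context, these params sit inside the ShiroProvider authentication provider in the topology file. The sketch below shows what that provider element might look like; the sessionTimeout and URL pattern values are illustrative, so check the Knox User Guide for the authoritative configuration:

<provider>
    <role>authentication</role>
    <name>ShiroProvider</name>
    <enabled>true</enabled>
    <param>
        <name>sessionTimeout</name>
        <value>30</value>
    </param>
    <param>
        <name>main.pamRealm</name>
        <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
    </param>
    <param>
        <name>main.pamRealm.service</name>
        <value>login</value>
    </param>
    <param>
        <name>urls./**</name>
        <value>authcBasic</value>
    </param>
</provider>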

Caveats

  • For nested group membership, SSSD and LDAP should use the rfc2307bis schema

  • SSSD requires SSL/TLS to talk to LDAP

Troubleshooting


Apache KNOX provides a single gateway to many services in your Hadoop cluster. You can leverage the KNOX shell DSL interface to interact with services such as WebHDFS, WebHCat (Templeton), Oozie, HBase, etc. For example, using Groovy and the DSL you can submit Hive queries via WebHCat (Templeton) as simply as:

println "[Hive.groovy] Copy Hive query file to HDFS"
Hdfs.put(session).text( hive_query ).to( jobDir + "/input/query.hive" ).now()

jobId = Job.submitHive(session) \
            .file("${jobDir}/input/query.hive") \
            .arg("-v").arg("--hiveconf").arg("TABLE_NAME=${tmpTableName}") \
            .statusDir("${jobDir}/output") \
            .now().jobId

submitSqoop Job API

With Apache KNOX 0.10.0, you can now write applications using the KNOX DSL for Apache Sqoop and easily submit Sqoop jobs. The WebHCat Job class in the DSL now supports submitSqoop() as follows:

Job.submitSqoop(session)
    .command("import --connect jdbc:mysql://hostname:3306/dbname ... ")
    .statusDir(remoteStatusDir)
    .now().jobId

The submitSqoop request takes the following arguments:

  • command (String) - The Sqoop command string to execute.
  • files (String) - Comma-separated files to be copied to the templeton controller job.
  • optionsfile (String) - The remote file that contains the Sqoop command to run.
  • libdir (String) - The remote directory containing the JDBC jar to include with the Sqoop lib.
  • statusDir (String) - The remote directory to store status output.

The request returns the jobId in the response.
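
Under the covers this maps to WebHCat's Sqoop REST endpoint, so a roughly equivalent raw call through the Knox gateway might look like the sketch below (assuming a topology named 'default', the gateway on port 8443, demo credentials, and a Hive/WebHCat version that exposes /templeton/v1/sqoop):

curl -ik -u guest:guest-password \
  --data-urlencode "command=import --connect jdbc:mysql://hostname:3306/dbname --table mytable" \
  --data-urlencode "statusdir=/user/guest/sqoop/output" \
  "https://localhost:8443/gateway/default/templeton/v1/sqoop"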

Simple example

In this example we will run a simple Sqoop job to extract the scBlastTab table to HDFS from the public genome database (MySQL) at UCSC.

First, import the following packages:

import com.jayway.jsonpath.JsonPath
import groovy.json.JsonSlurper
import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.hdfs.Hdfs
import org.apache.hadoop.gateway.shell.job.Job
import static java.util.concurrent.TimeUnit.SECONDS

Next, establish a connection to the KNOX gateway with Hadoop.login:

// Get gatewayUrl and credentials from environment
def env = System.getenv()
gatewayUrl = env.gateway
username = env.username
password = env.password

jobDir = "/user/" + username + "/sqoop"

session = Hadoop.login( gatewayUrl, username, password )
 
println "[Sqoop.groovy] Delete " + jobDir + ": " + Hdfs.rm( session ).file( jobDir ).recursive().now().statusCode
println "[Sqoop.groovy] Mkdir " + jobDir + ": " + Hdfs.mkdir( session ).dir( jobDir ).now().statusCode

Define your Sqoop job (assuming Sqoop is already configured with the MySQL driver):

// Database connection information

db = [ driver:"com.mysql.jdbc.Driver", url:"jdbc:mysql://genome-mysql.cse.ucsc.edu/hg38", user:"genome", password:"", name:"hg38", table:"scBlastTab", split:"query" ]

targetdir = jobDir + "/" + db.table

sqoop_command = "import --driver ${db.driver} --connect ${db.url} --username ${db.user} --password ${db.password} --table ${db.table} --split-by ${db.split} --target-dir ${targetdir}"

You can now submit the sqoop_command to the cluster with submitSqoop:

jobId = Job.submitSqoop(session) \
            .command(sqoop_command) \
            .statusDir("${jobDir}/output") \
            .now().jobId

println "[Sqoop.groovy] Submitted job: " + jobId

You can then check job status and output as usual:

println "[Sqoop.groovy] Polling up to 60s for job completion..."

done = false
count = 0
while( !done && count++ < 60 ) {
  sleep( 1000 )
  json = Job.queryStatus(session).jobId(jobId).now().string
  done = JsonPath.read( json, "\$.status.jobComplete" )
  print "."; System.out.flush();
}
println ""
println "[Sqoop.groovy] Job status: " + done

// Check output directory
text = Hdfs.ls( session ).dir( jobDir + "/output" ).now().string
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

println "\n[Sqoop.groovy] Content of stderr:"
println Hdfs.get( session ).from( jobDir + "/output/stderr" ).now().string

// Check table files
text = Hdfs.ls( session ).dir( jobDir + "/" + db.table ).now().string
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

session.shutdown()


Here is sample output of the above example run against a Hadoop cluster. You need a properly configured Hadoop cluster with the Apache KNOX gateway, Apache Sqoop, and WebHCat (Templeton). The test was run against a BigInsights Hadoop cluster.

:compileJava UP-TO-DATE
:compileGroovy
:processResources UP-TO-DATE
:classes
:Sqoop

[Sqoop.groovy] Delete /user/biadmin/sqoop: 200
[Sqoop.groovy] Mkdir /user/biadmin/sqoop: 200
[Sqoop.groovy] Submitted job: job_1476266127941_0692
[Sqoop.groovy] Polling up to 60s for job completion...
............................................
[Sqoop.groovy] Job status: true
[exit, stderr, stdout]

[Sqoop.groovy] Content of stderr:
log4j:WARN custom level class [Relative to Yarn Log Dir Prefix] not found.
16/11/03 16:53:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6_IBM_27
16/11/03 16:53:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/11/03 16:53:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/11/03 16:53:06 INFO manager.SqlManager: Using default fetchSize of 1000
16/11/03 16:53:06 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/iop/4.2.0.0/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/iop/4.2.0.0/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/11/03 16:53:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM scBlastTab AS t WHERE 1=0
16/11/03 16:53:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM scBlastTab AS t WHERE 1=0
16/11/03 16:53:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/iop/4.2.0.0/hadoop-mapreduce
Note: /tmp/sqoop-biadmin/compile/4432005ab10742f26cc82d5438497cae/scBlastTab.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/11/03 16:53:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-biadmin/compile/4432005ab10742f26cc82d5438497cae/scBlastTab.jar
16/11/03 16:53:09 INFO mapreduce.ImportJobBase: Beginning import of scBlastTab
16/11/03 16:53:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/11/03 16:53:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM scBlastTab AS t WHERE 1=0
16/11/03 16:53:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/11/03 16:53:10 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/11/03 16:53:15 INFO db.DBInputFormat: Using read commited transaction isolation
16/11/03 16:53:15 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(query), MAX(query) FROM scBlastTab
16/11/03 16:53:16 WARN db.TextSplitter: Generating splits for a textual index column.
16/11/03 16:53:16 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/11/03 16:53:16 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: number of splits:5
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476266127941_0693
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: mapreduce.job, Service: job_1476266127941_0692, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6fbb4061)
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ehaascluster, Ident: (HDFS_DELEGATION_TOKEN token 4660 for biadmin)
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: RM_DELEGATION_TOKEN, Service: 172.16.222.2:8032,172.16.222.3:8032, Ident: (owner=biadmin, renewer=mr token, realUser=HTTP/bicloud-fyre-physical-17-master-3.fyre.ibm.com@IBM.COM, issueDate=1478191971063, maxDate=1478796771063, sequenceNumber=67, masterKeyId=66)
16/11/03 16:53:16 WARN token.Token: Cannot find class for token kind kms-dt
16/11/03 16:53:16 WARN token.Token: Cannot find class for token kind kms-dt
Kind: kms-dt, Service: 172.16.222.1:16000, Ident: 00 07 62 69 61 64 6d 69 6e 04 79 61 72 6e 05 68 62 61 73 65 8a 01 58 2b 1b 7b 34 8a 01 58 4f 27 ff 34 8e 03 a4 09
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: MR_DELEGATION_TOKEN, Service: 172.16.222.3:10020, Ident: (owner=biadmin, renewer=yarn, realUser=HTTP/bicloud-fyre-physical-17-master-3.fyre.ibm.com@IBM.COM, issueDate=1478191972979, maxDate=1478796772979, sequenceNumber=52, masterKeyId=49)
16/11/03 16:53:17 INFO impl.YarnClientImpl: Submitted application application_1476266127941_0693
16/11/03 16:53:17 INFO mapreduce.Job: The url to track the job: http://bicloud-fyre-physical-17-master-2.fyre.ibm.com:8088/proxy/application_1476266127941_0693/
16/11/03 16:53:17 INFO mapreduce.Job: Running job: job_1476266127941_0693
16/11/03 16:53:24 INFO mapreduce.Job: Job job_1476266127941_0693 running in uber mode : false
16/11/03 16:53:24 INFO mapreduce.Job:  map 0% reduce 0%
16/11/03 16:53:32 INFO mapreduce.Job:  map 20% reduce 0%
16/11/03 16:53:33 INFO mapreduce.Job:  map 100% reduce 0%
16/11/03 16:53:34 INFO mapreduce.Job: Job job_1476266127941_0693 completed successfully
16/11/03 16:53:34 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=799000
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=644
		HDFS: Number of bytes written=148247
		HDFS: Number of read operations=20
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=10
	Job Counters 
		Launched map tasks=5
		Other local map tasks=5
		Total time spent by all maps in occupied slots (ms)=62016
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=31008
		Total vcore-milliseconds taken by all map tasks=31008
		Total megabyte-milliseconds taken by all map tasks=190513152
	Map-Reduce Framework
		Map input records=2379
		Map output records=2379
		Input split bytes=644
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=249
		CPU time spent (ms)=6590
		Physical memory (bytes) snapshot=1758576640
		Virtual memory (bytes) snapshot=35233165312
		Total committed heap usage (bytes)=2638741504
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=148247
16/11/03 16:53:34 INFO mapreduce.ImportJobBase: Transferred 144.7725 KB in 23.9493 seconds (6.0449 KB/sec)
16/11/03 16:53:34 INFO mapreduce.ImportJobBase: Retrieved 2379 records.

[_SUCCESS, part-m-00000, part-m-00001, part-m-00002, part-m-00003, part-m-00004]

BUILD SUCCESSFUL
Total time: 1 mins 2.202 secs

From the output above you can see the job output as well as the content of the table directory on HDFS, which contains 5 parts (5 map tasks were used). The WebHCat (Templeton) job console output goes to stderr in this case.

As part of compiling/running your code ensure you have the following dependency: org.apache.knox:gateway-shell:0.10.0.
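
For example, with Gradle (which produced the build output shown above) the dependency declaration might look like the following minimal sketch (repository setup and other plugins/dependencies omitted):

// build.gradle (illustrative)
repositories {
    mavenCentral()
}

dependencies {
    compile 'org.apache.knox:gateway-shell:0.10.0'
}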