(This article is work in progress)
Apache Knox has long supported LDAP-based authentication through the Apache Shiro authentication provider, which keeps the configuration fairly easy and flexible. However, there are a number of limitations with the KnoxLdapRealm (KNOX-536). For instance, only a single Organizational Unit (OU) is currently supported: group lookup will not return groups defined in the tree structure below that single OU, and group memberships that are defined indirectly, through membership in a group that is itself a member of another group, are not resolved. Apache Knox 0.10.0 introduced the ability to leverage the Linux PAM authentication mechanism; KNOX-537 added a KnoxPAMRealm to the Shiro provider for PAM support. This blog post discusses how to set up LDAP authentication using the new PAM support together with the Linux SSSD daemon, and covers some of the advantages and key features of SSSD.
Some of the advantages of this approach are:
- Support for nested OUs and nested groups
- Faster lookups
- Support for more complex LDAP queries
- Reduced load on the LDAP/AD server (caching by SSSD)
Scenarios
Three scenarios were tested:
- Nested groups
- Nested OUs
- Using Multiple Search Bases
Nested Groups
The following diagram represents the nested group structure used for testing.
In the above diagram, OU=data contains multiple nested groups (two levels deep). The user 'jerry' belongs explicitly to the innermost group, datascientist-b, and implicitly to all the groups that nest it (i.e. datascientist-a and datascientist).
When SSSD is properly configured (as explained later in this post) we get the following result:
# id -a jerry
uid=4001(jerry) gid=4000(engineer) groups=4000(engineer),5000(datascientist),6000(datascientist-a),7000(datascientist-b)
When we access a resource secured by Knox as the user jerry, all the groups that jerry belongs to are logged in gateway-audit.log (part of Knox logging):
Groups: [datascientist-a, datascientist-b, engineer, datascientist]
Nested OUs
The following diagram shows the nested OU structure used for testing.
In this example the user kim is a member of the group 'processors', which lives under OU processing, which is under OU data, which in turn is under OU groups.
The following output of the 'id' command shows that the user kim and the group kim belongs to are resolved correctly:
# id -a kim
uid=8001(kim) gid=8000(processors) groups=8000(processors)
Similarly, when we access a resource secured by Knox as the user kim, we get the following entry in gateway-audit.log (part of Knox logging):
Groups: [processors]
This demonstrates that Knox can authenticate and retrieve groups against nested OUs.
Using Multiple Search Bases
The following diagram shows two parallel nested OUs (processing and processing-2).
In this test we configure two different search bases:
- ou=processing,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
- ou=processing-2,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
The relevant sssd.conf settings for this test are as follows:
[sssd]
....
domains = default, processing2
....
[domain/default]
....
ldap_search_base = ou=processing,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
....
[domain/processing2]
....
ldap_search_base = ou=processing-2,ou=data,ou=groups,dc=hadoop,dc=apache,dc=org
....
To check whether SSSD correctly picks up our users, we use the id command:
# id kim
uid=8001(kim) gid=8000(processors) groups=8000(processors)
# id jon
uid=9001(jon) gid=9000(processors-2) groups=9000(processors-2)
Similarly, when we access a resource secured by Knox as the users kim and jon, we get the following entries in gateway-audit.log (part of Knox logging):
for kim: success|Groups: [processors]
for jon: success|Groups: [processors-2]
Also, if you take the 'processing2' domain out of the sssd.conf file and restart sssd, the user 'jon' will no longer be found, while 'kim' can still be found:
# id jon
id: 'jon': no such user
# id kim
uid=8001(kim) gid=8000(processors) groups=8000(processors)
Thanks to Eric Yang for pointing out this scenario.
Setup Overview
The following diagram shows a high-level view of the components involved.
The component versions used for this test are:
- OpenLDAP - 2.4.40
- SSSD - 1.14.1
- Apache Knox - 0.10.0
LDAP
In order to support nesting of groups, LDAP needs to support the RFC 2307bis schema. SSSD also requires a secure channel when talking to LDAP. Acquire a copy of the public CA certificate for the certificate authority that signed the LDAP server certificate; you can then test the certificate with the following openssl command:
openssl s_client -connect <ldap_host>:<ldap_port> -showcerts -state -CAfile <path_to_ca_directory>/cacert.pem
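For reference, nested group entries under the rfc2307bis schema carry both the posixGroup and groupOfNames object classes, with the member attribute holding the DN of the nested group or user. The LDIF below is a minimal illustrative sketch only; the DNs, gidNumbers and OU layout are assumptions based on the nested-groups diagram above, not a dump of the actual test directory.
# Illustrative only: 'datascientist' nests 'datascientist-a',
# which in turn nests 'datascientist-b' (the group jerry is a direct member of)
dn: cn=datascientist,ou=data,ou=groups,dc=apache,dc=org
objectClass: posixGroup
objectClass: groupOfNames
cn: datascientist
gidNumber: 5000
member: cn=datascientist-a,ou=data,ou=groups,dc=apache,dc=org

dn: cn=datascientist-a,ou=data,ou=groups,dc=apache,dc=org
objectClass: posixGroup
objectClass: groupOfNames
cn: datascientist-a
gidNumber: 6000
member: cn=datascientist-b,ou=data,ou=groups,dc=apache,dc=org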
SSSD
SSSD is stricter than pam_ldap. In order to perform authentication, SSSD requires that the communication channel be encrypted. This means that if sssd.conf has ldap_uri = ldap://<server>, SSSD will attempt to encrypt the channel with TLS (transport layer security) via STARTTLS. If sssd.conf has ldap_uri = ldaps://<server>, then SSL will be used instead of TLS. This requires that the LDAP server:
- Supports TLS or SSL
- Has TLS access enabled on the standard LDAP port (389) (or an alternate port, if specified in ldap_uri), or has SSL access enabled on the standard LDAPS port (636) (or an alternate port)
- Presents a certificate the client trusts (this can be relaxed with ldap_tls_reqcert = never, but that is a security risk and should ONLY be done for development and demos)
Copy the public CA certs needed to talk to LDAP to /etc/openldap/certs.
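Before configuring SSSD it can help to confirm that an LDAP client on the host can complete a STARTTLS handshake using those CA certs. The command below is a quick sanity check; it assumes the directory allows anonymous simple binds and uses the search base from this post, so adjust the base DN, or add -D/-W for an authenticated bind, to match your environment.
# -ZZ requires a successful STARTTLS negotiation before the search runs
LDAPTLS_CACERTDIR=/etc/openldap/certs ldapsearch -ZZ -x \
    -H ldap://<ldap_host> \
    -b "dc=apache,dc=org" "(uid=jerry)" cn uid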
To configure sssd you can use the following 'authconfig' command:
authconfig --enablesssd --enablesssdauth --enablelocauthorize --enableldap --enableldapauth --ldapserver=<ldap_host> --enableldaptls --ldapbasedn=dc=my-company,dc=my-org --enableshadow --enablerfc2307bis --enablemkhomedir --enablecachecreds --update
After the command executes, you can see that the sssd.conf file has been updated.
An example sssd.conf file:
[sssd]
config_file_version = 2
reconnection_retries = 3
sbus_timeout = 30
services = nss, pam, autofs
domains = default

[nss]
reconnection_retries = 3
homedir_substring = /home

[pam]
reconnection_retries = 3

[domain/default]
access_provider = ldap
autofs_provider = ldap
chpass_provider = ldap
cache_credentials = True
ldap_schema = rfc2307bis
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://<ldap_host>/
ldap_tls_cacertdir = /etc/openldap/certs
ldap_id_use_start_tls = True
# default bind dn
ldap_default_bind_dn = cn=admin,dc=apache,dc=org
ldap_default_authtok_type = password
ldap_default_authtok = my_password
ldap_search_base = dc=apache,dc=org
# For group lookup
ldap_group_member = member
# Enable nesting
ldap_group_nesting_level = 5

[sudo]

[autofs]

[ssh]

[pac]

[ifp]
The important settings to note are:
- ldap_schema = rfc2307bis - needed if all groups are to be returned when using nested groups or primary/secondary groups.
- ldap_tls_cacertdir = /etc/openldap/certs - directory containing the CA certs used to talk to the LDAP server.
- ldap_id_use_start_tls = True - use STARTTLS to secure communication with LDAP.
- ldap_group_nesting_level = 5 - enable group nesting up to 5 levels deep.
NOTE: You might need to add or change some options in the sssd.conf file to suit your needs (debug level, etc.). After updating the file, restart the sssd service for the changes to take effect, as shown below.
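For example, on a systemd-based host the restart (and, if you want lookups to reflect directory changes immediately, a cache flush with the sss_cache tool that ships with SSSD) would look like this:
# restart SSSD after editing /etc/sssd/sssd.conf
systemctl restart sssd

# optionally invalidate all cached entries so new users/groups are picked up right away
sss_cache -E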
Some additional settings that can be used to control caching of credentials by SSSD are listed below; a small example snippet follows the list.
- cache_credentials (Boolean) - Optional. Specifies whether to store user credentials in the local SSSD domain database cache. The default value for this parameter is false. Set this value to true for domains other than the LOCAL domain to enable offline authentication.
- entry_cache_timeout (integer) - Optional. Specifies how long, in seconds, SSSD should cache positive cache hits. A positive cache hit is a successful query.
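As a sketch, the corresponding entries in the [domain/default] section of sssd.conf could look like the following; the timeout value here is just an example, not a recommendation:
[domain/default]
cache_credentials = True
# cache successful lookups for 90 minutes
entry_cache_timeout = 5400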
Test the SSSD configuration
To check whether SSSD is configured correctly, you can use the standard 'getent' or 'id' commands:
$ getent passwd <ldap_user>
$ id -a <ldap_user>
Using the above commands you should be able to see all the groups that <ldap_user> belongs to. If you do not see the secondary groups, check the 'ldap_group_nesting_level = 5' option and adjust it accordingly.
Knox
Setting up Knox is relatively easy: install Knox on the same machine as SSSD and update the topology to use PAM-based authentication:
<param>
    <name>main.pamRealm</name>
    <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
</param>
<param>
    <name>main.pamRealm.service</name>
    <value>login</value>
</param>
For more information and explanation on setting up Knox, see the PAM Based Authentication section in the Knox user guide.
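Once the topology is deployed, a quick way to confirm the end-to-end path (Knox to PAM to SSSD to LDAP) is to hit a proxied service with an LDAP user. The command below is only a sketch: it assumes the default gateway port (8443), a topology named 'default' with WebHDFS configured, and the test user jerry; -k skips certificate validation and is only appropriate for test setups.
curl -iku jerry:<password> \
    'https://<knox_host>:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS'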
Caveats
- For nested group membership, SSSD and LDAP should use the rfc2307bis schema
- SSSD requires SSL/TLS to talk to LDAP
Troubleshooting
Apache KNOX provides a single gateway to many services in your Hadoop cluster. You can leverage the KNOX shell DSL interface to interact with services such as WebHDFS, WebHCat (Templeton), Oozie, HBase, etc. For example, using Groovy and the DSL, you can submit Hive queries via WebHCat (Templeton) as simply as:
println "[Hive.groovy] Copy Hive query file to HDFS" Hdfs.put(session).text( hive_query ).to( jobDir + "/input/query.hive" ).now() jobId = Job.submitHive(session) \ .file("${jobDir}/input/query.hive") \ .arg("-v").arg("--hiveconf").arg("TABLE_NAME=${tmpTableName}") \ .statusDir("${jobDir}/output") \ .now().jobId
submitSqoop Job API
With Apache KNOX 0.10.0, you can now write applications using the KNOX DSL for Apache Sqoop and easily submit Sqoop jobs. The WebHCat Job class in the DSL now supports submitSqoop() as follows:
Job.submitSqoop(session)
    .command("import --connect jdbc:mysql://hostname:3306/dbname ... ")
    .statusDir(remoteStatusDir)
    .now().jobId
A submitSqoop request takes the following arguments:
- command (String) - The Sqoop command string to execute.
- files (String) - Comma-separated list of files to be copied to the templeton controller job.
- optionsfile (String) - The remote file containing the Sqoop command to run.
- libdir (String) - The remote directory containing the JDBC jar to include on the Sqoop lib path.
- statusDir (String) - The remote directory to store status output.
The response contains the jobId of the submitted job.
Simple example
In this example we will run a simple Sqoop job to extract the scBlastTab table from the public genome database (MySQL) at UCSC into HDFS.
First, import the following packages:
import com.jayway.jsonpath.JsonPath
import groovy.json.JsonSlurper
import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.hdfs.Hdfs
import org.apache.hadoop.gateway.shell.job.Job
import static java.util.concurrent.TimeUnit.SECONDS
Next, establish a connection to the KNOX gateway with Hadoop.login:
// Get gatewayUrl and credentials from environment
def env = System.getenv()
gatewayUrl = env.gateway
username = env.username
password = env.password
jobDir = "/user/" + username + "/sqoop"

session = Hadoop.login( gatewayUrl, username, password )

println "[Sqoop.groovy] Delete " + jobDir + ": " + Hdfs.rm( session ).file( jobDir ).recursive().now().statusCode
println "[Sqoop.groovy] Mkdir " + jobDir + ": " + Hdfs.mkdir( session ).dir( jobDir ).now().statusCode
Define your Sqoop job (assuming Sqoop is already configured with the MySQL driver):
// Database connection information
db = [ driver:"com.mysql.jdbc.Driver",
       url:"jdbc:mysql://genome-mysql.cse.ucsc.edu/hg38",
       user:"genome",
       password:"",
       name:"hg38",
       table:"scBlastTab",
       split:"query" ]

targetdir = jobDir + "/" + db.table
sqoop_command = "import --driver ${db.driver} --connect ${db.url} --username ${db.user} --password ${db.password} --table ${db.table} --split-by ${db.split} --target-dir ${targetdir}"
You can now submit the sqoop_command to the cluster with submitSqoop:
jobId = Job.submitSqoop(session) \
    .command(sqoop_command) \
    .statusDir("${jobDir}/output") \
    .now().jobId
println "[Sqoop.groovy] Submitted job: " + jobId
You can then check job status and output as usual:
println "[Sqoop.groovy] Polling up to 60s for job completion..." done = false count = 0 while( !done && count++ < 60 ) { sleep( 1000 ) json = Job.queryStatus(session).jobId(jobId).now().string done = JsonPath.read( json, "\$.status.jobComplete" ) print "."; System.out.flush(); } println "" println "[Sqoop.groovy] Job status: " + done // Check output directory text = Hdfs.ls( session ).dir( jobDir + "/output" ).now().string json = (new JsonSlurper()).parseText( text ) println json.FileStatuses.FileStatus.pathSuffix println "\n[Sqoop.groovy] Content of stderr:" println Hdfs.get( session ).from( jobDir + "/output/stderr" ).now().string // Check table files text = Hdfs.ls( session ).dir( jobDir + "/" + db.table ).now().string json = (new JsonSlurper()).parseText( text ) println json.FileStatuses.FileStatus.pathSuffix session.shutdown()
Here is sample output of the above example against a Hadoop cluster. You need a properly configured Hadoop cluster with the Apache KNOX gateway, Apache Sqoop and WebHCat (Templeton). This test was run against a BigInsights Hadoop cluster.
:compileJava UP-TO-DATE
:compileGroovy
:processResources UP-TO-DATE
:classes
:Sqoop
[Sqoop.groovy] Delete /user/biadmin/sqoop: 200
[Sqoop.groovy] Mkdir /user/biadmin/sqoop: 200
[Sqoop.groovy] Submitted job: job_1476266127941_0692
[Sqoop.groovy] Polling up to 60s for job completion...
............................................
[Sqoop.groovy] Job status: true
[exit, stderr, stdout]
[Sqoop.groovy] Content of stderr:
log4j:WARN custom level class [Relative to Yarn Log Dir Prefix] not found.
16/11/03 16:53:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6_IBM_27
16/11/03 16:53:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/11/03 16:53:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/11/03 16:53:06 INFO manager.SqlManager: Using default fetchSize of 1000
16/11/03 16:53:06 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/iop/4.2.0.0/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/iop/4.2.0.0/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/11/03 16:53:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM scBlastTab AS t WHERE 1=0
16/11/03 16:53:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM scBlastTab AS t WHERE 1=0
16/11/03 16:53:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/iop/4.2.0.0/hadoop-mapreduce
Note: /tmp/sqoop-biadmin/compile/4432005ab10742f26cc82d5438497cae/scBlastTab.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/11/03 16:53:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-biadmin/compile/4432005ab10742f26cc82d5438497cae/scBlastTab.jar
16/11/03 16:53:09 INFO mapreduce.ImportJobBase: Beginning import of scBlastTab
16/11/03 16:53:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/11/03 16:53:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM scBlastTab AS t WHERE 1=0
16/11/03 16:53:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/11/03 16:53:10 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/11/03 16:53:15 INFO db.DBInputFormat: Using read commited transaction isolation
16/11/03 16:53:15 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(query), MAX(query) FROM scBlastTab
16/11/03 16:53:16 WARN db.TextSplitter: Generating splits for a textual index column.
16/11/03 16:53:16 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/11/03 16:53:16 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: number of splits:5
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476266127941_0693
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: mapreduce.job, Service: job_1476266127941_0692, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6fbb4061)
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ehaascluster, Ident: (HDFS_DELEGATION_TOKEN token 4660 for biadmin)
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: RM_DELEGATION_TOKEN, Service: 172.16.222.2:8032,172.16.222.3:8032, Ident: (owner=biadmin, renewer=mr token, realUser=HTTP/bicloud-fyre-physical-17-master-3.fyre.ibm.com@IBM.COM, issueDate=1478191971063, maxDate=1478796771063, sequenceNumber=67, masterKeyId=66)
16/11/03 16:53:16 WARN token.Token: Cannot find class for token kind kms-dt
16/11/03 16:53:16 WARN token.Token: Cannot find class for token kind kms-dt
Kind: kms-dt, Service: 172.16.222.1:16000, Ident: 00 07 62 69 61 64 6d 69 6e 04 79 61 72 6e 05 68 62 61 73 65 8a 01 58 2b 1b 7b 34 8a 01 58 4f 27 ff 34 8e 03 a4 09
16/11/03 16:53:16 INFO mapreduce.JobSubmitter: Kind: MR_DELEGATION_TOKEN, Service: 172.16.222.3:10020, Ident: (owner=biadmin, renewer=yarn, realUser=HTTP/bicloud-fyre-physical-17-master-3.fyre.ibm.com@IBM.COM, issueDate=1478191972979, maxDate=1478796772979, sequenceNumber=52, masterKeyId=49)
16/11/03 16:53:17 INFO impl.YarnClientImpl: Submitted application application_1476266127941_0693
16/11/03 16:53:17 INFO mapreduce.Job: The url to track the job: http://bicloud-fyre-physical-17-master-2.fyre.ibm.com:8088/proxy/application_1476266127941_0693/
16/11/03 16:53:17 INFO mapreduce.Job: Running job: job_1476266127941_0693
16/11/03 16:53:24 INFO mapreduce.Job: Job job_1476266127941_0693 running in uber mode : false
16/11/03 16:53:24 INFO mapreduce.Job: map 0% reduce 0%
16/11/03 16:53:32 INFO mapreduce.Job: map 20% reduce 0%
16/11/03 16:53:33 INFO mapreduce.Job: map 100% reduce 0%
16/11/03 16:53:34 INFO mapreduce.Job: Job job_1476266127941_0693 completed successfully
16/11/03 16:53:34 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=799000
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=644
                HDFS: Number of bytes written=148247
                HDFS: Number of read operations=20
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=10
        Job Counters
                Launched map tasks=5
                Other local map tasks=5
                Total time spent by all maps in occupied slots (ms)=62016
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=31008
                Total vcore-milliseconds taken by all map tasks=31008
                Total megabyte-milliseconds taken by all map tasks=190513152
        Map-Reduce Framework
                Map input records=2379
                Map output records=2379
                Input split bytes=644
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=249
                CPU time spent (ms)=6590
                Physical memory (bytes) snapshot=1758576640
                Virtual memory (bytes) snapshot=35233165312
                Total committed heap usage (bytes)=2638741504
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=148247
16/11/03 16:53:34 INFO mapreduce.ImportJobBase: Transferred 144.7725 KB in 23.9493 seconds (6.0449 KB/sec)
16/11/03 16:53:34 INFO mapreduce.ImportJobBase: Retrieved 2379 records.
[_SUCCESS, part-m-00000, part-m-00001, part-m-00002, part-m-00003, part-m-00004]

BUILD SUCCESSFUL

Total time: 1 mins 2.202 secs
From the output above you can see the job output as well as the content of the table directory on HDFS, which contains 5 part files (5 map tasks were used). The WebHCat (Templeton) job console output goes to stderr in this case.
As part of compiling/running your code ensure you have the following dependency: org.apache.knox:gateway-shell:0.10.0.
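If you are building with Gradle, a minimal build.gradle sketch could look like the following. The Groovy and json-path versions are assumptions (the json-path artifact is only needed because the example uses com.jayway.jsonpath.JsonPath); adjust them to match your environment.
// build.gradle - minimal sketch for compiling and running the Groovy DSL examples
apply plugin: 'groovy'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.4.7'
    compile 'org.apache.knox:gateway-shell:0.10.0'
    compile 'com.jayway.jsonpath:json-path:2.2.0'
}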