

Status

Current state: Complete

Discussion thread: here

JIRA

Motivation

While the existing Shiro authentication provider and KnoxLdapRealm implementations provide some powerful authentication capabilities against LDAP/AD servers, there are still a number of pain points associated with group lookup/search, nested OUs, large numbers of groups, etc.

There are also some inconsistencies with other services across the Apache Hadoop ecosystem, both in configuration and in runtime behavior.

In addition, LDAP/AD configuration must currently be duplicated in every topology that uses the same LDAP/AD instance. This redundancy introduces room for error and places an unnecessary burden on administrators to keep the copies correct and in sync.

This Knox Improvement Proposal will rationalize a number of Apache Knox JIRAs for LDAP improvements and hopefully represent the central theme for the 0.10.0 release.

Improvements

We have a number of open JIRAs around LDAP improvements. Some of these are complements of each other while others may be alternatives.

We will need to determine the best way forward to address the pain points, specifically around group lookup, large result sets, and performance.

These are mostly collected below under #2.

1. KNOX-237 Hadoop Group Mapping Identity Assertion Provider - DONE

In order to have consistent configuration parameters, capabilities and behavior, Apache Knox should integrate the Hadoop Groups Lookup component from Hadoop Common.

This is a pluggable mechanism with support for LDAP/AD, OS/Linux groups, etc.

We need to determine whether this integration will address the other pain points related to large numbers of groups and performance.

Improvements to core Hadoop would benefit the entire ecosystem and as such deserve investment where possible.

Configuration

The following is an example of the configuration for a would-be HadoopGroupMapping provider:

<topology>
    <gateway>
        <provider>
            <role>identity-assertion</role>
            <name>HadoopGroupMapping</name>
            <enabled>true</enabled>
            <param>
                <name>hadoop.security.group.mapping</name>
                <value>org.apache.hadoop.security.LdapGroupsMapping</value>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.bind.user</name>
                <value>cn=Manager,dc=hadoop,dc=apache,dc=org</value>
            </param>
            <!--param>
                <name>hadoop.security.group.mapping.ldap.bind.password.file</name>
                <value>/etc/hadoop/conf/ldap-conn-pass.txt</value>
            </param-->
            <param>
                <name>hadoop.security.group.mapping.ldap.bind.password</name>
                <value>hadoop</value>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.url</name>
                <value>ldap://localhost:389/dc=hadoop,dc=apache,dc=org</value>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.base</name>
                <value/>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.search.filter.user</name>
                <value>(&amp;(|(objectclass=person)(objectclass=applicationProcess))(cn={0}))</value>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.search.filter.group</name>
                <value>(objectclass=groupOfNames)</value>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.search.attr.member</name>
                <value>member</value>
            </param>
            <param>
                <name>hadoop.security.group.mapping.ldap.search.attr.group.name</name>
                <value>cn</value>
            </param>
        </provider>
    </gateway>
</topology>

2. Handling Large Group Results in Shiro Provider

KNOX-461 Leverage Directory Computed Attribute for User Group Discovery

We should use the computed attribute memberOf, supported by Active Directory, to discover the groups of the authenticated user. This would significantly boost performance compared to computing groups via a group search.

OpenLDAP can also be configured to return computed groups, for example via its memberof overlay, which likewise exposes the attribute as memberOf on the user entry.
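As a sketch of how memberOf-based lookup could be configured once the HadoopGroupMapping provider from #1 is in place: the `hadoop.security.group.mapping.ldap.search.attr.memberof` property name is taken from Hadoop's LdapGroupsMapping, but its availability depends on the Hadoop version, so treat it as an assumption to be verified against the deployed core-default.xml:

```xml
<!-- Sketch: resolve groups from the computed memberOf attribute on the
     user entry instead of issuing a separate group search. Assumes the
     Hadoop LdapGroupsMapping property below exists in the deployed version. -->
<param>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</param>
<param>
    <!-- When set, groups are read from this attribute of the user object -->
    <name>hadoop.security.group.mapping.ldap.search.attr.memberof</name>
    <value>memberOf</value>
</param>
```

This avoids the group search round trip entirely, which is where the performance benefit described above comes from.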

KNOX-644 Limit/page results of LDAP group membership search - DONE

Some users are finding that they have more than 1000 groups that would be returned given how Knox currently implements group lookup. Active Directory limits search results to 1000 items by default, and this causes failures that require client-side workarounds. Ideally, Knox's LDAP group search implementation would either limit/filter the results or page result sets that are unavoidably large.

KNOX-536/KNOX-537 LDAP authentication against nested OU (PAM support) - DONE

Knox Gateway provides HTTP BASIC authentication against an LDAP user directory. It currently supports only a single Organizational Unit (OU) and does not support nested OUs.

OS-level PAM security provides a great interface for authentication and authorization. For example, sssd supports Active Directory nested OUs by adjusting ldap_group_nesting_level = 5. Knox is currently configured to interact with LDAP directly, which has two shortcomings. First, high-volume traffic is likely to issue too many queries to AD without a cache. Second, complex LDAP query logic cannot be mapped correctly to a UserDnTemplate without adding more LDAP-specific logic and parameters to the JndiLdapRealm code.

Knox can be improved to use PAM, outsourcing the complex OS-to-AD interaction to sssd. It is possible to implement a Shiro PAM plugin to reduce the complex LDAP logic that is starting to accumulate in Knox.
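A minimal sketch of this direction: a Shiro authentication provider that delegates to a PAM service, which sssd in turn backs with AD. The KnoxPamRealm class and main.pamRealm.service parameter names below are assumptions about the eventual implementation, and the "login" service name is illustrative; it must match a service definition under /etc/pam.d on the gateway host:

```xml
<provider>
    <role>authentication</role>
    <name>ShiroProvider</name>
    <enabled>true</enabled>
    <param>
        <!-- Assumed realm class delegating authentication to PAM -->
        <name>main.pamRealm</name>
        <value>org.apache.hadoop.gateway.shirorealm.KnoxPamRealm</value>
    </param>
    <param>
        <!-- PAM service name under /etc/pam.d (illustrative value) -->
        <name>main.pamRealm.service</name>
        <value>login</value>
    </param>
    <param>
        <name>urls./**</name>
        <value>authcBasic</value>
    </param>
</provider>
```

With this shape, nested-OU handling, caching, and AD specifics all live in the sssd/PAM layer rather than in Knox's LDAP code.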

3. Centralized Gateway Configuration for LDAP/AD - DONE via Shared Provider Config

In order to eliminate redundancy in complicated configuration like LDAP/AD, we should consider centralizing such configuration into specialized topologies that can be imported into the gateway section of cluster topologies.

This would provide the ability to "import" gateway topologies by name into cluster topologies.

The following would be a gateway topology for a basic demo configuration, saved as demo.xml:

<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <name>main.ldapRealm</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
            </param>
            <param>
                <name>main.ldapContextFactory</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory</name>
                <value>$ldapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.userDnTemplate</name>
                <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://localhost:33389</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                <value>simple</value>
            </param>
            <param>
                <name>urls./**</name>
                <value>authcBasic</value>
            </param>
        </provider>

        <provider>
            <role>identity-assertion</role>
            <name>Default</name>
            <enabled>true</enabled>
        </provider>

        <provider>
            <role>authorization</role>
            <name>AclsAuthz</name>
            <enabled>false</enabled>
        </provider>
    </gateway>
</topology>

The following would then be able to import the above gateway topology:

<topology>

    <gateway>
        <import>demo.xml</import>
    </gateway>

    <service>
        <role>NAMENODE</role>
        <url>hdfs://localhost:8020</url>
    </service>

    <service>
        <role>JOBTRACKER</role>
        <url>rpc://localhost:8050</url>
    </service>

    <service>
        <role>WEBHDFS</role>
        <url>http://localhost:50070/webhdfs</url>
    </service>

    <service>
        <role>WEBHCAT</role>
        <url>http://localhost:50111/templeton</url>
    </service>

    <service>
        <role>OOZIE</role>
        <url>http://localhost:11000/oozie</url>
    </service>

    <service>
        <role>WEBHBASE</role>
        <url>http://localhost:60080</url>
    </service>

    <service>
        <role>HIVE</role>
        <url>http://localhost:10001/cliservice</url>
    </service>

    <service>
        <role>RESOURCEMANAGER</role>
        <url>http://localhost:8088/ws</url>
    </service>

</topology>

OPEN ISSUE: Should gateway topologies contain only the gateway elements to be imported, or should they be wrapped with topology elements as described above?

Testing

Compatibility, Deprecation, and Migration Plan

Rejected Alternatives
