Status

Current state: Under Discussion

Discussion thread: here

JIRA: here [TBD]

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

This KIP applies only to StandardAuthorizer.


Motivation

Kafka authorizes access to resources such as topics and consumer groups by way of ACLs.

Currently, when adding new ACLs in Kafka, there are two resource pattern types for topics and groups (see --resource-pattern-type [pattern-type] in the documentation: https://kafka.apache.org/documentation/#security_authz_cli):

  • LITERAL
  • PREFIXED

Adding a GLOB pattern type that supports the wildcard characters "*" and "?" would help organizations streamline their ACL management by reducing the number of ACLs they need to maintain.

Example scenarios

Let's say we need to create ACLs for the following 6 topics:  nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, de-accounts-remotetopic, cz-accounts-localtopic, cz-accounts-remotetopic

With existing functionality, this requires three prefixed ACLs, as shown below:

kafka-acls --bootstrap-server localhost:9092 \
    --add \
    --allow-principal User:CN=serviceaccount,OU=ServiceAccountUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown \
    --producer \
    --topic nl-accounts- \
    --resource-pattern-type prefixed

kafka-acls --bootstrap-server localhost:9092 \
    --add \
    --allow-principal User:CN=serviceaccount,OU=ServiceAccountUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown \
    --producer \
    --topic de-accounts- \
    --resource-pattern-type prefixed

kafka-acls --bootstrap-server localhost:9092 \
    --add \
    --allow-principal User:CN=serviceaccount,OU=ServiceAccountUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown \
    --producer \
    --topic cz-accounts- \
    --resource-pattern-type prefixed


However, if we supported a GLOB pattern type in which "?" matches a single character and "*" matches any number of characters, we could accomplish this with a single ACL, as illustrated here:

kafka-acls --bootstrap-server localhost:9092 \
    --add \
    --allow-principal User:CN=serviceaccount,OU=ServiceAccountUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown \
    --producer \
    --topic '??-accounts-*' \
    --resource-pattern-type glob


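The same applies to consumer groups as well as other ResourceTypes. (The pattern is quoted so that the shell does not expand it, and an explicit --operation is used because the --producer convenience option requires --topic.)

kafka-acls --bootstrap-server localhost:9092 \
    --add \
    --allow-principal User:CN=serviceaccount,OU=ServiceAccountUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown \
    --operation Read \
    --group '*-testgroup-*' \
    --resource-pattern-type glob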
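
The wildcard semantics used in the commands above can be sketched as follows. This is only an illustration under stated assumptions, not the matcher the KIP will ship: the class name GlobMatchSketch is hypothetical, the whole resource name must match the pattern, and no escape syntax for literal "*" or "?" is considered.

```java
/** Illustrative glob matcher: '?' matches exactly one character, '*' matches zero or more. */
final class GlobMatchSketch {

    /** Returns true when the whole of {@code text} matches {@code pattern}. */
    static boolean matches(String pattern, String text) {
        int p = 0;        // current position in the pattern
        int t = 0;        // current position in the text
        int star = -1;    // index of the most recent '*' in the pattern, or -1
        int mark = 0;     // text position at which that '*' started absorbing characters

        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;   // remember the star and first try it against zero characters
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;          // one-for-one match
                t++;
            } else if (star >= 0) {
                p = star + 1; // backtrack: let the last star absorb one more character
                t = ++mark;
            } else {
                return false; // mismatch and no star to fall back on
            }
        }
        // Any remaining pattern characters must all be '*' (each matching the empty string).
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Under these assumptions, GlobMatchSketch.matches("??-accounts-*", "nl-accounts-localtopic") holds for each of the six topics listed earlier, while a three-letter prefix such as "nld-accounts-localtopic" is rejected because "?" consumes exactly one character.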


Notes:

  • PREFIXED patterns can be evaluated by the GLOB processor by appending "*" to the prefix pattern.
  • LITERAL patterns can be evaluated by the GLOB processor.
  • Deprecation of PREFIXED and LITERAL may be undertaken, but it is not part of this KIP.

ACL precedence:

The ACL precedence does not change.

  • Kafka evaluates both the PREFIXED and LITERAL ACLs for the topic.
  • If either ACL is a DENY ACL, it will take precedence and block access.
  • In the absence of a DENY ACL, the most specific matching ACL will take precedence.

When GLOBs are in use, the most specific pattern (the one with the fewest skipped letters) is selected.

Columns are the Glob ACLs; rows are the names being searched for. X marks the ACL that is selected.

Search            Hello   HelloWo*d   HelloW*   He*m*m
Hello             X
HelloWorld                X
HelloWonderful            X
HelloWorldWide            X
HelloDolly        X
Hermesmoon                                      X
Helmsman                                        X
HelloWiked                            X
Helpme


Support for adding ACLs to such 'match resource patterns' will greatly simplify ACL operations.
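
The "fewest skipped letters" rule can be sketched as below. This is a hedged illustration rather than the proposed implementation: GlobSpecificitySketch and its methods are hypothetical names, the whole name must match the pattern, and "skipped letters" is assumed to mean the characters of the name that are absorbed by "*" or "?".

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative selection of the most specific matching glob pattern. */
final class GlobSpecificitySketch {

    /** Simple recursive glob match over the whole string ('?' = one char, '*' = zero or more). */
    static boolean matches(String pattern, String text) {
        if (pattern.isEmpty()) {
            return text.isEmpty();
        }
        char c = pattern.charAt(0);
        if (c == '*') {
            // The star absorbs nothing, or absorbs one character and stays active.
            return matches(pattern.substring(1), text)
                    || (!text.isEmpty() && matches(pattern, text.substring(1)));
        }
        return !text.isEmpty()
                && (c == '?' || c == text.charAt(0))
                && matches(pattern.substring(1), text.substring(1));
    }

    /** Number of characters in the target that are covered by wildcards rather than literals. */
    static int skipped(String pattern, String target) {
        long literals = pattern.chars().filter(ch -> ch != '*' && ch != '?').count();
        return target.length() - (int) literals;
    }

    /** Returns the matching pattern that skips the fewest letters, if any pattern matches. */
    static Optional<String> mostSpecific(List<String> patterns, String target) {
        return patterns.stream()
                .filter(p -> matches(p, target))
                .min(Comparator.comparingInt(p -> skipped(p, target)));
    }
}
```

With the patterns "HelloW*", "HelloWo*d" and "He*", the name HelloWorld selects "HelloWo*d" (two skipped letters) in preference to "HelloW*" (four) and "He*" (eight).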

Usability:

With the ACL system becoming a complex web of patterns, it is incumbent upon the development team to provide tools that assist in diagnosing permission problems.

  • There should be a tool that lists all ACLs that impact the decision to allow or deny a principal access to a topic, based on principal ID, host, and operation. This will help operators rapidly determine the reason for access-denied errors.
  • There should be a tool to show the effects of adding an ACL. Using the example above, adding "*-accounts-*" should list nl-accounts-localtopic, nl-accounts-remotetopic, de-accounts-localtopic, de-accounts-remotetopic, cz-accounts-localtopic, and cz-accounts-remotetopic as affected.
  • There should be a tool to show the effects of adding a topic. Using the example above, adding "us-accounts-privatetopic" should list that "*-accounts-*" will influence the permissions calculations for the new topic.

I would like to propose this kind of tooling in a separate KIP.

Public Interfaces

  • Modification of the org.apache.kafka.common.resource.PatternType enum (the type behind --resource-pattern-type, alongside LITERAL and PREFIXED) to add "GLOB" as a type.
  • Modification of the org.apache.kafka.common.resource.ResourcePatternFilter to properly filter for GLOB patterns.
  • Modification of the org.apache.kafka.common.acl.AccessControlEntryData to handle GLOB patterns.
  • Addition of GLOB matching utilities or a GLOB class that will match strings to GLOB patterns.  Potential source candidates include Plexus matching utility code.
  • Modification of the metadata/authorizer packages to account for GLOB patterns in the resource name as well as in the principal and host strings.
  • Change to the Kafka principal prohibiting the colon ":" as part of the type string.  This is currently implied but not enforced.
  • Addition of a static "parse(String)" method to the KafkaPrincipal to parse a principal string into a KafkaPrincipal object.  Everything up to the first colon ':' will be assigned as the principal type, everything after will be assigned to the principal name.  This will centralize the parsing and reduce the possibility of parsing errors in the codebase.
  • Modification of the org.apache.kafka.server.authorizer.Authorizer to reimplement or remove the authorizeByResourceType implementation to account for GLOB types.
  • Modification of the kafka.admin.AclCommand class to update multiple methods like getResourceFilter and objects for parsing arguments AclCommandOptions
  • Modification of the org.apache.kafka.jmh.acl.AuthorizerBenchmark class to update multiple methods like setup and prepareAclCache
  • Modification of org.apache.kafka.jmh.acl.StandardAuthorizerUpdateBenchmark class to update prepareAcls method
  • Modification of org.apache.kafka.metadata.authorizer.StandardAuthorizerData class to update authorize method
  • Modification of org.apache.kafka.controller.AclControlManager class to update validateNewAcl method
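
The parsing rule described for the proposed static parse(String) method can be sketched in isolation. PrincipalParseSketch below is a hypothetical stand-in and not KafkaPrincipal itself; rejecting input without a colon is also an assumption, since the KIP does not say how that case is handled.

```java
/** Illustrative stand-in for the proposed principal parsing rule; not the real KafkaPrincipal. */
final class PrincipalParseSketch {
    final String type;   // everything before the first ':'
    final String name;   // everything after the first ':' (may itself contain ':')

    private PrincipalParseSketch(String type, String name) {
        this.type = type;
        this.name = name;
    }

    /** Splits at the first colon, so the type can never contain a colon. */
    static PrincipalParseSketch parse(String principal) {
        int colon = principal == null ? -1 : principal.indexOf(':');
        if (colon < 0) {
            // Assumption: input without a type separator is rejected.
            throw new IllegalArgumentException("Expected <type>:<name> but got: " + principal);
        }
        return new PrincipalParseSketch(principal.substring(0, colon), principal.substring(colon + 1));
    }
}
```

For example, parsing "User:CN=serviceaccount,OU=ServiceAccountUsers" yields the type "User" and the name "CN=serviceaccount,OU=ServiceAccountUsers"; any further colons stay in the name.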

Proposal for Locating matches in a GLOB based Authorizer

Search requirements

  1. Matching DENY ACLs always override matching ALLOW ACLs.
  2. More specific string matching overrides less specific string matching.
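
The first requirement can be sketched as a flat scan over candidate ACLs. This is an illustration only: DenyPrecedenceSketch, Acl, Permission and Result are hypothetical names, principal, host and operation checks are omitted, and the real search would walk the Trie instead of a list.

```java
import java.util.List;

/** Illustrative DENY-over-ALLOW evaluation across glob ACLs on resource names. */
final class DenyPrecedenceSketch {
    enum Permission { ALLOW, DENY }
    enum Result { ALLOWED, DENIED, NO_MATCH }

    /** Minimal ACL: a glob pattern on the resource name plus a permission. */
    static final class Acl {
        final String pattern;
        final Permission permission;

        Acl(String pattern, Permission permission) {
            this.pattern = pattern;
            this.permission = permission;
        }
    }

    /** Simple recursive glob match over the whole string ('?' = one char, '*' = zero or more). */
    static boolean matches(String pattern, String text) {
        if (pattern.isEmpty()) {
            return text.isEmpty();
        }
        char c = pattern.charAt(0);
        if (c == '*') {
            return matches(pattern.substring(1), text)
                    || (!text.isEmpty() && matches(pattern, text.substring(1)));
        }
        return !text.isEmpty()
                && (c == '?' || c == text.charAt(0))
                && matches(pattern.substring(1), text.substring(1));
    }

    /** Any matching DENY wins immediately; otherwise a matching ALLOW grants access. */
    static Result authorize(List<Acl> acls, String target) {
        boolean allowed = false;
        for (Acl acl : acls) {
            if (!matches(acl.pattern, target)) {
                continue;
            }
            if (acl.permission == Permission.DENY) {
                return Result.DENIED;
            }
            allowed = true;
        }
        return allowed ? Result.ALLOWED : Result.NO_MATCH;
    }
}
```

With an ALLOW on "??-accounts-*" and a DENY on "de-accounts-remote*", the topic nl-accounts-localtopic is allowed, de-accounts-remotetopic is denied even though the ALLOW also matches it, and an unrelated topic produces no match.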

Notes and Nomenclature

  1. This proposal is based on utilizing the Trie implementation found in KAFKA-17423.
  2. The Trie is based on the resource name, each node contains a set of ACLs that are associated with the name.
  3. The string we are searching for is called the TARGET
  4. The string we are comparing against is called the CANDIDATE.
  5. GLOB characters are "?" and "*".
    1. "?" matches a single character
    2. "*" matches zero or more characters.

Matching

Matching the resource name

It is important to remember that the patterns containing GLOB characters are the CANDIDATEs stored in the Trie; they are not the TARGET being searched for.

The Trie naturally splits resource names wherever two names diverge (e.g. foobar and foocar result in a "foo" node with two children, "bar" and "car"). When inserting a pattern with GLOB characters, the insert algorithm creates child nodes that contain only the GLOB character.
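
To illustrate where those node boundaries would fall, the sketch below splits a pattern into literal runs and single GLOB characters. It is not the insert algorithm from KAFKA-17423, and GlobFragmentSketch is a hypothetical name; it only shows that each "*" or "?" ends up as a fragment of its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative split of a glob pattern into literal runs and single-wildcard fragments. */
final class GlobFragmentSketch {

    static List<String> fragments(String pattern) {
        List<String> out = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush the literal run collected so far, then emit the wildcard on its own.
                if (literal.length() > 0) {
                    out.add(literal.toString());
                    literal.setLength(0);
                }
                out.add(String.valueOf(c));
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            out.add(literal.toString());
        }
        return out;
    }
}
```

For "HelloWo*d" this yields the fragments "HelloWo", "*" and "d"; a pattern without wildcards, such as "foobar", stays a single fragment and is split only by the Trie's usual prefix sharing.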

The Trie implementation starts at the root node which contains the empty string.  It then begins a recursive descent of the trie by executing the Descent Process on the root node.

Descent process

  • Is there a matching DENY ACL on this node? If so, return DENY.
  • Is there a matching LITERAL ACL on this node? If so, return the result.
  • Is there a child node that continues the TARGET pattern? If so, recurse into the Descent process.
  • If this point is reached, stop the Descent process and begin the Ascent process.

Ascent process

  • Is this the root node? If so, return NO MATCH.
  • Is there a matching GLOB ACL on this node? If so, return the result.
  • Is there a "?" character child of this node? If so, execute the Descent process on the "?" node.
  • Is there a "*" character child of this node? If so, execute the Descent process for each potential block of characters. (This handles the multiple-character nature of the "*" wildcard.)
  • If this point is reached, move to the parent node and execute the Ascent process again.

The above process will match the resource names and will distribute them through a Trie so that the search will be much faster.
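
The shape of this search can be approximated with a small backtracking lookup. The sketch below makes several simplifying assumptions that differ from the proposal: it uses one character per node instead of the compressed Trie from KAFKA-17423, it stores only the pattern text on a node (no ACLs, so the DENY check is omitted), and returning from a failed recursive call plays the role of the Ascent process. GlobTrieSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative character trie holding glob patterns, searched by backtracking. */
final class GlobTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String pattern;   // non-null when a stored pattern ends at this node
    }

    private final Node root = new Node();

    /** Stores the pattern one character per node; '*' and '?' become nodes of their own. */
    void insert(String pattern) {
        Node node = root;
        for (char c : pattern.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.pattern = pattern;
    }

    /** Returns the first stored pattern found that matches the whole target, or null. */
    String find(String target) {
        return find(root, target, 0);
    }

    private static String find(Node node, String target, int pos) {
        if (pos == target.length() && node.pattern != null) {
            return node.pattern;
        }
        if (pos < target.length()) {
            char c = target.charAt(pos);
            // Descent: prefer the child that continues the target literally.
            Node literal = (c == '*' || c == '?') ? null : node.children.get(c);
            if (literal != null) {
                String hit = find(literal, target, pos + 1);
                if (hit != null) return hit;
            }
            // Next, a '?' child consumes exactly one character.
            Node one = node.children.get('?');
            if (one != null) {
                String hit = find(one, target, pos + 1);
                if (hit != null) return hit;
            }
        }
        // Finally, a '*' child is tried against each possible block, shortest block first.
        Node many = node.children.get('*');
        if (many != null) {
            for (int next = pos; next <= target.length(); next++) {
                String hit = find(many, target, next);
                if (hit != null) return hit;
            }
        }
        return null;   // nothing below this node matches; the caller backtracks
    }
}
```

After inserting "HelloWo*d", "HelloW*" and "He*", a lookup of HelloWorld follows the literal path to "HelloWo" and succeeds through the "*" child there, HelloWiked falls back to "HelloW*", and Helpme backs up to the "*" child under "He".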

Matching Kafka Principal

We define a GlobPrincipal as

import java.util.function.Predicate;

import org.apache.kafka.common.security.auth.KafkaPrincipal;

// MatchPattern is the GLOB matching utility proposed under Public Interfaces, and
// KafkaPrincipal.parse(String) is the static parser proposed there; neither exists today.
public class GlobPrincipal {
    private final KafkaPrincipal principal;
    private final Predicate<KafkaPrincipal> matcher;

    public GlobPrincipal(String pattern) {
        this(KafkaPrincipal.parse(pattern));
    }

    public GlobPrincipal(KafkaPrincipal principal) {
        this.principal = principal;
        this.matcher = globMatcher(principal);
    }

    // A null or empty string is treated as a GLOB, as is any string containing "*" or "?".
    private static boolean hasGlob(String s) {
        return s == null || s.isEmpty() || s.contains("*") || s.contains("?");
    }

    private static Predicate<KafkaPrincipal> globMatcher(KafkaPrincipal principal) {
        if (hasGlob(principal.toString())) {
            // Match the type and the name independently, using GLOB matching only where needed.
            Predicate<String> typePredicate;
            if (hasGlob(principal.getPrincipalType())) {
                MatchPattern typePattern = new MatchPattern(principal.getPrincipalType());
                typePredicate = typePattern::matches;
            } else {
                typePredicate = s -> principal.getPrincipalType().equals(s);
            }
            Predicate<String> namePredicate;
            if (hasGlob(principal.getName())) {
                MatchPattern namePattern = new MatchPattern(principal.getName());
                namePredicate = namePattern::matches;
            } else {
                namePredicate = s -> principal.getName().equals(s);
            }
            return other -> typePredicate.test(other.getPrincipalType()) && namePredicate.test(other.getName());
        } else {
            // No wildcards: compare the full principal strings.
            return other -> principal.toString().equals(other.toString());
        }
    }

    @Override
    public int hashCode() {
        return principal.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null) return false;
        if (getClass() != o.getClass()) return false;

        GlobPrincipal that = (GlobPrincipal) o;
        return this.principal.equals(that.principal);
    }

    public String getName() {
        return principal.getName();
    }

    public KafkaPrincipal getPrincipal() {
        return principal;
    }

    public boolean matches(String name) {
        return matches(KafkaPrincipal.parse(name));
    }

    public boolean matches(KafkaPrincipal other) {
        return matcher.test(other);
    }
}


When stored in the ACL list associated with a node in the Trie, the principal will be stored as a GlobPrincipal. This allows the ACLs on the nodes to be kept in sorted order for faster traversal, and matching will include the wildcard matching.

For other uses it performs GLOB character detection and will perform proper matching.  This class will need to be used within the Client and Server code to perform matching.  An additional "GlobPrincipals" class may be created to store a collection of glob principals and determine if any of the contained GlobPrincipal instances match a principal or string.

Matching Host

We define a GlobHost as

import java.util.function.Predicate;

// MatchPattern is the GLOB matching utility proposed under Public Interfaces.
public class GlobHost {
    private final String pattern;
    private final Predicate<String> matcher;

    public GlobHost(String pattern) {
        this.pattern = pattern;
        this.matcher = globMatcher(pattern);
    }

    // A null or empty string is treated as a GLOB, as is any string containing "*" or "?".
    private static boolean hasGlob(String s) {
        return s == null || s.isEmpty() || s.contains("*") || s.contains("?");
    }

    private static Predicate<String> globMatcher(String pattern) {
        if (hasGlob(pattern)) {
            MatchPattern matchPattern = new MatchPattern(pattern);
            return matchPattern::matches;
        } else {
            // No wildcards: an exact comparison is sufficient.
            return pattern::equals;
        }
    }

    @Override
    public int hashCode() {
        return pattern.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null) return false;
        if (getClass() != o.getClass()) return false;

        GlobHost that = (GlobHost) o;
        return this.pattern.equals(that.pattern);
    }

    @Override
    public String toString() {
        return pattern;
    }

    public boolean matches(String other) {
        return matcher.test(other);
    }
}


When stored in the ACL list associated with a node in the Trie, the host will be stored as a GlobHost. This allows the ACLs on the nodes to be kept in sorted order for faster traversal, and matching will include the wildcard matching.

For other uses it performs GLOB character detection and will perform proper matching. This class will need to be used within the Client and Server code to perform matching. An additional "GlobHosts" class may be created to store a collection of glob hosts and determine if any of the contained GlobHost instances match a host string.
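
One possible shape for such a collection is sketched below. It is an assumption-laden illustration: GlobHostsSketch is a hypothetical name, and it translates each glob into a java.util.regex.Pattern instead of using the proposed MatchPattern utility, so edge cases such as the empty pattern may behave differently from the classes above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative collection of host globs, backed by java.util.regex rather than MatchPattern. */
final class GlobHostsSketch {
    private final List<Pattern> patterns = new ArrayList<>();

    GlobHostsSketch(List<String> globs) {
        for (String glob : globs) {
            patterns.add(toRegex(glob));
        }
    }

    /** Translates '*' to ".*" and '?' to "."; every other character is quoted literally. */
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true if any stored glob matches the whole host string. */
    boolean matchesAny(String host) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(host).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A collection built from "10.0.?.*" and "broker-*.example.com" would accept 10.0.1.25 and broker-7.example.com, and reject 10.0.12.25 because "?" covers only a single character.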

Pattern Types

A LITERAL match matches all the characters without wildcard expansion.

Columns are the patterns stored in the Trie; rows are the names being searched for.

Search        Hello   HelloWorld   H*World
Hello         T       F            F
HelloWorld    F       T            F
HappyWorld    F       F            F

A PREFIXED match matches all the characters with wildcard expansion.

Columns are the patterns stored in the Trie; rows are the names being searched for.

Search        Hello   HelloWorld   H*World
Hello         T       F            F
HelloWorld    T       T            F
HappyWorld    F       F            F

A GLOB match only matches GLOB and PREFIXED labeled ACLs in the Trie; all LITERAL matches are ignored.

Columns are the patterns stored in the Trie; rows are the names being searched for.

Search        Hello   HelloWorld   H*World
Hello         T       F            F
HelloWorld    F       T            T*
HappyWorld    F       F            T

 * The search algorithm would return the LITERAL "HelloWorld" match before the wildcard match was found.

The GLOB pattern can replace both the LITERAL and the PREFIXED types, as the following rewrite table demonstrates:

Pattern Type   Pattern    Equivalent GLOB pattern
Literal        SomeName   SomeName
Prefix         SomeName   SomeName*
Literal        *          (empty string)

The current Trie implementation of the Literal "*" is to place the StandardACLs on the root node so they are located before any node that has characters.
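
The rewrite table can be expressed as a small conversion function. This is a sketch only: GlobRewriteSketch and its local PatternType enum are hypothetical and separate from Kafka's own classes, and the mapping simply mirrors the three rows of the table.

```java
/** Illustrative rewrite of LITERAL and PREFIXED resource patterns into GLOB patterns. */
final class GlobRewriteSketch {

    /** Hypothetical local enum used only for this sketch. */
    enum PatternType { LITERAL, PREFIXED, GLOB }

    static String toGlob(PatternType type, String name) {
        switch (type) {
            case LITERAL:
                // The literal "*" (all resources) corresponds to the empty GLOB pattern.
                return name.equals("*") ? "" : name;
            case PREFIXED:
                // A prefix becomes the same text followed by a trailing wildcard.
                return name + "*";
            default:
                // Already a GLOB pattern: nothing to rewrite.
                return name;
        }
    }
}
```

For instance, toGlob(PatternType.PREFIXED, "nl-accounts-") gives "nl-accounts-*", which is the GLOB form of the first prefixed ACL in the Motivation section.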

Proposed Changes

Main changes include:

  • Updating the Authorizer
  • AdminClient changes
  • Updating the CLI

Detailed changes also include:

  • Modification of the org.apache.kafka.server.authorizer.Authorizer to push the authorizeByResourceType down into the AuthorizerData interface.  This allows for authorizers that accept or do not accept wildcards.
  • Modification of the kafka.security.authorizer.AclAuthorizer class to
    • update authorizeByResourceType method and other methods
    • update matchingAcls (this is performance sensitive, as it impacts latency of every producer and consumer client to get authorization. Verify AuthorizerBenchmark)
  • Modification of the kafka.admin.AclCommand class to update multiple methods like getResourceFilter and objects for parsing arguments AclCommandOptions
  • Modification of kafka.server.AuthHelper class to update authorize method
  • Modification of the org.apache.kafka.jmh.acl.AuthorizerBenchmark class to update multiple methods like setup and prepareAclCache
  • Modification of org.apache.kafka.jmh.acl.StandardAuthorizerUpdateBenchmark class to update prepareAcls method
  • Modification of org.apache.kafka.metadata.authorizer.StandardAuthorizerData class to update authorize method
  • Modification of org.apache.kafka.controller.AclControlManager class to update validateNewAcl method
  • Updating tests

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
    • Existing ACLs will continue to work as expected. The addition of GLOB allows new formats for resource names, host names, and Kafka principals.
  • If we are changing behavior how will we phase out the older behavior?
    • Not applicable: this KIP introduces new authorization functionality, and the old behavior will continue to exist. Deprecation may occur at a later time.
  • If we need special migration tools, describe them here.
    • No, not required.
  • When will we remove the existing behavior?
    • It is not required to remove any existing behavior.

Test Plan



  • Create a set of topics with similar prefixes and suffixes, plus further topics that follow these patterns.
  • Create an ACL with LITERAL on one topic, and verify authorization on all other topics. Access should not be allowed.
  • Create an ACL with the PREFIXED pattern type, and verify that the created ACLs (both LITERAL and PREFIXED) work as expected.
  • As performance is key, it is very important to test with a large set of ACLs in multiple combinations concurrently and to validate against the defined benchmarks.
  • Existing system tests will continue to function as they do today.
  • We will add additional system tests to show that the GLOB implementation handles the wildcards correctly.


Trie vs KRaft Standard Search times

The data is available in KAFKA-17423.

Testing indicates that the Trie search times are at least an order of magnitude faster than the existing system.

Rejected Alternatives


  • Using the technique from bioinformatics called sequence characterization. It was rejected because it was too slow for the hot path. A poster describing this technique is available.
  • Modifying the existing PREFIXED pattern to include internal GLOB characters. This breaks the Client code with no easy way forward.

