Topology Policy Separation

Introduction

Since the very beginning of the Knox design through the current release of 0.4.0, the topology file used to deploy cluster topologies has consisted of both policy enforcement "provider" definitions and service definitions.

There are several problems with this approach:

  1. expected sources of topology information, such as Ambari or ZooKeeper, will not contain the information and configuration required for policy enforcement or provider selection
  2. the provider configuration within each topology is often redundant across topologies and can present a management issue when changes are required to deployed topologies
  3. the topology file ends up looking much more like a configuration file than a deployment descriptor

As Apache Knox matures, it needs to start providing management capabilities consisting of:

  • Management APIs
  • Console Applications/UIs and/or Ambari Views
  • Centralized Policy Management
  • Topology Discovery through Ambari, ZooKeeper or other registries

This document will discuss policy management details and how to separate policy from topology information and organize it within a policy store.

High-level Reusable Policy Files

Let's start with a highly readable policy file syntax that captures all of the pertinent semantics without requiring the low-level details needed for enforcement.

JSON is a good choice for this because it is very readable yet structured.

The following non-normative example demonstrates:

  • removal of the notion of "role" - role becomes the policy type
  • removal of the notion of "enabled" - inclusion implies enabled
  • removal of the low-level config details
  • references, by name, to the needed low-level details
  • ability to compose reusable policies with reusable config

Let's consider this the default topology policy:

default-policy.json
{
    "authentication" : {
        "name" : "shiro",
        "config" : "basic-ldap-1"
    },
    "identity-assertion" : {
        "name" : "kerberos",
        "config" : "kdc-1"
    },
    "authorization" : {
        "name" : "AclsAuthz",
        "config" : "default-authz"
    },
    "host-mapping" : {
        "name" : "hostmap",
        "config" : "sandbox"
    }
}
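To make these semantics concrete, here is a minimal Python sketch of how a deployment tool might read such a policy file. The `PolicyEntry` class and `load_policy` function are hypothetical names, not part of Knox. The sketch assumes only the structure shown in default-policy.json: each top-level key is a policy type (replacing "role") whose object carries a provider `name` and a `config` reference, and any entry that is present is treated as enabled.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyEntry:
    """One entry of a high-level policy file (hypothetical model)."""
    policy_type: str  # the JSON key, e.g. "authentication" (replaces "role")
    name: str         # provider name, e.g. "shiro"
    config: str       # name of a low-level config, e.g. "basic-ldap-1"


def load_policy(text: str) -> list[PolicyEntry]:
    """Parse a policy document; every entry present is treated as enabled."""
    doc = json.loads(text)
    entries = []
    for policy_type, body in doc.items():
        if not isinstance(body, dict):
            raise ValueError(f"{policy_type}: expected an object")
        missing = {"name", "config"} - set(body)
        if missing:
            raise ValueError(f"{policy_type}: missing {sorted(missing)}")
        entries.append(PolicyEntry(policy_type, body["name"], body["config"]))
    return entries


if __name__ == "__main__":
    sample = '{"authentication": {"name": "shiro", "config": "basic-ldap-1"}}'
    for entry in load_policy(sample):
        print(entry.policy_type, entry.name, entry.config)
```

Rejecting an entry without a `config` reference early keeps the error close to the policy author rather than surfacing later during deployment.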

Low-level Reusable Configuration Files

The low-level details of LDAP bind, search-bind, and acceptance of HTTP BASIC authentication are required by the provider enforcing the declared policy and do not need to be seen or even understood by the topology policy author. All the author needs to know is which configuration to use for HTTP BASIC against a particular LDAP server instance.

We abstract these details away from policy authors by managing them as separate files that the policy references by name.

The Shiro configuration for HTTP BASIC authentication against LDAP with a simple bind is one example of such a config file.

basic-ldap-1.xml
<policy-config>
<param>
    <!--
    Session timeout in minutes. This is really an idle timeout and
    defaults to 30 minutes if the property value is not defined.
    The current client authentication expires if the client is idle continuously for more than this value.
    -->
    <name>sessionTimeout</name>
    <value>30</value>
</param>
<param>
    <name>main.ldapRealm</name>
    <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
</param>
<param>
    <name>main.ldapRealm.userDnTemplate</name>
    <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
</param>
<param>
    <name>main.ldapRealm.contextFactory.url</name>
    <value>ldap://localhost:33389</value>
</param>
<param>
    <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
    <value>simple</value>
</param>
<param>
    <name>urls./**</name>
    <value>authcBasic</value>
</param>
</policy-config>
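A provider's deployment code would need to turn such a file into its own settings. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree`; the function name `load_policy_config` is hypothetical, and the element names (`policy-config`, `param`, `name`, `value`) are taken from the example above. Insertion order is kept because, in Shiro's INI model, an object such as main.ldapRealm has to be defined before its properties are set.

```python
import xml.etree.ElementTree as ET


def load_policy_config(xml_text: str) -> dict[str, str]:
    """Return the name/value pairs of a <policy-config> document, in order."""
    root = ET.fromstring(xml_text)  # XML comments are discarded by default
    if root.tag != "policy-config":
        raise ValueError(f"unexpected root element <{root.tag}>")
    params: dict[str, str] = {}
    for param in root.findall("param"):
        name = param.findtext("name")
        if not name or not name.strip():
            raise ValueError("<param> without a <name>")
        # A missing or empty <value> is treated as an empty string.
        params[name.strip()] = (param.findtext("value") or "").strip()
    return params
```

The result is an ordered mapping such as {"sessionTimeout": "30", "main.ldapRealm": "...", ...}, which a Shiro-based provider could then write out in its own format.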

Policy Store Structure

The policy store can be simply structured as files within directories along the following lines:

conf/
    topologies/
    policies/
        default-policy.json
        super-secure-policy.json
        config/
            authentication/
                basic-ldap-1.xml
                basic-ldap-group-lookup-1.xml
                basic-AD-1.xml
                client-cert-ldap-1.xml
            identity-assertion/
            authorization/
            host-mapping/

The above illustrates the basic structure and how configuration details referenced from within a topology policy file can be located by policy type and config name.
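As a minimal Python sketch, resolving a reference such as "basic-ldap-1" under the "authentication" policy type could look like the following. It assumes the layout conf/policies/config/<policy-type>/<config-name>.xml shown above; the function name `resolve_config_path` is hypothetical. Because config names come from policy files, the sketch also rejects references that would escape the policy store.

```python
import tempfile
from pathlib import Path


def resolve_config_path(conf_dir: Path, policy_type: str, config_name: str) -> Path:
    """Locate a low-level config file referenced from a policy entry.

    Assumes the layout conf/policies/config/<policy-type>/<config-name>.xml.
    """
    config_root = (conf_dir / "policies" / "config").resolve()
    candidate = (config_root / policy_type / f"{config_name}.xml").resolve()
    # Reject references such as "../../x" that would escape the policy store.
    if config_root not in candidate.parents:
        raise ValueError(f"reference escapes the policy store: {policy_type}/{config_name}")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate


if __name__ == "__main__":
    # Build a throwaway store containing a single authentication config.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "conf"
        ldap = conf / "policies" / "config" / "authentication" / "basic-ldap-1.xml"
        ldap.parent.mkdir(parents=True)
        ldap.write_text("<policy-config/>")
        print(resolve_config_path(conf, "authentication", "basic-ldap-1"))
```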

Deployment Machinery Changes

In order to break apart the policy and service definitions, Knox will need to be able to bring them together at deployment time.

There are a couple of ways to approach this:

  1. introduce a new topology file with a .topo extension, or some extension other than .xml
    1. upon discovering a new .topo file, the deployment machinery will resolve the referenced high-level policy file or, in the absence of a reference, use "default-policy.json" as the policy
    2. it will then combine the two into the currently expected .xml file with providers and services in a single file
    3. everything will just work as it does now upon discovery of the .xml file
  2. change the topology parsing rules to dereference the referenced or implied policy and configuration files
    1. much more complex and adds obvious risk
    2. would essentially go right from a service definition file to gateway.xml as the enforceable policy
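The merge step of option 1 can be sketched in Python as a pure function over already-loaded inputs. The document does not define the .topo format, so the shape used here (a JSON object with an optional "policy" name and a list of services with "role" and "url") is an assumption, as are the function name `build_topology_xml` and the "default-policy" key. The output follows the existing Knox topology XML shape: a `<gateway>` of `<provider>` elements with `<role>`, `<name>`, `<enabled>` and `<param>` children, followed by `<service>` elements. Provider names are copied through unchanged; a real implementation might need to map short names such as "shiro" to registered provider names.

```python
import xml.etree.ElementTree as ET

DEFAULT_POLICY = "default-policy"  # assumed key for default-policy.json


def _text(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append <tag>text</tag> to parent."""
    el = ET.SubElement(parent, tag)
    el.text = text
    return el


def build_topology_xml(topo: dict, policies: dict, configs: dict) -> str:
    """Combine a .topo description with its policy into one topology XML.

    topo:     {"policy": optional policy name, "services": [{"role", "url"}]}
    policies: policy name -> parsed policy JSON (type -> {"name", "config"})
    configs:  (policy type, config name) -> ordered {param name: value}
    """
    policy = policies[topo.get("policy", DEFAULT_POLICY)]
    root = ET.Element("topology")
    gateway = ET.SubElement(root, "gateway")
    for policy_type, entry in policy.items():
        provider = ET.SubElement(gateway, "provider")
        _text(provider, "role", policy_type)  # policy type becomes the role
        _text(provider, "name", entry["name"])
        _text(provider, "enabled", "true")    # inclusion implies enabled
        for name, value in configs[(policy_type, entry["config"])].items():
            param = ET.SubElement(provider, "param")
            _text(param, "name", name)
            _text(param, "value", value)
    for svc in topo.get("services", []):
        service = ET.SubElement(root, "service")
        _text(service, "role", svc["role"])
        _text(service, "url", svc["url"])
    return ET.tostring(root, encoding="unicode")
```

Keeping the merge free of file I/O means the existing .xml discovery path stays untouched: the machinery only has to write this string out as the .xml file it already knows how to deploy.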

Central Management

The ability to centrally manage these policy files for a cluster of Knox instances will require ZooKeeper or some other mechanism for synchronizing them across the instances.

Of course, we could also consider an NFS mount or some other shared-storage mechanism.

Change Uptake

Changes to policy or configuration files that are used by deployed topologies will require redeployment of those topologies.

Keeping an index of which topologies use which policy and configuration files would allow this redeployment to be automated when changes are made through the management APIs.

Manually changing policy or configuration files will require a manual restart of the topologies that use them.
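The usage index described above can be sketched in Python as a reverse mapping from each policy-store file to the topologies that depend on it. The function names `build_usage_index` and `topologies_to_redeploy` are hypothetical, and the keys assume the conf/policies layout shown earlier (paths relative to conf/).

```python
from collections import defaultdict


def build_usage_index(topology_policies: dict, policies: dict) -> dict:
    """Map each policy-store file to the set of topologies that use it.

    topology_policies: topology name -> policy name (e.g. "default-policy")
    policies:          policy name -> parsed policy JSON (type -> {"name", "config"})
    """
    index = defaultdict(set)
    for topology, policy_name in topology_policies.items():
        # The topology depends on its high-level policy file...
        index[f"policies/{policy_name}.json"].add(topology)
        # ...and on every low-level config that policy references.
        for policy_type, entry in policies[policy_name].items():
            index[f"policies/config/{policy_type}/{entry['config']}.xml"].add(topology)
    return dict(index)


def topologies_to_redeploy(index: dict, changed_file: str) -> list:
    """Return, sorted, the topologies affected by a change to changed_file."""
    return sorted(index.get(changed_file, set()))
```

A management API that edits, say, policies/config/authorization/default-authz.xml could call `topologies_to_redeploy` and redeploy exactly the topologies returned, instead of all of them.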

 


1 Comment

  1. In my mind the most important unit of reuse is what you are suggesting go in conf/policies/config/{role}/{name}.xml. I struggle to "name" these things because of the overloaded nature of both the policy and config terms. Mentally I think of them as Provider-specific policy. At any rate, the forced use of XML as the "provider policy" (my lingo), even in the current topology model, felt unnatural. In my mind it unnecessarily complicates the Provider's deployment contributor code, which has to translate generic XML into something provider specific. Wouldn't it be better to simply allow the Provider to copy the "provider policy" into the WAR structure during deployment rather than having to translate it? Case in point: if we ever need host mapping for 10,000 nodes, encoding that in <param><name/><value/></param> never seemed like a good idea to me.