Writing an Authority Connector

An authority connector to a repository allows a repository's security model to be enforced by a search engine. Its only function is to convert a user name (which is often a Kerberos principal name) into a set of access tokens.

Within LCF, the meaning of an access token for a given repository is defined entirely by the connectors that deal with that repository, with one exception: Active Directory. Active Directory is so prevalent as a repository authorization mechanism that LCF currently treats it as the "default" authority. That is, if you don't specify another authority when you define a repository connection, LCF presumes that Active Directory should be the controlling authority for the connection. In that case, an access token is simply an Active Directory SID.

For those repositories that do not use Active Directory as their authorization mechanism, an authority connector should be written, along with the repository connector for the repository. Access tokens in that case represent a contract between your implementation of the authority connector for the repository, and the repository connector for the repository. They must work together to define access tokens that will limit document access when used properly within any search engine query.
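
The contract can be pictured with a small, self-contained sketch. It is not LCF code: the class name, the "group:..." token format, and the matching rule (a document is visible when the user holds at least one of its allow tokens and none of its deny tokens) are assumptions made purely for illustration. The repository connector is assumed to attach tokens to each document when it is indexed, and the authority connector to return the user's tokens at query time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: none of these names come from LCF.
class AccessTokenContractSketch {

  // Tokens a (hypothetical) repository connector attached to one document.
  static final class IndexedDocument {
    final Set<String> allowTokens;
    final Set<String> denyTokens;

    IndexedDocument(Set<String> allowTokens, Set<String> denyTokens) {
      this.allowTokens = allowTokens;
      this.denyTokens = denyTokens;
    }
  }

  // Assumed matching rule: any matching deny token hides the document;
  // otherwise at least one matching allow token is required.
  static boolean isVisible(IndexedDocument doc, Set<String> userTokens) {
    for (String token : userTokens) {
      if (doc.denyTokens.contains(token)) {
        return false;
      }
    }
    for (String token : userTokens) {
      if (doc.allowTokens.contains(token)) {
        return true;
      }
    }
    return false;
  }

  static Set<String> setOf(String... values) {
    return new HashSet<>(Arrays.asList(values));
  }

  public static void main(String[] args) {
    IndexedDocument doc = new IndexedDocument(
        setOf("group:engineering"), setOf("group:contractors"));

    // Tokens a (hypothetical) authority connector returned for two users.
    Set<String> employee = setOf("group:engineering");
    Set<String> contractor = setOf("group:engineering", "group:contractors");

    System.out.println("employee sees document: " + isVisible(doc, employee));
    System.out.println("contractor sees document: " + isVisible(doc, contractor));
  }
}
```

Whatever token format you choose, the point is that both connectors must produce exactly the same strings for the same repository concept; the search engine only compares them.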

As is the case with all connectors under the LCF umbrella, an authority connector consists of two parts:

A class implementing an interface (in this case, org.apache.lcf.authorities.interfaces.IAuthorityConnector)
A set of JSPs that implement the crawler UI for the connector

Key concepts

The authority connector abstraction makes use of, or introduces, the following concepts:

Concept	What it is
Configuration parameters	A hierarchical structure, internally represented as an XML document, which describes a specific configuration of a specific authority connector, i.e. how the connector should do its job; see org.apache.lcf.core.interfaces.ConfigParams
Authority connection	An authority connector instance that has been furnished with configuration data
User name	The name of a user, which is often a Kerberos principal name, e.g. john@apache.org
Access token	An arbitrary string, which is only meaningful within the context of a specific authority connector, that describes a quantum of authorization
Connection management/threading/pooling model	How an individual authority connector class instance is managed and used
Service interruption	A specific kind of exception that signals LCF that the repository is unavailable, and gives a best estimate of when it might become available again; see org.apache.lcf.agents.interfaces.ServiceInterruption

Implementing the Authority Connector class

A very good place to start is the javadoc for the authority connector interface, which describes the usage and pooling model for a connector class in detail. You must understand that model thoroughly in order to write reliable connectors. Static variables in particular must be used very carefully, to avoid problems that a cursory test would not detect.
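
As a rough illustration of the static-variable concern, the sketch below assumes that several connector instances may be active on different threads at the same time. The class and its members are hypothetical and are not part of LCF. Per-connection state lives in instance fields, while the one piece of state shared through a static field uses a thread-safe type.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a hypothetical class, not part of LCF.
class StaticStateSketch {

  // Static state is shared by every instance in the JVM,
  // so it uses a thread-safe type.
  static final AtomicInteger TOTAL_LOOKUPS = new AtomicInteger();

  // Per-connection state belongs in instance fields, not in statics.
  private final String serverName;
  private int lookupsOnThisInstance = 0;

  StaticStateSketch(String serverName) {
    this.serverName = serverName;
  }

  // Stand-in for a token lookup; only the bookkeeping matters here.
  String lookup(String userName) {
    lookupsOnThisInstance++;
    TOTAL_LOOKUPS.incrementAndGet();
    return serverName + ":" + userName;
  }

  int lookupsOnThisInstance() {
    return lookupsOnThisInstance;
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      // Each thread works with its own instance, as a pool would hand out.
      final StaticStateSketch instance = new StaticStateSketch("server" + i);
      workers[i] = new Thread(() -> {
        for (int n = 0; n < 1000; n++) {
          instance.lookup("john@apache.org");
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
    System.out.println("total lookups: " + TOTAL_LOOKUPS.get());
  }
}
```

With a plain static int incremented by ++ instead of the AtomicInteger, concurrent updates could be lost, and a quick single-threaded test would never show it.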

The second thing to do is to examine some of the provided authority connector implementations. The Documentum connector, the LiveLink connector, the Memex connector, and the Meridio connector all include authority connectors which demonstrate (to some degree) the sorts of techniques you will need for an effective implementation. You will also note that all of these connectors extend a framework-provided authority connector base class, found at org.apache.lcf.authorities.authorities.BaseAuthorityConnector. This base class furnishes some basic bookkeeping logic for managing the connector pool, as well as default implementations of some of the less typical functionality a connector may have. For example, connectors are allowed to have database tables of their own, which are instantiated when the connector is registered, and are torn down when the connector is removed. This is, however, not very typical, and the base implementation reflects that.

Principal methods

The principal methods an implementer should be concerned with for creating an authority connector are the following:

Method	What it should do
getAuthorizationResponse()	Obtain the authorization response, given a user name

This method returns an AuthorizationResponse object, which can describe a number of conditions:

Condition	Meaning
RESPONSE_OK	The access tokens for the user were successfully obtained from the repository, and are being returned
RESPONSE_UNREACHABLE	The repository is currently unreachable, and appropriate disabling tokens are being returned
RESPONSE_USERNOTFOUND	The user was not found within the repository, and appropriate disabling tokens are being returned
RESPONSE_USERUNAUTHORIZED	The user was found, but was in some way disabled, and appropriate disabling tokens are being returned

In all cases, the connector returns access tokens. When token lookup has failed in some way, however, it is the responsibility of the connector to ensure that inappropriate content cannot be viewed. Usually, this is done by having the repository connector attach a "global deny" token to every document it ingests from the given repository, and then having the associated authority connector return this global deny token when error conditions apply.
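
The global deny approach might be sketched as follows. This is a simplified stand-in, not the LCF interface: the Status enum only mirrors the four conditions in the table above, and Response, UserRecord, UserDirectory, the GLOBAL_DENY value, and the method signature are all hypothetical. The sketch also assumes the matching repository connector attaches the same GLOBAL_DENY string as a deny token to every document.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: simplified stand-ins, not the LCF classes.
class GlobalDenySketch {

  // Mirrors the four conditions listed in the table above.
  enum Status { OK, UNREACHABLE, USERNOTFOUND, USERUNAUTHORIZED }

  static final class Response {
    final String[] tokens;
    final Status status;

    Response(String[] tokens, Status status) {
      this.tokens = tokens;
      this.status = status;
    }
  }

  // Hypothetical user record held by the repository.
  static final class UserRecord {
    final boolean enabled;
    final String[] groupTokens;

    UserRecord(boolean enabled, String[] groupTokens) {
      this.enabled = enabled;
      this.groupTokens = groupTokens;
    }
  }

  // Hypothetical repository lookup; returns null for an unknown user.
  interface UserDirectory {
    UserRecord find(String userName) throws IOException;
  }

  // Assumed to be attached as a deny token to every ingested document.
  static final String GLOBAL_DENY = "GLOBAL_DENY";

  static Response getAuthorizationResponse(UserDirectory directory, String userName) {
    String[] denyAll = { GLOBAL_DENY };
    try {
      UserRecord user = directory.find(userName);
      if (user == null) {
        return new Response(denyAll, Status.USERNOTFOUND);
      }
      if (!user.enabled) {
        return new Response(denyAll, Status.USERUNAUTHORIZED);
      }
      return new Response(user.groupTokens.clone(), Status.OK);
    } catch (IOException e) {
      // Repository could not be reached: fail closed.
      return new Response(denyAll, Status.UNREACHABLE);
    }
  }

  public static void main(String[] args) {
    Map<String, UserRecord> users = new HashMap<>();
    users.put("john@apache.org", new UserRecord(true, new String[] { "group:engineering" }));
    UserDirectory directory = users::get;

    Response known = getAuthorizationResponse(directory, "john@apache.org");
    Response unknown = getAuthorizationResponse(directory, "nobody@apache.org");
    System.out.println(known.status + " " + String.join(",", known.tokens));
    System.out.println(unknown.status + " " + String.join(",", unknown.tokens));
  }
}
```

Every failure path returns the deny token rather than an empty token list, so a lookup failure can never be mistaken for a user who simply has no restrictions.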

Implementing a set of Authority Connector JSPs

The authority connector class you write provides, through one of its methods, a symbolic name where the crawler UI will look for authority connector UI components. Your components will therefore have the following path, relative to the crawler UI web application:

authorities/<connector_symbolic_name>

For an authority connector, you need to furnish the following JSPs:

JSP name	Where it fits
headerconfig.jsp	Called during the header section of the authority connector configuration editing page
editconfig.jsp	Called during the body section of the authority connector configuration editing page
postconfig.jsp	Called when the configuration editing page is posted, either on a repost or on a save
viewconfig.jsp	Called when the connection configuration is being viewed

As you might be able to tell, these "config" elements are responsible for editing and viewing a ConfigParams object, which describes the configuration used for a specific authority connection.

The crawler UI uses a tabbed layout structure, and thus each of these elements must properly implement the tabbed model. This means that the "header" elements above must add the desired tab names to a specified array, and the "edit" elements must provide JSP code that handles both the case where a tab is displayed and the case where it is not. It also makes sense to use the appropriate CSS definitions, so that the connector JSPs have a look and feel similar to the rest of LCF's crawler UI. We strongly suggest starting with the UI code of one of the supplied authority connectors, both for a description of the arguments to each page, and for some decent ideas of ways to organize your connector's UI code.
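
The tabbed model can be sketched in plain Java, since JSP scriptlets are Java. Everything named here is hypothetical: the "Server" tab, the "servername" form field, and the method names are invented for illustration, and the real page arguments should be taken from a supplied connector. The sketch also assumes that a field on a tab that is not displayed is emitted as a hidden input, so that its value is still carried along when the form is posted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not the LCF page contract.
class TabbedConfigSketch {

  static final String SERVER_TAB = "Server";

  // "Header" step: contribute this connector's tab names.
  static void addTabs(List<String> tabNames) {
    tabNames.add(SERVER_TAB);
  }

  // "Edit" step: visible input when the tab is displayed,
  // hidden input otherwise (assumed here, so the value is not lost on post).
  static String renderServerField(String selectedTab, String serverName) {
    String type = SERVER_TAB.equals(selectedTab) ? "text" : "hidden";
    return "<input type=\"" + type + "\" name=\"servername\" value=\""
        + escapeHtml(serverName) + "\"/>";
  }

  // Minimal HTML attribute escaping for this sketch.
  static String escapeHtml(String text) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '"': out.append("&quot;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<String> tabs = new ArrayList<>();
    addTabs(tabs);
    System.out.println("tabs: " + tabs);
    System.out.println(renderServerField("Server", "repo.example.com"));
    System.out.println(renderServerField("Name", "repo.example.com"));
  }
}
```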

Implementation support provided by the framework

LCF's framework provides a number of helpful services designed to make the creation of a connector easier. These services are summarized below. (This is not an exhaustive list, by any means.)

Lock management and synchronization (see org.apache.lcf.core.interfaces.LockManagerFactory)
Cache management (see org.apache.lcf.core.interfaces.CacheManagerFactory)
Local keystore management (see org.apache.lcf.core.KeystoreManagerFactory)
Database management (see org.apache.lcf.core.DBInterfaceFactory)

For JSP UI component support, these too are very useful:

Multipart form processing (see org.apache.lcf.ui.multipart.MultipartWrapper)
HTML encoding (see org.apache.lcf.ui.util.Encoder)
HTML formatting (see org.apache.lcf.ui.util.Formatter)

DOs and DON'Ts

It's always a good idea to make use of an existing infrastructure component, if it's meant for that purpose, rather than inventing your own. There are, however, some limitations we recommend you adhere to.

DO make use of infrastructure components described in the section above
DON'T make use of infrastructure components that aren't mentioned, without checking first
NEVER write connector code that directly uses framework database tables, other than the ones installed and managed by your connector

If you are tempted to violate these rules, it may well mean you don't understand something important. At the very least, we'd like to know why. Send email to connectors-dev@incubator.apache.org with a description of your problem and how you are tempted to solve it.
