ATLAS-1095


Overview

The Open Connector Framework (OCF) governs access to data assets and metadata through standard mechanisms whilst preserving (where possible) the native asset access APIs. It provides interfaces and factories for named connectors (called OCF Connectors) that access distributed data resources. These data resources may be data stores (databases, files, etc.) or APIs to application data, transformations and analytical functions.

Accessing data through an API requires knowledge of the network address of the API for the data store, plus additional parameters such as userId and password. These details are normally hard-coded in the calling application code, which creates both maintenance and security issues. The OCF acts as a secure factory for connectors to data stores: the application supplies the name of the connection it needs and, assuming it is authorized, the OCF returns the connector.

Applications and tools benefit from using OCF connectors because:

  • Network and security parameters for accessing the data resources are managed in the open metadata repository as part of a named connection. The application need only supply the name of the connection and, provided it has the appropriate security credentials, a connector is returned to it for use.
    • There is no need to hard-code user ids and passwords in the application code, nor to manage keystores for this sensitive information, since the open metadata and governance server handles this.
    • If the location of the data changes, the named connection configuration is changed in the open metadata repository and the application is connected to the new location the next time it requests a connector.
  • The OCF connector provides two sets of APIs.  The first set provides access to the data resource and the second set provides access to the metadata that the open metadata repository has about the data resource.  This provides applications and tools with a simple mechanism to make use of metadata as they process a data resource.  This is particularly useful for data science tools where the metadata can help guide the end user in the use of the data resource.
  • OCF connectors are not limited to representing data resources as they are physically implemented. An OCF connector can represent a simplified logical (virtual) data resource that is designed for the needs of a specific application or tool. This type of connector delegates the requests it receives to one or more physical data resources. The virtual data connector is an example of this type of connector.
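The idea of a connector that exposes two sets of APIs, one for the data resource and one for the metadata about it, can be sketched in Java as follows. All names here (`DataConnector`, `AssetProperties`, `InMemoryTableConnector`) are hypothetical illustrations under the assumption of a simple tabular resource; they are not the OCF's actual interfaces.

```java
import java.util.List;
import java.util.Map;

// Hypothetical read-only view of what the metadata repository knows about an asset.
final class AssetProperties {
    private final String displayName;
    private final String owner;
    private final Map<String, String> columnDescriptions;

    AssetProperties(String displayName, String owner, Map<String, String> columnDescriptions) {
        this.displayName = displayName;
        this.owner = owner;
        this.columnDescriptions = Map.copyOf(columnDescriptions);
    }

    String getDisplayName() { return displayName; }
    String getOwner() { return owner; }

    // Falls back to a placeholder when the repository holds no description.
    String describeColumn(String column) {
        return columnDescriptions.getOrDefault(column, "(undocumented)");
    }
}

// Hypothetical connector contract with one data API and one metadata API.
interface DataConnector {
    List<List<String>> readRows();          // data API
    AssetProperties getAssetProperties();   // metadata API
}

// Toy connector over in-memory rows, standing in for a real file or database.
final class InMemoryTableConnector implements DataConnector {
    private final List<List<String>> rows;
    private final AssetProperties properties;

    InMemoryTableConnector(List<List<String>> rows, AssetProperties properties) {
        this.rows = List.copyOf(rows);
        this.properties = properties;
    }

    @Override public List<List<String>> readRows() { return rows; }
    @Override public AssetProperties getAssetProperties() { return properties; }
}
```

A data science tool could read the rows and, through the same connector, call `getAssetProperties().describeColumn(...)` to show the user what each column means.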

Organizations benefit from advocating the use of OCF connectors for their systems because OCF connectors provide a consistent approach to governance enforcement and audit logging. This is particularly important in data-rich environments where individuals are able to combine data from different resources, creating new and potentially sensitive insight. The common approach to auditing, together with the linkage between the data accessed and the metadata that describes its characteristics, makes it possible to govern how such combinations are made.

 


Design Rationale

The following factors influenced the design of the OCF.

  • There are many existing connectors and connector frameworks in the industry today. It is important that these connectors can be incorporated into the OCF. Thus the OCF includes interface definitions that can be used as adapters to external connector providers and connectors.
  • Application developers will only adopt a connector framework if it is easy to use. Thus the connector interfaces allow for the use of native data APIs to minimize the effort an application developer has to make in order to use the OCF connectors.
  • Governance enforcement is a complex topic, typically managed externally to the application development team. As a result, a separate framework called the Governance Action Framework (GAF) manages governance enforcement and capabilities such as audit logging. The role of the OCF is to bridge from the data resource access requests to the GAF.
  • Access to the metadata about a connector and its associated data resource should benefit from the breadth of metadata about the data resource in the open metadata repositories. Thus there is an Open Metadata Access Service (OMAS) called Connected Asset OMAS that integrates with OCF and provides metadata to all connectors.


Key Concepts

Connection

The connection is a metadata entity that defines the set of parameters needed to access a specific data resource.  Each connection has a unique name.  An application can request a connector instance from the OCF using the name of a connection.  (See model
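A minimal sketch of a named connection and a store that enforces unique names might look like this. The field set (`endpointAddress`, `userId`, `password`, `connectorProviderClass`) is an assumption drawn from the parameters mentioned in this page, and `Connection`/`ConnectionRegistry` are hypothetical classes, not the OCF model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of the parameters a named connection carries.
final class Connection {
    private final String name;                    // unique name that applications ask for
    private final String endpointAddress;         // network address of the data resource
    private final String userId;                  // held in the repository, not in application code
    private final String password;                // likewise never hard-coded by the application
    private final String connectorProviderClass;  // identifies the factory for this connector type

    Connection(String name, String endpointAddress, String userId,
               String password, String connectorProviderClass) {
        this.name = Objects.requireNonNull(name, "name");
        this.endpointAddress = endpointAddress;
        this.userId = userId;
        this.password = password;
        this.connectorProviderClass = connectorProviderClass;
    }

    String getName() { return name; }
    String getEndpointAddress() { return endpointAddress; }
    String getUserId() { return userId; }
    String getPassword() { return password; }
    String getConnectorProviderClass() { return connectorProviderClass; }
}

// Hypothetical store enforcing that each connection name is unique.
final class ConnectionRegistry {
    private final Map<String, Connection> byName = new HashMap<>();

    void register(Connection connection) {
        if (byName.putIfAbsent(connection.getName(), connection) != null) {
            throw new IllegalArgumentException("Duplicate connection name: " + connection.getName());
        }
    }

    Optional<Connection> lookup(String name) {
        return Optional.ofNullable(byName.get(name));
    }
}
```

The application only ever handles the connection name; the address and credentials stay with the registry.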

Connector Instance

A connector instance is a Java object that implements the Connector API. It provides access to a data resource, along with its related metadata stored in Apache Atlas. The connector instance is responsible for calling the Governance Action Framework when it is initialized, and before and after every access request to the data resource.
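The responsibility to call governance at initialization and around every access request can be sketched as a base class that wraps each data access. `GovernanceActions` is a hypothetical stand-in for the Governance Action Framework, and `GovernedConnector`, `GreetingConnector` and `RecordingGovernance` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the Governance Action Framework callbacks.
interface GovernanceActions {
    void onInitialize(String connectionName);
    void beforeAccess(String connectionName, String operation);  // may throw to deny access
    void afterAccess(String connectionName, String operation);
}

// Hypothetical base class: subclasses route every data access through governedAccess().
abstract class GovernedConnector {
    private final String connectionName;
    private final GovernanceActions governance;

    protected GovernedConnector(String connectionName, GovernanceActions governance) {
        this.connectionName = connectionName;
        this.governance = governance;
        governance.onInitialize(connectionName);  // once, when the connector is initialized
    }

    protected final <T> T governedAccess(String operation, Supplier<T> access) {
        governance.beforeAccess(connectionName, operation);
        try {
            return access.get();
        } finally {
            governance.afterAccess(connectionName, operation);  // runs even if the access fails
        }
    }
}

// Example connector whose single read operation is governed.
final class GreetingConnector extends GovernedConnector {
    GreetingConnector(String connectionName, GovernanceActions governance) {
        super(connectionName, governance);
    }

    String read() {
        return governedAccess("read", () -> "hello");
    }
}

// Records each callback so the call order can be inspected.
final class RecordingGovernance implements GovernanceActions {
    final List<String> events = new ArrayList<>();

    @Override public void onInitialize(String connectionName) { events.add("init:" + connectionName); }
    @Override public void beforeAccess(String connectionName, String operation) { events.add("before:" + operation); }
    @Override public void afterAccess(String connectionName, String operation) { events.add("after:" + operation); }
}
```

Making `governedAccess` final means a subclass cannot skip the before/after calls; if `beforeAccess` throws, the access never runs.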

Connector Broker

The connector broker is the top-level factory for the OCF. It returns a connector instance when passed the name of a connection (security permitting). The connector broker is a local Java object that can be used for many requests. It returns connector instances that can be used for multiple requests to the data resource.

Connector Directory

A connector directory provides a list of related connections. Connections can belong to multiple connector directories and are not deleted when a connector directory they are linked to is deleted. A tool may create a connector directory to manage the list of connections it is using. Administrators set up connector directories to group related connections together for different groups of users. The connector directories are managed in Apache Atlas through the Connector Directory OMAS.
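The lifecycle rule above (directories reference connections, and deleting a directory leaves its connections intact) can be sketched with a hypothetical in-memory `DirectoryStore`. This is an illustration of the relationship only, not the Connector Directory OMAS API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: connections exist independently; directories only hold references.
final class DirectoryStore {
    private final Set<String> connections = new HashSet<>();
    private final Map<String, Set<String>> directories = new HashMap<>();

    void addConnection(String connectionName) {
        connections.add(connectionName);
    }

    // A connection may be linked into any number of directories.
    void link(String directoryName, String connectionName) {
        if (!connections.contains(connectionName)) {
            throw new IllegalArgumentException("Unknown connection: " + connectionName);
        }
        directories.computeIfAbsent(directoryName, d -> new TreeSet<>()).add(connectionName);
    }

    Set<String> list(String directoryName) {
        return Collections.unmodifiableSet(directories.getOrDefault(directoryName, Collections.emptySet()));
    }

    // Deleting a directory removes only the grouping, never the connections themselves.
    void deleteDirectory(String directoryName) {
        directories.remove(directoryName);
    }

    boolean connectionExists(String connectionName) {
        return connections.contains(connectionName);
    }
}
```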

Connector Provider

A connector provider is the factory for a particular type of connector.  The connection stored in Apache Atlas will identify the connector provider. 

When the connector broker receives a request for a connector instance, it looks up the connection in Apache Atlas using the supplied name and extracts the connector provider information along with the parameters it needs to create the connector from the connection.  These parameters are passed to the connector provider, which returns the connector instance.  The connector broker then returns the connector instance to the requesting application.
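The broker flow described above (look up the named connection, find its connector provider, hand the provider the connection's parameters) can be sketched as follows. The repository and provider lookups are plain maps here as an assumption; a broker could instead load the provider by class name. Security checks are omitted, and every type name is hypothetical.

```java
import java.util.Map;

// Hypothetical types standing in for the OCF broker, provider and connector roles.
interface Connector {
    String describe();
}

interface ConnectorProvider {
    Connector create(Map<String, String> parameters);
}

// What the metadata repository holds for one named connection.
final class ConnectionRecord {
    final String providerName;
    final Map<String, String> parameters;

    ConnectionRecord(String providerName, Map<String, String> parameters) {
        this.providerName = providerName;
        this.parameters = Map.copyOf(parameters);
    }
}

final class ConnectorBroker {
    private final Map<String, ConnectionRecord> metadataRepository;  // stands in for Apache Atlas
    private final Map<String, ConnectorProvider> providers;

    ConnectorBroker(Map<String, ConnectionRecord> metadataRepository,
                    Map<String, ConnectorProvider> providers) {
        this.metadataRepository = Map.copyOf(metadataRepository);
        this.providers = Map.copyOf(providers);
    }

    Connector getConnector(String connectionName) {
        // 1. Look up the named connection in the metadata repository.
        ConnectionRecord connection = metadataRepository.get(connectionName);
        if (connection == null) {
            throw new IllegalArgumentException("No connection named " + connectionName);
        }
        // 2. Identify the connector provider recorded in the connection.
        ConnectorProvider provider = providers.get(connection.providerName);
        if (provider == null) {
            throw new IllegalStateException("No connector provider " + connection.providerName);
        }
        // 3. The provider builds the connector, which the broker returns to the caller.
        return provider.create(connection.parameters);
    }
}
```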

The connector provider is responsible for the overall management of the physical data resources that it is using. It may therefore implement capabilities such as connector pooling and limit the number of active connectors to the data resource if appropriate.
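One way a provider might limit the number of active connectors is with a counting semaphore, as in this hypothetical sketch. It only caps concurrency; a real pool would also reuse connectors. Refusing immediately with `tryAcquire` rather than waiting is a design choice assumed here.

```java
import java.util.concurrent.Semaphore;

// Hypothetical provider that caps the number of active connectors to one data resource.
final class BoundedConnectorProvider {
    private final Semaphore permits;

    BoundedConnectorProvider(int maxActiveConnectors) {
        this.permits = new Semaphore(maxActiveConnectors);
    }

    // Returns a connector, or refuses at once when the limit is reached.
    PooledConnector create() {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("Too many active connectors");
        }
        return new PooledConnector(permits);
    }

    int available() {
        return permits.availablePermits();
    }
}

// Connector that gives its permit back to the provider when closed.
final class PooledConnector implements AutoCloseable {
    private final Semaphore permits;
    private boolean closed;

    PooledConnector(Semaphore permits) {
        this.permits = permits;
    }

    @Override
    public synchronized void close() {
        if (!closed) {  // guard against releasing the same permit twice
            closed = true;
            permits.release();
        }
    }
}
```

Because `PooledConnector` is `AutoCloseable`, callers can use try-with-resources so the permit is returned even when processing fails.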

The OCF uses this double layer of factories (connector broker and connector provider) for connectors to allow existing connector frameworks to be plugged into OCF.  This is done using a connector provider adapter.

Connector Provider Adapter

The connector provider adapter implements the OCF interface for a connector provider and delegates the calls it receives to the factory for an existing connector framework.

The connector returned by the existing connector framework may be inserted into a connector instance adapter, which is then returned to the connector broker.
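A connector provider adapter can be sketched as a class that implements the OCF-side provider interface, delegates creation to the existing framework's factory, and wraps the result before returning it. `LegacyClient`, `LegacyClientFactory` and the `url` parameter are hypothetical placeholders for some pre-existing framework; the OCF-side interfaces are simplified assumptions too.

```java
import java.util.Map;

// --- A pre-existing, non-OCF connector framework (hypothetical) ---
interface LegacyClient {
    String fetch(String key);
}

final class LegacyClientFactory {
    LegacyClient open(String url) {
        return key -> "value of " + key + " from " + url;
    }
}

// --- Hypothetical, simplified OCF-side contracts ---
interface OcfConnector {
    String read(String key);
}

interface OcfConnectorProvider {
    OcfConnector getConnector(Map<String, String> connectionParameters);
}

// Wraps the legacy client so it looks like an OCF connector.
final class LegacyConnectorInstanceAdapter implements OcfConnector {
    private final LegacyClient client;

    LegacyConnectorInstanceAdapter(LegacyClient client) {
        this.client = client;
    }

    @Override
    public String read(String key) {
        return client.fetch(key);
    }
}

// Implements the OCF provider interface by delegating to the legacy factory.
final class LegacyConnectorProviderAdapter implements OcfConnectorProvider {
    private final LegacyClientFactory factory = new LegacyClientFactory();

    @Override
    public OcfConnector getConnector(Map<String, String> connectionParameters) {
        String url = connectionParameters.get("url");
        if (url == null) {
            throw new IllegalArgumentException("connection is missing 'url'");
        }
        LegacyClient client = factory.open(url);            // delegate creation
        return new LegacyConnectorInstanceAdapter(client);  // wrap before returning to the broker
    }
}
```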

Connector Instance Adapter

The connector instance adapter provides a wrapper for a connector instance from another connector framework.  It provides an OCF compliant API for the connector instance and manages the connector instance’s metadata API along with calls to the governance action framework.
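The extra duties of a connector instance adapter (adding a metadata API and governance calls around a foreign connector) can be sketched like this. The audit list is a hypothetical stand-in for calls to the governance action framework, and the metadata map is assumed to have been retrieved from the metadata repository already.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// A connector from some other framework (hypothetical): data access only.
interface ForeignConnector {
    List<String> query(String statement);
}

// Hypothetical OCF-compliant contract: data API, metadata API and audit log.
interface OcfCompliantConnector {
    List<String> query(String statement);
    Map<String, String> getAssetMetadata();
    List<String> getAuditLog();
}

// Wraps the foreign connector, adding the metadata API and governance/audit calls.
final class ConnectorInstanceAdapter implements OcfCompliantConnector {
    private final ForeignConnector delegate;
    private final Map<String, String> assetMetadata;  // assumed already fetched from the repository
    private final List<String> auditLog = new ArrayList<>();

    ConnectorInstanceAdapter(ForeignConnector delegate, Map<String, String> assetMetadata) {
        this.delegate = delegate;
        this.assetMetadata = Map.copyOf(assetMetadata);
    }

    @Override
    public List<String> query(String statement) {
        auditLog.add("before: " + statement);                 // governance hook before access
        List<String> result = delegate.query(statement);
        auditLog.add("after: " + result.size() + " row(s)");  // governance hook after access
        return result;
    }

    @Override
    public Map<String, String> getAssetMetadata() {
        return assetMetadata;
    }

    @Override
    public List<String> getAuditLog() {
        return Collections.unmodifiableList(auditLog);
    }
}
```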

 

 


Inside the open connector framework

Figure 1 below shows the interaction of the core objects of the open connector framework.  The numbers on the diagram for Figure 1 show the order of execution.  The notes below describe this interaction.

 


 

  1. An application requests a connector to the data store by calling the Connector Broker and passing the name of the connection.
  2. The connector broker looks up the connection details in the Open Metadata Repository.
  3. The connection details identify the Connector Provider and the parameters it needs to create a Connector.
  4. A connector is a Java object. It is returned to the application by the connector broker.
  5. The application is able to access data, metadata and an audit log through the connector.
  6. The connector extracts data from the data store.
  7. The connector extracts metadata from the open metadata repository. This is managed by the Connected Asset OMAS, which is plugged into the OCF if the OCF is accessed through another OMAS such as the Asset Consumer OMAS. The Connected Asset OMAS knows which asset metadata to return because the asset is linked to the connection details in the metadata repository (see model 0205 in the Area 2 models).

Figure 1: Open Connector Framework - Overview of Operation
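Step 7 relies on the link between a connection and its asset. A minimal sketch of that lookup, assuming simple maps in place of the repository, is shown below; `ConnectedAssetService` and `MetadataAwareConnector` are hypothetical names and do not reproduce the Connected Asset OMAS API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the Connected Asset OMAS role in step 7.
final class ConnectedAssetService {
    private final Map<String, String> assetIdByConnection;              // connection -> linked asset
    private final Map<String, Map<String, String>> propertiesByAsset;   // asset -> its metadata

    ConnectedAssetService(Map<String, String> assetIdByConnection,
                          Map<String, Map<String, String>> propertiesByAsset) {
        this.assetIdByConnection = Map.copyOf(assetIdByConnection);
        this.propertiesByAsset = Map.copyOf(propertiesByAsset);
    }

    // Follows the connection-to-asset link to decide which asset's metadata to return.
    Optional<Map<String, String>> getAssetProperties(String connectionName) {
        return Optional.ofNullable(assetIdByConnection.get(connectionName))
                       .map(propertiesByAsset::get);
    }
}

// Connector that answers metadata requests by asking the service about its own connection.
final class MetadataAwareConnector {
    private final String connectionName;
    private final ConnectedAssetService connectedAsset;

    MetadataAwareConnector(String connectionName, ConnectedAssetService connectedAsset) {
        this.connectionName = connectionName;
        this.connectedAsset = connectedAsset;
    }

    Map<String, String> getConnectedAssetProperties() {
        return connectedAsset.getAssetProperties(connectionName).orElse(Map.of());
    }
}
```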

 

The OCF provides:

  • A Java implementation of the connector broker
  • Java APIs for a connector provider and connector instances
  • Java base classes for a connector provider and connector instance
  • Java POJO implementations of the properties about a connected asset

Once the OCF is in place, Apache Atlas will provide support for JDBC connectors (see Virtual Data Connector (VDC)). Other vendors or open source projects may supply connector providers that are able to create connectors for different types of data assets.

 


Scope and value of the OCF to Apache Atlas (and Open Metadata)

The OCF offers a simple but powerful mechanism to intercept requests to access data resources and inject metadata and governance into these requests.  It is designed to embrace existing connector frameworks and support new connector implementations that connect to new types of data resources.  This includes connectors to composite or virtual data resources that delegate to existing physical data resources.

The Virtual Data Connector (ATLAS-1689) currently in development is an example of an OCF connector that uses glossary metadata to create business friendly views over relational database tables.   We also have plans for a data set connector that enables collections of files to be treated as a single data resource.
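The delegation pattern behind a business-friendly view can be sketched as a virtual connector that renames physical columns to glossary terms and reads through to a physical connector. The term-to-column mapping and all type names are hypothetical; this is not the Virtual Data Connector's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical physical connector returning rows keyed by physical column name.
interface PhysicalTableConnector {
    List<Map<String, String>> readAll();
}

// Hypothetical virtual connector: exposes business-term names, delegates reads to the table.
final class BusinessViewConnector {
    private final PhysicalTableConnector physical;
    private final Map<String, String> termToColumn;  // glossary term -> physical column

    BusinessViewConnector(PhysicalTableConnector physical, Map<String, String> termToColumn) {
        this.physical = physical;
        this.termToColumn = new LinkedHashMap<>(termToColumn);
    }

    List<Map<String, String>> readAll() {
        List<Map<String, String>> view = new ArrayList<>();
        for (Map<String, String> row : physical.readAll()) {
            Map<String, String> businessRow = new LinkedHashMap<>();
            for (Map.Entry<String, String> mapping : termToColumn.entrySet()) {
                // Rename mapped columns; columns without a glossary term are not exposed.
                businessRow.put(mapping.getKey(), row.get(mapping.getValue()));
            }
            view.add(businessRow);
        }
        return view;
    }
}
```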

In addition to simplifying, enriching and governing access to data, the OCF can be used inside Apache Atlas to manage calls between metadata repository instances and related data stores.

The Open Metadata Repository Service (OMRS) Connector API is a standard interface for a connector to a metadata repository. We plan four implementations of this API:

  • Local Atlas OMRS Connector – this is the connector to a local Apache Atlas metadata repository.
  • OMRS REST Connector – this is a connector to a remote Apache Atlas repository (or any other metadata repository that supports the OMRS REST APIs).
  • IGC OMRS Connector – this is the connector for IBM’s Information Governance Catalog
  • Enterprise OMRS Connector – this connector can federate multiple metadata repositories by aggregating the results of calls to their OMRS connectors.
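The federation idea behind the Enterprise OMRS Connector can be sketched as a connector that implements the same interface as its members, fans each call out to them and merges the results. The single search method is a drastic, hypothetical simplification of the OMRS Connector API, and the de-duplication assumes an entity keeps the same GUID in every repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified repository connector: find entity GUIDs by name.
interface MetadataRepositoryConnector {
    String repositoryName();
    List<String> findEntityGuids(String searchText);
}

// Toy member repository: matches entities whose display name contains the search text.
final class InMemoryRepositoryConnector implements MetadataRepositoryConnector {
    private final String name;
    private final Map<String, String> displayNameByGuid;

    InMemoryRepositoryConnector(String name, Map<String, String> displayNameByGuid) {
        this.name = name;
        this.displayNameByGuid = Map.copyOf(displayNameByGuid);
    }

    @Override public String repositoryName() { return name; }

    @Override
    public List<String> findEntityGuids(String searchText) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : displayNameByGuid.entrySet()) {
            if (entry.getValue().contains(searchText)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}

// Federating connector: same interface, aggregates the results of every member.
final class EnterpriseConnector implements MetadataRepositoryConnector {
    private final List<MetadataRepositoryConnector> members;

    EnterpriseConnector(List<MetadataRepositoryConnector> members) {
        this.members = List.copyOf(members);
    }

    @Override public String repositoryName() { return "enterprise"; }

    @Override
    public List<String> findEntityGuids(String searchText) {
        // LinkedHashSet drops GUIDs already returned by an earlier repository.
        Set<String> merged = new LinkedHashSet<>();
        for (MetadataRepositoryConnector member : members) {
            merged.addAll(member.findEntityGuids(searchText));
        }
        return new ArrayList<>(merged);
    }
}
```

Because the enterprise connector implements the same interface as its members, a caller does not need to know whether it is talking to one repository or many.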

In the longer term, we will extend this approach to all system resources such as:

  • A TinkerPop connector for the graph database
  • A log connector for the exception, operational lineage, meters and audit logs
  • A keystore connector for the keystore

The value of this approach is that it becomes easy to support different types of data stores for Atlas, and many of the connectors developed for the Open Metadata and Governance reference implementation will be useful for other applications.

 


Summary of JIRAs for Connector Implementations

The design and implementation of these connectors are handled in different JIRAs as follows:

  • ATLAS-1689 – Provides the Virtual Data Connector function along with a Connector Provider Adapter and Connector Instance Adapter for JDBC.
  • ATLAS-1772 – Data Set Connector – provides a provider and connector implementation for collections of files.  These files may be structured or unstructured data.  The data set may be a set of files physically co-located (such as in the same folder on disk) or files that have similar characteristics in a larger collection.
  • ATLAS-1773 – OMRS REST Connector – provides the definition of the OMRS Connector API and an implementation of this API for a local Apache Atlas metadata repository.
  • ATLAS-1774 – IGC OMRS Connector – provides a connector for IBM’s Information Governance Catalog that implements the OMRS Connector API defined in JIRA ATLAS-1773.
  • ATLAS-1775 – Enterprise OMRS Connector – provides a connector that implements the OMRS Connector API defined in JIRA ATLAS-1773 that is able to aggregate the metadata from multiple metadata repositories in response to metadata requests.