This document serves as a guide to the public-facing Sqoop Repository API as of the 1.99.5 release.
The API may evolve in future releases, so the description here reflects its state in 1.99.5.
Background
Sqoop2 supports a persistent store for Sqoop entities such as the Configurables (Connector and Driver), the Configs exposed by the Connectors, Jobs, and Job runs. This persistent store is commonly referred to as the repository. We also expose REST APIs and shell commands to perform CRUD operations on these entities: connectors and drivers, the connector configs that hold link and job information, and Sqoop jobs with their configs. The persistent store is therefore handy for keeping a history of the Sqoop entity objects created and updated over time. To make the persistent store easy to access, we also expose a simple Java-based repository API that different data stores can implement to store the Sqoop entity objects.
At this point, we support relational data stores, since the entities are related to each other and a relational data store makes these relations easier to express. In the future, a non-relational store could be added as another implementation of the repository API. The repository structure (the schema and its fields) has changed across Sqoop releases, and the APIs have changed with it to fit the new structures.
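The idea of one repository API with interchangeable data-store implementations can be sketched as follows. This is only an illustration under stated assumptions: `SketchRepository`, `SketchLink`, `InMemorySketchRepository`, and every method name are hypothetical and are not the Sqoop 1.99.5 classes or signatures. An in-memory map stands in for the relational store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical demo entry point; none of these types exist in Sqoop.
class RepositorySketchDemo {
    public static void main(String[] args) {
        SketchRepository repo = new InMemorySketchRepository();
        long id = repo.createLink(new SketchLink("example-link", "example-connector"));
        repo.updateLink(id, new SketchLink("example-link-renamed", "example-connector"));
        System.out.println(repo.findLink(id).map(l -> l.name).orElse("missing"));
        repo.deleteLink(id);
        System.out.println(repo.findAllLinks().size());
    }
}

// Hypothetical stand-in for a persisted entity (here, a link to a data source).
final class SketchLink {
    final String name;
    final String connectorName;

    SketchLink(String name, String connectorName) {
        this.name = name;
        this.connectorName = connectorName;
    }
}

// Hypothetical repository contract: the CRUD operations a data store would implement.
interface SketchRepository {
    long createLink(SketchLink link);
    Optional<SketchLink> findLink(long id);
    List<SketchLink> findAllLinks();
    void updateLink(long id, SketchLink link);
    void deleteLink(long id);
}

// One possible implementation; a relational or non-relational store
// could implement the same interface without changing the callers.
final class InMemorySketchRepository implements SketchRepository {
    private final Map<Long, SketchLink> links = new LinkedHashMap<>();
    private long nextId = 1;

    @Override
    public long createLink(SketchLink link) {
        long id = nextId++;
        links.put(id, link);
        return id;
    }

    @Override
    public Optional<SketchLink> findLink(long id) {
        return Optional.ofNullable(links.get(id));
    }

    @Override
    public List<SketchLink> findAllLinks() {
        return new ArrayList<>(links.values());
    }

    @Override
    public void updateLink(long id, SketchLink link) {
        if (!links.containsKey(id)) {
            throw new IllegalArgumentException("No link with id " + id);
        }
        links.put(id, link);
    }

    @Override
    public void deleteLink(long id) {
        links.remove(id);
    }
}
```

Callers depend only on the interface, so swapping the backing store would not affect them. That separation is the property the paragraph above attributes to the repository API.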
The rest of this document focuses on the main public-facing entities and repository APIs.
Sqoop Entities
Represents the Sqoop connector's link information. A Link encapsulates the details required to connect to the data source the connector represents. It has one associated config, MLinkConfig.
Entity | Model | Relationship | Description |
---|---|---|---|
CONFIGURABLE | Configurable.java (abstract class) | Top Level Entity | Represents a core entity that exposes config objects and is used in the Sqoop job lifecycle. A Configurable has an associated version that acts as an identifier for connector config upgrades. |
CONNECTOR | MConnector.java | Is a type of Configurable | There can be many connectors registered to the Sqoop server. |
DRIVER | MDriver.java | Is a type of Configurable | There is only one Driver object representing Sqoop in the system. |
CONFIG | MConfig.java and @Config annotation | Top Level Entity | MConfigType lists the config types supported as of 1.99.5. |
INPUT | Concrete classes for each supported type | | Represents the key-value pairs for a given config. |
LINK | MLinkConfig.java | | Represents the config inputs required to physically connect to the data source a connector represents. Hence it is associated with a connector. It has one main config object, represented by MLinkConfig. |
JOB | | | Represents the Sqoop job. It encapsulates all the configs required to run the job. A Sqoop job has three main components: FROM, TO, and DRIVER. FROM and its related MFromConfig represent the config input values required to extract data from the source. TO and its related MToConfig represent the config input values required to load data to the destination. DRIVER and its related MDriverConfig represent the config input values required by the execution engine to run the Sqoop job optimally. |
SUBMISSION | MSubmission.java | | Represents the job run details. Includes the job status, job counters, and metrics from the job execution engine. |
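
The relationships in the table can be sketched in code. In this sketch a versioned Configurable has Connector and Driver subtypes, and a Job combines FROM, TO, and DRIVER config inputs. It is a minimal illustration only: every class name (`SketchConfigurable`, `SketchConnector`, `SketchDriver`, `SketchJob`), the `describe` method, and the input keys are hypothetical and do not reproduce the actual model classes such as MConnector or MDriver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo entry point; none of these types exist in Sqoop.
class EntitySketchDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchJob(
                new SketchConnector("source-connector", "1"),
                new SketchConnector("target-connector", "1"),
                new SketchDriver("1"));
        // Illustrative input keys only; real connectors define their own inputs.
        job.fromConfig.put("table", "orders");
        job.toConfig.put("outputDirectory", "/tmp/orders");
        job.driverConfig.put("extractors", "4");
        System.out.println(job.describe());
    }
}

// A configurable carries a version used to identify config upgrades.
abstract class SketchConfigurable {
    final String version;

    SketchConfigurable(String version) {
        this.version = version;
    }
}

// Many connectors may be registered, so each one has a name.
final class SketchConnector extends SketchConfigurable {
    final String name;

    SketchConnector(String name, String version) {
        super(version);
        this.name = name;
    }
}

// A single driver represents the system, so it needs no name here.
final class SketchDriver extends SketchConfigurable {
    SketchDriver(String version) {
        super(version);
    }
}

// A job ties together the FROM side, the TO side, and the driver,
// each with its own key-value config inputs.
final class SketchJob {
    final SketchConnector from;
    final SketchConnector to;
    final SketchDriver driver;
    final Map<String, String> fromConfig = new LinkedHashMap<>();
    final Map<String, String> toConfig = new LinkedHashMap<>();
    final Map<String, String> driverConfig = new LinkedHashMap<>();

    SketchJob(SketchConnector from, SketchConnector to, SketchDriver driver) {
        this.from = from;
        this.to = to;
        this.driver = driver;
    }

    // Summarizes the job as "<from> -> <to> (<n> inputs)".
    String describe() {
        int inputs = fromConfig.size() + toConfig.size() + driverConfig.size();
        return from.name + " -> " + to.name + " (" + inputs + " inputs)";
    }
}
```

Keeping the three config maps separate mirrors the table's split between MFromConfig, MToConfig, and MDriverConfig, although the real classes hold typed inputs rather than plain string maps.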