Title: Sqoop Config Input as a Top-Level Entity
This proposal enhances the existing command-line and REST API functionality to support RU (Read and Update) operations on config input objects as independent entities.
Configs are exposed in code via the two supported CONFIGURABLES, Connectors and Drivers. They annotate their configs with the "@Config" annotation, which is how Sqoop registers these entities in the repository during server start-up. If a connector already exists in the Sqoop repository (during server start-up or when the UpgradeTool is invoked), the connector's upgrade API is invoked to update the attributes of the config object.
The SQ_CONFIG table stores the top-level config entries for each configurable.
Currently we support two types of configs: LINK and JOB. The MConfigType enum encapsulates this information, and its value is stored in SQ_CFG_TYPE when a config is registered.
Each class annotated with @Config exposes a list of inputs via the "@Input" annotation and its attributes. The @Input-annotated fields are stored in a separate table, SQ_INPUT, which holds only the input keys and their attribute values. The actual input values depend on the individual JOB or LINK (refer to this wiki to understand Sqoop entities), so they are stored in two additional tables, SQ_JOB_INPUT and SQ_LINK_INPUT.
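To make the start-up behaviour concrete, here is a minimal sketch of the register-or-upgrade decision, assuming an in-memory map in place of the real repository tables. ConfigRegistrar, registerOrUpgrade and the shape of the upgrade hook are hypothetical names, not Sqoop's API; the hook simply stands in for the connector's upgrade API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical in-memory stand-in for the repository:
// configurable name -> (input name -> attribute string).
class ConfigRegistrar {
    private final Map<String, Map<String, String>> repository = new HashMap<>();

    // Registers the declared inputs of a configurable seen for the first time.
    // If the configurable is already stored, the upgrade hook receives a copy of the
    // stored inputs plus the newly declared ones, and its result is persisted.
    // Returns true only for a fresh registration.
    boolean registerOrUpgrade(String configurable,
                              Map<String, String> declaredInputs,
                              BiFunction<Map<String, String>, Map<String, String>, Map<String, String>> upgradeHook) {
        Map<String, String> stored = repository.get(configurable);
        if (stored == null) {
            repository.put(configurable, new HashMap<>(declaredInputs));
            return true;
        }
        Map<String, String> upgraded = upgradeHook.apply(new HashMap<>(stored), declaredInputs);
        repository.put(configurable, new HashMap<>(upgraded));
        return false;
    }

    // Returns the stored inputs, or an empty map for an unknown configurable.
    Map<String, String> inputsOf(String configurable) {
        return repository.getOrDefault(configurable, Map.of());
    }
}
```

A hook such as `(old, declared) -> declared` would simply adopt the newly declared attributes; a real upgrade would typically map old input names onto new ones.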
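The relationship between these tables can be sketched with plain Java records. The record and column shapes below are assumptions for illustration, not the actual repository schema. The lookup shows why a job's value for an input has to be resolved through SQ_CONFIG and SQ_INPUT before SQ_JOB_INPUT can be consulted; SQ_LINK_INPUT would be handled the same way with a link id in place of the job id.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical row shapes loosely mirroring the tables named above; columns are illustrative only.
record ConfigRow(long id, String configurable, String name, String type) {}  // SQ_CONFIG (type ~ SQ_CFG_TYPE)
record InputRow(long id, long configId, String name, String attributes) {}   // SQ_INPUT: keys + attributes, no values
record JobInputRow(long jobId, long inputId, String value) {}                // SQ_JOB_INPUT: per-job values

class InputValueLookup {
    // Resolves the value that a given job holds for configName/inputName, if any:
    // config row -> input row (key) -> job input row (value).
    static Optional<String> jobValue(long jobId, String configName, String inputName,
                                     List<ConfigRow> configs, List<InputRow> inputs,
                                     List<JobInputRow> jobInputs) {
        return configs.stream()
                .filter(c -> c.name().equals(configName) && c.type().equals("JOB"))
                .flatMap(c -> inputs.stream()
                        .filter(i -> i.configId() == c.id() && i.name().equals(inputName)))
                .flatMap(i -> jobInputs.stream()
                        .filter(v -> v.jobId() == jobId && v.inputId() == i.id()))
                .map(JobInputRow::value)
                .findFirst();
    }
}
```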
The following table lists, for each configurable, the type of the configs it exposes and their number in (*). The configs of each type are represented as a list; hence a connector can expose a FROM-CONFIG with more than one config object in it.
Configuration: a class in Sqoop annotated with @ConfigurationClass. It groups the related configs that the configurable owning this configuration class exposes.
Here is the code example.
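The sketch below is self-contained: it declares local stand-ins for the @ConfigurationClass, @ConfigClass, @Config and @Input annotations so it compiles without Sqoop on the classpath. LinkConfiguration, LinkConfig, their fields, the size attribute and the ConfigScanner helper are hypothetical. It shows one configuration class holding one config with two inputs, plus a reflective pass that lists the inputs roughly the way a registration step might discover them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for Sqoop's annotations (assumed shapes, runtime-visible for reflection).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)  @interface ConfigurationClass {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)  @interface ConfigClass {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Config {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Input { int size() default -1; }

// A config: groups related inputs.
@ConfigClass
class LinkConfig {
    @Input(size = 128) public String connectionString;
    @Input(size = 40)  public String username;
}

// A configuration: groups the configs that a configurable exposes.
@ConfigurationClass
class LinkConfiguration {
    @Config public LinkConfig linkConfig = new LinkConfig();
}

class ConfigScanner {
    // Lists "configName.inputName" keys for every @Input inside every @Config field
    // whose type is a @ConfigClass. Field order from reflection is not guaranteed.
    static List<String> inputKeys(Class<?> configurationClass) {
        List<String> keys = new ArrayList<>();
        for (Field configField : configurationClass.getDeclaredFields()) {
            if (!configField.isAnnotationPresent(Config.class)
                    || !configField.getType().isAnnotationPresent(ConfigClass.class)) {
                continue;
            }
            for (Field inputField : configField.getType().getDeclaredFields()) {
                if (inputField.isAnnotationPresent(Input.class)) {
                    keys.add(configField.getName() + "." + inputField.getName());
                }
            }
        }
        return keys;
    }
}
```

Calling `ConfigScanner.inputKeys(LinkConfiguration.class)` yields the two keys `linkConfig.connectionString` and `linkConfig.username`, in no guaranteed order.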
Config or ConfigInputs: these terms mean the same thing in this wiki and are used interchangeably. They refer to a class in Sqoop annotated with @ConfigClass that is used, via the @Config annotation, inside a class annotated with @ConfigurationClass.
Inputs: represented by the @Input annotation on a field inside a class annotated with @ConfigClass.
- Configs should be treated as top-level entities, since they become more sophisticated with SQOOP-1804 and, in the future, with the implementation of SQOOP-1643. Currently, configs and config inputs can only be read and edited via a job or link. This proposal aims to make config objects first-class citizens, so they can be read and edited by their unique name.
- Config objects per MConfigType (i.e. JOB and LINK) are lists, so editing by "CONFIG" name is easier than editing the whole "CONFIG LIST" associated with the type. Users already see the names when they list the configs per connector; we can also add a shell command / REST API to list all the configs per configurable, making it easier for users to reference configs by name.
- Read the config inputs by Type/SubType and by job/submission (since SQOOP-2025, we may be able to have configs by submissionId).
- Update the config inputs by Type/SubType for the latest submission of the job. Editing previous submissions should not be allowed; they should be read-only.
- Only inputs with the attribute "USER-ONLY" or "ANY" (as per SQOOP-1804) will be editable. Validate the editability condition and respect cascading changes based on the overrides attribute of each input.
- Once the input values are edited, the new values will be used in the next job run, unless we maintain history as per SQOOP-2025.
NOTE: Support all of the above in both the shell and the REST API.
- Supporting CD (create/delete) of config inputs via REST or the command line. Today this is only possible via the configurable's code and the supported annotations on its classes, and it should remain so.
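The editability rule from the goals above can be expressed as a small validation step run before any update is persisted. This is a sketch: the Editable enum is a local stand-in modelled on the USER-ONLY and ANY values referenced from SQOOP-1804 (CONNECTOR_ONLY is assumed here as the non-user value), EditGuard is a hypothetical helper, and cascading changes through the overrides attribute are not modelled.

```java
import java.util.Map;

// Stand-in for the editability values referenced from SQOOP-1804 (assumed names).
enum Editable { USER_ONLY, CONNECTOR_ONLY, ANY }

class EditGuard {
    // True if a user-initiated edit of the named input is allowed.
    // Inputs missing from the map are treated as not editable.
    static boolean userMayEdit(String inputName, Map<String, Editable> editability) {
        Editable e = editability.get(inputName);
        return e == Editable.USER_ONLY || e == Editable.ANY;
    }

    // Rejects the whole batch on the first input the user may not edit.
    static void validate(Map<String, String> edits, Map<String, Editable> editability) {
        for (String name : edits.keySet()) {
            if (!userMayEdit(name, editability)) {
                throw new IllegalArgumentException("Input is not user-editable: " + name);
            }
        }
    }
}
```

Rejecting the whole batch keeps an update atomic from the user's point of view: either every requested input is editable or nothing is written.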
- Editing submission history
Design and Implementation Details
Here are some details I considered when coming up with the subType and why it made sense.
At one point I considered making direction a parameter of the JOB type, but direction is not relevant to all configurables: for the driver configs "direction" has no meaning, and the LINK type has no concept of direction either. Hence I went with subType, a second-level hierarchy for distinguishing the types of configs supported in Sqoop.
Alternatives are possible, but we have to bear in mind that configs/config inputs are not associated with jobs and links; they are associated with connectors and the driver (i.e. configurables). It is the config input values that are associated with jobs and links, so when reading or editing config input values we can either use the JOB/LINK entities or rely more on the "CONFIGTYPE" enum, if we want to treat config as a first-class citizen.
SubType for MConfigType
SubTypes have been added to the MConfigType enum to indicate the sub-types of each config type.
NOTE: All job config input values are for the last job run only, since we do not yet store the config values for each submission.
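One possible shape for the sub-type hierarchy is sketched below. The sub-type values are not listed here, so FROM, TO and DRIVER under JOB, and NONE for LINK, are assumptions drawn from the direction discussion above; this enum is a sketch, not Sqoop's actual MConfigType. isValidPair shows how a Type/SubType pair supplied by the shell or REST layer could be validated.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sub-types: a second-level hierarchy under each config type.
enum MConfigSubType { FROM, TO, DRIVER, NONE }

// Sketch of MConfigType carrying the sub-types that are valid for each type.
enum MConfigType {
    LINK(EnumSet.of(MConfigSubType.NONE)),  // links have no direction
    JOB(EnumSet.of(MConfigSubType.FROM, MConfigSubType.TO, MConfigSubType.DRIVER));

    private final Set<MConfigSubType> subTypes;

    MConfigType(Set<MConfigSubType> subTypes) {
        this.subTypes = subTypes;
    }

    boolean supports(MConfigSubType subType) {
        return subTypes.contains(subType);
    }

    // Validates names such as ("JOB", "FROM"); unknown names yield false instead of an exception.
    static boolean isValidPair(String type, String subType) {
        try {
            return valueOf(type).supports(MConfigSubType.valueOf(subType));
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```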
Edit Config By Type and Job
NOTE: Previous submissions cannot be edited, hence we restrict editing to the last job run only.
ALTERNATIVELY, it was suggested that for the command line it is better NOT to introduce config as a top-level entity (a preference from Gwen Shapira and Qian Xu). Hence the commands are:
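The last-run-only restriction amounts to a simple check against a job's submissions. In this sketch the Submission record, its createdAtMillis field and SubmissionEditPolicy are hypothetical, and ordering by creation time is an assumption about how the latest submission is identified.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical submission record with only the fields the rule needs.
record Submission(long id, long jobId, long createdAtMillis) {}

class SubmissionEditPolicy {
    // Only the most recent submission of the target's job is editable; earlier ones are read-only.
    static boolean isEditable(Submission target, List<Submission> submissions) {
        return submissions.stream()
                .filter(s -> s.jobId() == target.jobId())
                .max(Comparator.comparingLong(Submission::createdAtMillis))
                .map(latest -> latest.id() == target.id())
                .orElse(false);
    }
}
```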
REST API changes
The resource returned by these APIs is a config object.
Read Config By Type and Job or Submission
Edit Config By Type and Job
We chose option #1 in both cases.
Repository API changes
Add a new API to get config inputs by jobId, config type and config name.
Add a new API to edit/post config inputs by jobId, config type and config name.
Unit tests are almost non-existent for the shell code, hence we will rely on basic manual testing.
REST APIs can be tested via integration tests, and that will be part of the proposed work.
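A minimal sketch of how the two new repository methods might look, assuming configs are addressed by jobId, config type and config name. The interface and method names are hypothetical and not the existing Sqoop repository API; the in-memory implementation only exists so the sketch can be exercised without a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical repository additions; names and parameters are illustrative only.
interface ConfigInputRepository {
    // Returns input name -> value for the named config of the given type on a job.
    Optional<Map<String, String>> findJobConfigInputs(long jobId, String type, String configName);

    // Replaces the stored values for the named config of the given type on a job.
    void updateJobConfigInputs(long jobId, String type, String configName, Map<String, String> values);
}

// In-memory implementation keyed by "jobId/type/configName".
class InMemoryConfigInputRepository implements ConfigInputRepository {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    private static String key(long jobId, String type, String configName) {
        return jobId + "/" + type + "/" + configName;
    }

    @Override
    public Optional<Map<String, String>> findJobConfigInputs(long jobId, String type, String configName) {
        // Return an immutable copy so callers cannot mutate stored state.
        return Optional.ofNullable(store.get(key(jobId, type, configName))).map(Map::copyOf);
    }

    @Override
    public void updateJobConfigInputs(long jobId, String type, String configName, Map<String, String> values) {
        store.put(key(jobId, type, configName), new HashMap<>(values));
    }
}
```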