- Motivation:
- current way of managing job config via files makes it hard to manage job configurations, persisting it in MySQL would make it easier to manage and also provide various CRUD API interfaces to mange the job configuration.
- Proposed Change:
- create MySQL table `gobblin_job_config`
- create REST endpoints for CRUD operations
- add functionality to detect change in any job config and schedule or trigger job accordingly ( similar to current mechanism based on file change )
- In future, Gobblin should use JDBI (v3) as ORM instead of plan JDBC based persistence which will become cumbersome as it uses DB heavily for all kinds of metadata storage.
- New or Changed Public Interfaces:
- POST /job/{jobName}
- PUT /job//{jobName}
- GET /job/{jobName}
- DELETE /job/{jobName}
- Migration Plan and Compatibility:
- since this is new functionality, in addition to the current file based job config, user can select either one to manage the job.
- Rejected Alternatives:
- use of other REST framework ( preferring to use REST.li instead of dropwizard or any other as its built in framework for other APIs
Design:
MySQL DDL: ( V1_0_4__job_store.sql )
gobblin_job_config
CREATE TABLE `gobblin_job` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, `description` varchar(255) DEFAULT NULL, `schedule` varchar(64) DEFAULT NULL, `is_disabled` tinyint(1) DEFAULT '0', `priority` smallint(6) DEFAULT NULL, `configs` text, `owner_email` varchar(255) DEFAULT NULL, `source_system` varchar(255) DEFAULT NULL, `target_system` varchar(255) DEFAULT NULL, `job_template` varchar(255) DEFAULT NULL, `created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `updated_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`name`), KEY `name` (`name`), KEY `id` (`id`), KEY `gobblin_job_template_fk` (`job_template`), KEY `source_system` (`source_system`), KEY `target_system` (`target_system`), CONSTRAINT `gobblin_job_ibfk_1` FOREIGN KEY (`source_system`) REFERENCES `gobblin_sync_system` (`id`), CONSTRAINT `gobblin_job_ibfk_2` FOREIGN KEY (`target_system`) REFERENCES `gobblin_sync_system` (`id`), CONSTRAINT `gobblin_job_template_fk` FOREIGN KEY (`job_template`) REFERENCES `gobblin_job_template` (`name`) ) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8; CREATE TABLE `gobblin_job_template` ( `name` varchar(255) NOT NULL DEFAULT '', `configs` text NOT NULL, PRIMARY KEY (`name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `gobblin_sync_system` ( `id` varchar(255) NOT NULL, `name` varchar(64) NOT NULL, `type` varchar(64) NOT NULL DEFAULT '', `datacenter` varchar(64) DEFAULT NULL, `region` varchar(64) DEFAULT NULL, `zone` varchar(64) DEFAULT NULL, `users` varchar(1024) DEFAULT NULL, `on_hold` tinyint(1) NOT NULL DEFAULT '0', `deprecated` tinyint(1) NOT NULL DEFAULT '0', `source_support` tinyint(1) NOT NULL DEFAULT '0', `target_support` tinyint(1) NOT NULL DEFAULT '0', `configs` text, `created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `updated_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE gobblin_job_owner ( name varchar(255) NOT NULL, email varchar(255) NOT NULL, team_name int(11) DEFAULT NULL, team_email int(11) DEFAULT NULL, org_name int(11) DEFAULT NULL, created_date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, updated_date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (email) ) ENGINE = InnoDB DEFAULT CHARSET = utf8;
Alternate proposal:
//Shirshanka: this will allow us to model job_config as a schema-d thing ... and evolve versions without having to use DDL to perform MySQL schema changes... also allows to swap out a K-V store for this easily CREATE TABLE `gobblin_job` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, `configs` text, `version` int(11), `created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `updated_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`name`), KEY `name` (`name`), KEY `id` (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
JOB MGMT. CRUD REST API
Create New job example
POST body
METHOD: POST URL: http://localhost:9090/job/PullFromWikipedia BODY: { "jobName": "PullFromWikipedia", "jobDescription": "A getting started example for Gobblin", "jobDisabled": false, "jobProperties": { "job.group": "Wikipedia", "source.class": "org.apache.gobblin.example.wikipedia.WikipediaSource", "test_remove": "test_remove_value", "something": "asdsad" } } RESPONSE: 201 - created
2. update existing job
POST body
METHOD: PUT URL: http://localhost:9090/job/PullFromWikipedia BODY: { "name": "PullFromWikipedia", "description": "A getting started example for Gobblin - pull some pages from wikipedia", "disabled": true, "schedule": "", "priority": 98, "configs": { "job.group": "Wikipedia", "source.class": "org.apache.gobblin.example.wikipedia.WikipediaSource", "test_add": "test_remove_value", "something": "asdsad" }, "ownerEmail": "jsenjaliya@paypal.com", "sourceSystem": "simba", "targetSystem": "RADD-LVS", "configsToRemove": [ "test_remove" ] } RESPONSE: 200 - OK
3. GET existing job
GET Result
METHOD: GET URL: http://localhost:9090/job/PullFromWikipedia BODY: None RESPONSE: { "schedule": "run_once", "configs": { "job.group": "Wikipedia", "source.class": "org.apache.gobblin.example.wikipedia.WikipediaSource", "something": "asdsad", "test_add": "test_remove_value" }, "createdDate": 1587770060000, "sourceSystem": "simba", "name": "PullFromWikipedia", "description": "A getting started example for Gobblin - pull some pages from wikipedia", "disabled": true, "id": 1, "updatedDate": 1587774853000, "priority": 98, "targetSystem": "RADD-LVS", "ownerEmail": "jsenjaliya@paypal.com" }