Phase-1: Module rearrangement

  • Motivation:
    • Gobblin has about 400k lines of Java code spread across ~76 modules. This makes the code base very difficult to navigate, especially for newcomers. The proposal is to reorganize the modules under a small set of parent categories.
  • Proposed Change:
    • Collapse all modules under the following parent modules (new module: purpose):
      • Gobblin-core: all Gobblin core components required for the platform to run end to end.
      • Gobblin-API: all REST APIs that operate on the platform.
      • Gobblin-connectors: all extractor, converter, writer, and publisher classes for the different source and target platforms. Note: this could become a separate repository for all kinds of Gobblin plugins.
      • Gobblin-plugins: all Gobblin orchestrator or supporting components, such as Oozie and Azkaban.
      • Gobblin-docs: all documentation.
      • Gobblin-utils: all additional or supporting utilities, such as crypto and encryption.
      • Gobblin-deployments: all Gobblin deployment modes, such as AWS, YARN, and MR.



  • Proposed execution:
    • Execute the change in a way that preserves the commit history and the ownership information, so that the original contributors remain credited.
    • If that is not possible, I suggest that someone from the LinkedIn team execute the migration.
    • More actions: <TBD>
  • New or Changed Public Interfaces: the change should not have any impact on compatibility.
  • Migration Plan and Compatibility: since the packaging structure (bin, conf, lib) does not change, no migration plan should be required. (The only possible impact is on the fully qualified name, package.class_name, of a class that is used as a utility.)
  • Rejected Alternatives: N/A (there is no alternative to rearranging the modules).
  • TODO: add/update modules as components in JIRA once finalized.
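To illustrate the history-preserving execution and the compatibility claim above, here is a minimal Python sketch. It only builds (and does not run) the shell commands for moving a few modules from the mapping table; `plan_moves`, `java_package`, and `MODULE_MOVES` are hypothetical names, the lower-case parent directory names are an assumption, and the example class path is made up. Git never rewrites existing commits when files are moved, so authorship stays intact; if the move commit contains only the moves (no content edits), `git log --follow` and `git blame` can trace each file across the rename. The Java package conventionally mirrors the directory path under `src/main/java`, which a module move leaves unchanged.

```python
from pathlib import PurePosixPath

# Hypothetical sample of "old module path -> new parent directory", taken from
# rows of the mapping table; lower-case directory names are an assumption.
MODULE_MOVES = {
    "gobblin-oozie": "gobblin-plugins",
    "gobblin-modules/gobblin-azkaban": "gobblin-plugins",
    "gobblin-modules/gobblin-zuora": "gobblin-connectors",
}


def plan_moves(moves):
    """Build (but do not run) the commands for a history-preserving move.

    `git mv` refuses to move into a directory that does not exist yet,
    so each new parent directory is created first with `mkdir -p`.
    """
    commands = []
    for parent in sorted(set(moves.values())):
        commands.append(["mkdir", "-p", parent])
    for old_path, parent in sorted(moves.items()):
        module_name = PurePosixPath(old_path).name
        new_path = str(PurePosixPath(parent) / module_name)
        commands.append(["git", "mv", old_path, new_path])
    return commands


def java_package(source_file):
    """Derive the package from a path laid out as .../src/main/java/<package dirs>/<Class>.java."""
    parts = PurePosixPath(source_file).parts
    java_root = parts.index("java")  # assumes the standard Gradle source layout
    return ".".join(parts[java_root + 1:-1])


if __name__ == "__main__":
    for cmd in plan_moves(MODULE_MOVES):
        print(" ".join(cmd))
    # Hypothetical class: the module directory moves, but the package path does not.
    before = "gobblin-oozie/src/main/java/org/apache/gobblin/example/Example.java"
    after = "gobblin-plugins/" + before
    assert java_package(before) == java_package(after) == "org.apache.gobblin.example"
```

Running the script prints the `mkdir` and `git mv` commands for review. Build-file updates (for example, the project paths declared in the Gradle settings) would still be needed and are not covered by this sketch.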


Note: modules in bold are the parent modules we should keep; every other module should be rolled up under one of them.

Parent module / sub-module → New parent module (notes)

  • gobblin-admin
  • gobblin-api
  • gobblin-audit → Gobblin-core
  • gobblin-aws → Gobblin-deployment-modes
  • gobblin-binary-management → Gobblin-core
  • gobblin-cluster → Gobblin-deployment-modes
  • gobblin-compaction → gobblin-utils
  • gobblin-config-management → Gobblin-core
    • gobblin-config-client
    • gobblin-config-core
  • gobblin-core → Gobblin-core
  • gobblin-core-base → Gobblin-core
  • gobblin-data-management → Gobblin-core
  • gobblin-distribution (note: all of the following directories related to distribution, such as all Gradle files (except the master one, which must stay at the parent level), conf, and bin)
  • gobblin-docker → Gobblin-deployment-modes
    • gobblin-base → gobblin-distributions
    • gobblin-distributions → gobblin-distributions
    • gobblin-standalone → Gobblin-deployment-modes
    • gobblin-wikipedia → gobblin-example
  • gobblin-docs
  • gobblin-example → gobblin-docs
  • gobblin-hive-registration → gobblin-plugin
  • gobblin-metastore → gobblin-hive-registration
  • gobblin-metrics-libs → gobblin-metrics
    • gobblin-metrics
    • gobblin-metrics-base
  • gobblin-modules → gobblin-connectors
    • gobblin-avro-json
    • gobblin-azkaban → Gobblin-plugins
    • gobblin-azure-datalake
    • gobblin-codecs → gobblin-utils
    • gobblin-compliance → gobblin-utils
    • gobblin-couchbase
    • gobblin-crypto → gobblin-utils
    • gobblin-crypto-provider → gobblin-utils
    • gobblin-elasticsearch
    • gobblin-elasticsearch-deps
    • gobblin-eventhub
    • gobblin-grok
    • gobblin-helix
    • gobblin-http
    • gobblin-kafka-08
    • gobblin-kafka-09
    • gobblin-kafka-common
    • gobblin-metadata → gobblin-metrics
    • gobblin-metrics-graphite
    • gobblin-metrics-hadoop
    • gobblin-metrics-influxdb
    • gobblin-orc-dep
    • gobblin-parquet → Gobblin-plugins
    • gobblin-service-kafka → gobblin-service
    • gobblin-sql
    • gobblin-zuora → Gobblin-connectors
    • google-ingestion
  • gobblin-oozie → Gobblin-plugins
  • gobblin-rest-service → gobblin-api
    • gobblin-rest-api
    • gobblin-rest-client
    • gobblin-rest-server
  • gobblin-restli → gobblin-api
    • gobblin-flow-config-service
    • gobblin-restli-utils
    • gobblin-throttling-service
  • gobblin-runtime → Gobblin-core
  • gobblin-runtime-hadoop → Gobblin-connectors
  • gobblin-salesforce → gobblin-connectors
  • gobblin-service → Gobblin-deployment-modes (note: rename this to gobblin-as-a-service?)
  • gobblin-test → Gobblin-test
  • gobblin-test-harness → Gobblin-test
  • gobblin-test-utils → Gobblin-utils
  • gobblin-tunnel
  • gobblin-utils
  • gobblin-yarn → Gobblin-deployment-modes





Phase-2: Code refactoring to support the module rearrangement

  • Motivation:
    • While rearranging the modules according to the table above, we may have to refactor some code so that it is categorized correctly within its new module.
  • Proposed Change:
    • TBD
  • New or Changed Public Interfaces: the change should not have any impact on compatibility.
  • Migration Plan and Compatibility: TBD
  • Rejected Alternatives: TBD












