Design doc for Kite Connector supporting HBase for basic reads/writes and, if possible, DFM (delta fetch merge)
JIRA: https://issues.apache.org/jira/browse/SQOOP-1744 and its sub-tickets.
Currently (as of writing this doc) we have the KiteConnector in Sqoop2 with support for writing to and reading from an HDFS dataset. The goal of SQOOP-1744 is to extend it to support reading from and writing to an HBase dataset as well. An additional goal is to support reading and writing delta records from/to HBase using the Kite SDK/APIs.
There is no design or feature doc yet that covers the details of the KiteConnector. Here are the relevant JIRA tickets that provide details on how the Kite FROM and Kite TO connectors work.
Kite FROM part: https://issues.apache.org/jira/browse/SQOOP-1647
Kite TO part (for writing to HDFS via Kite): https://issues.apache.org/jira/browse/SQOOP-1588
UPDATE: A design wiki page, Kite Connector Design, was added later.
Overall there are two ways to implement this functionality using the Kite SDK.
Option 1:
Duplicate much of the code in KiteConnector and add a new, independent KiteHBaseConnector. The major con is the code duplication and the effort to support yet another connector.
Option 2:
Use the current KiteConnector and add an enum to select the type of dataset Kite will create underneath, or parse the URI given in the FromJobConfig and ToJobConfig to determine whether the dataset is Hive, HBase, or HDFS.
    // Use this enum to determine what dataset Kite needs to create underneath
    public enum DataSetType { HDFS, HBASE, HIVE }

    @Input
    public DataSetType datasetType;

    // or: parse this URI to figure out the dataset type
    @Input(size = 255, validators = {@Validator(DatasetURIValidator.class)})
    public String uri;
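To illustrate the URI-parsing variant of Option 2, here is a minimal sketch (the class and method names are hypothetical, not part of the existing KiteConnector code) that maps a Kite dataset URI's scheme onto the proposed enum:

```java
import java.net.URI;

public class DatasetTypeResolver {

    // Mirrors the enum proposed above
    public enum DataSetType { HDFS, HBASE, HIVE }

    /**
     * Infers the dataset type from a dataset URI such as
     * "dataset:hdfs:/path", "dataset:hive:db/table" or "hbase:host1/table".
     * The "dataset:" prefix used by Kite is stripped first, if present.
     */
    public static DataSetType resolve(String uriString) {
        String s = uriString.startsWith("dataset:")
                ? uriString.substring("dataset:".length())
                : uriString;
        String scheme = URI.create(s).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in URI: " + uriString);
        }
        switch (scheme) {
            case "hdfs":  return DataSetType.HDFS;
            case "hbase": return DataSetType.HBASE;
            case "hive":  return DataSetType.HIVE;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }
}
```

With this approach a single `uri` input is enough, and the explicit `datasetType` enum input becomes unnecessary; a validator such as the `DatasetURIValidator` mentioned above could reject unsupported schemes at job-configuration time.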
Pros: avoids the code duplication and maintenance burden of a separate connector; one connector handles HDFS, Hive, and HBase datasets.
The integration test suite will be enhanced to add support for JDBC-to-KiteHBaseConnector jobs and vice versa.
Performance Testing
None at this point
HBase dataset URIs have the form: hbase:<zookeeper>/<dataset-name>
The zookeeper argument is a comma-separated list of hosts. For example:
hbase:host1,host2:9999,host3/myDataset
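The URI format above can be split apart with a few lines of plain Java. This sketch (the class name is illustrative, not part of Kite or Sqoop) separates an hbase: URI into its ZooKeeper host list and dataset name:

```java
import java.util.Arrays;
import java.util.List;

public class HBaseDatasetUri {

    public final List<String> zookeeperHosts;
    public final String datasetName;

    private HBaseDatasetUri(List<String> hosts, String name) {
        this.zookeeperHosts = hosts;
        this.datasetName = name;
    }

    /** Parses a URI of the form hbase:<zookeeper>/<dataset-name>. */
    public static HBaseDatasetUri parse(String uri) {
        if (!uri.startsWith("hbase:")) {
            throw new IllegalArgumentException("Not an hbase URI: " + uri);
        }
        String rest = uri.substring("hbase:".length());
        int slash = rest.indexOf('/');
        if (slash < 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("Missing dataset name: " + uri);
        }
        // The zookeeper part is a comma-separated host list; each host may
        // optionally carry a port, as in "host2:9999"
        List<String> hosts = Arrays.asList(rest.substring(0, slash).split(","));
        return new HBaseDatasetUri(hosts, rest.substring(slash + 1));
    }
}
```

For the example URI above, this yields the hosts [host1, host2:9999, host3] and the dataset name myDataset.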