- Because Sqoop 2 does not allow an InputFormat or OutputFormat to be specified, data reading can be inefficient: we cannot create concurrent data readers, especially for an unpartitioned dataset. A solution still needs to be investigated with the Kite team.
- HBase support (SQOOP-1744) will be handled as a separate improvement to the original design.
- The current implementation uses the default IDF class (CSVIDF) for data conversion. Recently we have introduced AvroIDF. As Kite uses Avro internally, it makes sense to use AvroIDF instead of CSVIDF. This will involve two things:
  - Clean up AvroTypeUtil and KiteDataTypeUtil.
  - AvroIDF will be responsible for converting every Sqoop data type (FixedPoint, Text, etc.) to its corresponding Avro representation.
- (VB): The complex types array, map, and enum are not supported in the current design/implementation.
- The CSV format for HDFS writes via KiteConnector supports only "primitive types", because CSV is only experimentally supported in the Kite SDK.
- The design details of Delta Write in Kite-HDFS are not included in this wiki; a separate design wiki will be added for SQOOP-1999.
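
To illustrate the AvroIDF item above, the following is a minimal sketch of what a Sqoop-type-to-Avro-type conversion table might look like. It is not the actual AvroIDF code: the class `SqoopToAvroSketch`, its `ColumnType` enum, and the specific mapping choices (for example, FixedPoint to Avro `long`) are assumptions made for illustration. Only the Avro primitive type names (`boolean`, `long`, `double`, `bytes`, `string`) come from the Avro specification. Complex types are rejected, matching the limitation noted above.

```java
import java.util.EnumMap;
import java.util.Map;

public class SqoopToAvroSketch {

    // Hypothetical subset of Sqoop column types, for illustration only;
    // this is NOT the real Sqoop schema enum.
    public enum ColumnType { FIXED_POINT, FLOATING_POINT, TEXT, BINARY, BIT, ARRAY, MAP, ENUM }

    // Assumed mapping from primitive Sqoop types to Avro primitive type names.
    private static final Map<ColumnType, String> PRIMITIVES = new EnumMap<>(ColumnType.class);
    static {
        PRIMITIVES.put(ColumnType.FIXED_POINT, "long");
        PRIMITIVES.put(ColumnType.FLOATING_POINT, "double");
        PRIMITIVES.put(ColumnType.TEXT, "string");
        PRIMITIVES.put(ColumnType.BINARY, "bytes");
        PRIMITIVES.put(ColumnType.BIT, "boolean");
    }

    // Returns the Avro primitive type name for a Sqoop type, or throws
    // for complex types (array, map, enum), which this sketch does not handle.
    public static String avroTypeFor(ColumnType type) {
        String avroType = PRIMITIVES.get(type);
        if (avroType == null) {
            throw new UnsupportedOperationException("Complex type not supported: " + type);
        }
        return avroType;
    }

    public static void main(String[] args) {
        for (ColumnType type : ColumnType.values()) {
            try {
                System.out.println(type + " -> " + avroTypeFor(type));
            } catch (UnsupportedOperationException e) {
                System.out.println(type + " -> unsupported");
            }
        }
    }
}
```

Keeping the mapping in a single table makes it easy to see which types are covered and to extend it once complex types are designed.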