7th Meeting


Rob Weltman, Bilung Lee, Arvind Prabhakar, Cheolsoo Park, Jarek Jarcec Cecho


1. Datatype support: we discussed datatype support for Sqoop 2. The following points were raised:

  • Sqoop 1 datatypes are different for Java, Hive and Avro. Most of the time the connector does the datatype mapping.
  • For Sqoop 2, we would like to standardize the datatype support so that all connectors conform to it for consistency.
  • Some datatypes discussed today were:
    • String, char: UTF-8 encoded
    • Timestamp: Unix time representation with millisecond precision
    • Binary: byte array
    • Numeric: byte, short, int, long, float, double
    • Null type
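
As a rough illustration of the representations listed above, the following Python sketch encodes a string as UTF-8, converts a timestamp to Unix time in milliseconds, and range-checks the integer types. It is not taken from the Sqoop code base: the function names, the treatment of naive timestamps as UTC, and the Java-style bit widths for byte, short, int and long are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed Java-style widths for the signed integer types named in the minutes.
INTEGER_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}


def encode_string(value):
    """String/char: return the UTF-8 encoded bytes (hypothetical helper)."""
    return value.encode("utf-8")


def encode_timestamp(value):
    """Timestamp: return Unix time as whole milliseconds (hypothetical helper)."""
    if value.tzinfo is None:
        # Assumption: a naive datetime is interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (value - EPOCH) // timedelta(milliseconds=1)


def check_integer(value, type_name):
    """Numeric: raise ValueError if value does not fit the named signed type."""
    bits = INTEGER_BITS[type_name]
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {type_name}")
    return value
```

For example, encode_timestamp(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)) gives 1000, and check_integer(128, "byte") raises ValueError. Float, double, binary and null handling are left out because the minutes do not specify anything about them beyond the type names.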

2. Intermediate data format

Jarcec has started the intermediate format specification page. It was discussed at a high level in the context of the various datatypes to be used.


  • Arvind to create a wiki page for datatypes and send it to dev list for discussion
  • Jarcec to further refine intermediate representation specification on the wiki

Agenda for next meeting: follow up discussion regarding datatype support and intermediate representation


  • No call on July 4th, as it is a holiday in the US. The next call will be on July 11th. All are urged to continue these discussions on the mailing list.