Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add link to DDL doc for JsonSerde

...

  • MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records like CSV, tab-separated control-A separated records (sorry, quote is not supported yet).
  • LazySimpleSerDe: This SerDe can be used to read the same data format as MetadataTypedColumnsetSerDe and TCTLSeparatedProtocol, however, it creates Objects in a lazy way which provides better performance. Starting in Hive 0.14.0 it also supports read/write data with a specified encode charset, for example:

    Code Block
    ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');

    LazySimpleSerDe can treat 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals if the configuration property hive.lazysimple.extended_boolean_literal is set to true (Hive 0.14.0 and later). The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals.

  • ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first.
  • DynamicSerDe: This SerDe also read/write Thrift serialized objects, but it understands Thrift DDL so the schema of the object can be provided at runtime. Also it supports a lot of different protocols, including TBinaryProtocol, TJSONProtocol, TCTLSeparatedProtocol (which writes data in delimited records).

Also:

  • For JSON files, JsonSerDe was added in Hive 0.12.0. An Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.
  • An Avro SerDe was added in Hive 0.9.1.  Starting in Hive 0.14.0 its specification is implicit with the STORED AS AVRO clause.
  • A SerDe for the ORC file format was added in Hive 0.11.0.
  • A SerDe for Parquet was added via plug-in in Hive 0.10 and natively in Hive 0.13.0.
  • A SerDe for CSV was added in Hive 0.14.

...