- MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records such as CSV, tab-separated, or control-A-separated records (quoting is not supported yet).
- LazySimpleSerDe: This SerDe can read the same data formats as MetadataTypedColumnsetSerDe and TCTLSeparatedProtocol, but it creates objects lazily, which gives better performance. Starting in Hive 0.14.0 it also supports reading and writing data with a specified character encoding, for example:
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');
LazySimpleSerDe can treat 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals if the configuration property hive.lazysimple.extended_boolean_literal is set to true (Hive 0.14.0 and later). The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals.
- ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first.
- DynamicSerDe: This SerDe also reads/writes Thrift serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in delimited records).
- For JSON files, JsonSerDe was added in Hive 0.12.0. An Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.
- An Avro SerDe was added in Hive 0.9.1. Starting in Hive 0.14.0 its specification is implicit with the STORED AS AVRO clause.
- A SerDe for the ORC file format was added in Hive 0.11.0.
- A SerDe for Parquet was added via plug-in in Hive 0.10 and natively in Hive 0.13.0.
- A SerDe for CSV was added in Hive 0.14.
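As a rough illustration of the LazySimpleSerDe options described above, the following HiveQL sketch creates a delimited text table, sets its encoding, and enables the extended boolean literals. The table name person comes from the ALTER TABLE example above; the column names and the comma delimiter are assumptions made purely for illustration.

```sql
-- Hypothetical table: column names and the ',' delimiter are illustrative only.
CREATE TABLE person (
  name   STRING,
  active BOOLEAN
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS TEXTFILE;

-- Hive 0.14.0 and later: read and write the table data using the GBK charset.
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding' = 'GBK');

-- Hive 0.14.0 and later: also accept 'T', 't', 'F', 'f', '1', and '0' as booleans.
-- With the default (false), only 'TRUE' and 'FALSE' are legal boolean literals.
SET hive.lazysimple.extended_boolean_literal=true;

SELECT name, active FROM person;
```

Because the SerDe deserializes fields lazily, a query that touches only some columns should not pay the cost of parsing the others.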
|STORED AS AVRO / STORED AS AVROFILE|
|STORED AS ORC / STORED AS ORCFILE|
|STORED AS PARQUET / STORED AS PARQUETFILE|
|STORED AS RCFILE|
|STORED AS SEQUENCEFILE|
|STORED AS TEXTFILE|
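With the keywords listed above, a table can be declared without spelling out the SerDe, InputFormat, and OutputFormat classes. A minimal sketch, assuming Hive 0.14.0 or later (so that all three keywords are available) and hypothetical table and column names:

```sql
-- Hypothetical tables; Hive selects the matching SerDe and file formats
-- from the STORED AS keyword.
CREATE TABLE events_orc     (id BIGINT, payload STRING) STORED AS ORC;
CREATE TABLE events_parquet (id BIGINT, payload STRING) STORED AS PARQUET;
CREATE TABLE events_avro    (id BIGINT, payload STRING) STORED AS AVRO;
```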
To add a new native SerDe with the STORED AS keyword, follow these steps:
Apparently the Hive tests do not run successfully after a clean unless you run ant package first. Not sure why build.xml doesn't encode this dependency.
Adding new unit tests
As of version 0.13 Hive uses Maven instead of Ant for its build. The following instructions are not up to date.
First, write a new myname.q in ql/src/test/queries/clientpositive.
Similarly, to add negative client tests, write a new query input file in ql/src/test/queries/clientnegative and run the same command, this time specifying the testcase name as TestNegativeCliDriver instead of TestCliDriver. Note that for negative client tests, the output file, if created using the overwrite flag, can be found in the directory ql/src/test/results/clientnegative.
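The query files mentioned above are plain HiveQL scripts. As a minimal sketch of what a positive test file such as myname.q might contain (the table name, columns, and queries here are hypothetical and not taken from the Hive source tree):

```sql
-- myname.q (hypothetical): each statement's output is compared with the
-- recorded results file for this test.
CREATE TABLE myname_t (key INT, value STRING);

DESCRIBE myname_t;

SELECT count(*) FROM myname_t;

DROP TABLE myname_t;
```

A negative test file would instead contain a statement that is expected to fail, so that the recorded output captures the error.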
See also Tips for Adding New Tests in Hive.
Debugging Hive Code
Hive code includes both client-side code (e.g., compiler, semantic analyzer, and optimizer of HiveQL) and server-side code (e.g., operator/task/SerDe implementations). Debugging is different for client-side and server-side code, as described below.