...

  • MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records such as CSV, tab-separated, and Control-A-separated records (quoting is not supported yet).
  • LazySimpleSerDe: This SerDe can read the same data formats as MetadataTypedColumnsetSerDe and TCTLSeparatedProtocol; however, it creates objects lazily, which provides better performance. Starting in Hive 0.14.0 it also supports reading and writing data in a specified encoding charset, for example:

    ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');

    LazySimpleSerDe can treat 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals if the configuration property hive.lazysimple.extended_boolean_literal is set to true (Hive 0.14.0 and later). The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals.

  • ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first.
  • DynamicSerDe: This SerDe also reads/writes Thrift serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in delimited records).
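As a hedged illustration of the LazySimpleSerDe options described above, the following HiveQL sketch creates a hypothetical comma-delimited table `person_flags` (table and column names are made up for this example), sets the GBK encoding through SERDEPROPERTIES, and enables the extended boolean literals for the session. It assumes Hive 0.14.0 or later and that a session-level SET reaches the SerDe's configuration; check this against your deployment.

```sql
-- Hypothetical table; LazySimpleSerDe (the default for delimited text) is named explicitly.
CREATE TABLE person_flags (
  name   STRING,
  active BOOLEAN
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'            = ',',    -- comma-separated fields
  'serialization.encoding' = 'GBK'   -- read/write the text as GBK (Hive 0.14.0+)
)
STORED AS TEXTFILE;

-- Accept 'T', 't', 'F', 'f', '1', and '0' as booleans (the default is false).
SET hive.lazysimple.extended_boolean_literal=true;

-- With the property enabled, a data line such as "alice,1" should read active as true.
SELECT name, active FROM person_flags;
```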

Also:

  • For JSON files, JsonSerDe was added in Hive 0.12.0. An Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.
  • An Avro SerDe was added in Hive 0.9.1. Starting in Hive 0.14.0, its specification is implicit with the STORED AS AVRO clause.
  • A SerDe for the ORC file format was added in Hive 0.11.0.
  • A SerDe for Parquet was added via plug-in in Hive 0.10 and natively in Hive 0.13.0.
  • A SerDe for CSV was added in Hive 0.14.
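A minimal sketch of declaring tables with the JSON and CSV SerDes mentioned above. The class names used here (`org.apache.hive.hcatalog.data.JsonSerDe` from the HCatalog module and `org.apache.hadoop.hive.serde2.OpenCSVSerde`) are the ones commonly used for these SerDes but are not named on this page, so treat them as assumptions to verify against your Hive version; the table names are hypothetical. On some installations the HCatalog core jar may need to be added to the session before the JSON table can be read.

```sql
-- JSON: one JSON object per line, e.g. {"name": "alice", "age": 30}
CREATE TABLE people_json (
  name STRING,
  age  INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- CSV (Hive 0.14+): this SerDe exposes every column as STRING.
CREATE TABLE people_csv (
  name STRING,
  city STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- field separator
  'quoteChar'     = '"'    -- character used to quote fields
)
STORED AS TEXTFILE;
```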

...

Syntax and equivalent form:

  • STORED AS AVRO / STORED AS AVROFILE is equivalent to:

      ROW FORMAT SERDE
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

  • STORED AS ORC / STORED AS ORCFILE is equivalent to:

      ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

  • STORED AS PARQUET / STORED AS PARQUETFILE is equivalent to:

      ROW FORMAT SERDE
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

  • STORED AS RCFILE is equivalent to:

      STORED AS INPUTFORMAT
        'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'

  • STORED AS SEQUENCEFILE is equivalent to:

      STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.SequenceFileInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.mapred.SequenceFileOutputFormat'

  • STORED AS TEXTFILE is equivalent to:

      STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.TextInputFormat'
      OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
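To make the ORC equivalence concrete, here is a minimal HiveQL sketch with two hypothetical tables (`events_orc` and `events_orc_explicit`, names chosen for illustration): one uses the STORED AS ORC shorthand and the other spells out the explicit SerDe, InputFormat, and OutputFormat listed for ORC. Assuming Hive 0.11.0 or later, DESCRIBE FORMATTED on both should report the same SerDe library, InputFormat, and OutputFormat.

```sql
-- Short form.
CREATE TABLE events_orc (id BIGINT, payload STRING)
STORED AS ORC;

-- Explicit long form listed as equivalent for STORED AS ORC.
CREATE TABLE events_orc_explicit (id BIGINT, payload STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- Compare the "SerDe Library", "InputFormat", and "OutputFormat" lines.
DESCRIBE FORMATTED events_orc;
DESCRIBE FORMATTED events_orc_explicit;
```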

To add a new native SerDe with a STORED AS keyword, follow these steps:

...

Apparently the Hive tests do not run successfully after a clean unless you run ant package first. It is not clear why build.xml does not encode this dependency.

Adding new unit tests

Note: Ant to Maven

As of version 0.13 Hive uses Maven instead of Ant for its build. The following instructions are not up to date.

See the Hive Developer FAQ for updated instructions. See also Tips for Adding New Tests in Hive and How to Contribute: Add a Unit Test.

First, write a new myname.q in ql/src/test/queries/clientpositive.
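A minimal sketch of what myname.q might contain. It assumes the test harness preloads the standard `src` test table (columns `key` and `value`, both STRING) that many existing .q files query; the table `myname_tbl` is hypothetical. The ORDER BY keeps the output deterministic so that it can be compared with the expected results file.

```sql
-- myname.q: hypothetical positive test.
DROP TABLE IF EXISTS myname_tbl;
CREATE TABLE myname_tbl (key STRING, value STRING);

-- Copy a small, predictable slice of the preloaded src table.
INSERT OVERWRITE TABLE myname_tbl
SELECT key, value FROM src WHERE key < 10;

-- Deterministic ordering so the output is stable across runs.
SELECT * FROM myname_tbl ORDER BY key;

-- Clean up so later tests are not affected.
DROP TABLE myname_tbl;
```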

...

Similarly, to add negative client tests, write a new query input file in ql/src/test/queries/clientnegative and run the same command, this time specifying the testcase name as TestNegativeCliDriver instead of TestCliDriver. Note that for negative client tests, the output file, if created using the overwrite flag, can be found in the directory ql/src/test/results/clientnegative.
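For the negative case, a hypothetical ql/src/test/queries/clientnegative/mynegative.q (the file name is illustrative) only needs a statement that Hive is expected to reject. A query against a table that does not exist is a simple sketch of this; the resulting error message is what ends up in the expected output.

```sql
-- mynegative.q: hypothetical negative test.
-- This query should fail during semantic analysis because the table does not exist.
SELECT * FROM table_that_does_not_exist;
```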

See also Tips for Adding New Tests in Hive.

Debugging Hive Code

Hive code includes both client-side code (e.g., compiler, semantic analyzer, and optimizer of HiveQL) and server-side code (e.g., operator/task/SerDe implementations). Debugging is different for client-side and server-side code, as described below.

...