Summary of Hive Parquet support

Hive 0.10, 0.11, and 0.12

To use Parquet with Hive 0.10-0.12, you must download the Parquet Hive package from the Parquet project. The artifact you need is the parquet-hive-bundle jar, which is available from Maven Central.
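As a minimal sketch, the downloaded jar can be registered for a Hive session with the ADD JAR command. The path and the version placeholder below are hypothetical; substitute the location and version of the jar you actually downloaded.

```sql
-- Hypothetical path and version placeholder: adjust to your downloaded jar.
ADD JAR /path/to/parquet-hive-bundle-<version>.jar;
```

ADD JAR applies only to the current session. Making the jar available permanently (for example through the Hive auxiliary jars setting) depends on how your installation is configured.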

Hive 0.13

Native Parquet support is pending for 0.13 via HIVE-5783.

Introduction to Parquet

Parquet (http://parquet.io/) is an ecosystem-wide columnar format for Hadoop. At the time of this writing, it supports:

Engines

  • Apache Hive
  • Apache Drill
  • Cloudera Impala
  • Apache Crunch
  • Apache Pig
  • Cascading

Data description

  • Apache Avro
  • Apache Thrift
  • Google Protocol Buffers

For the latest information on Parquet engine and data description support, please visit the Parquet-MR project's feature matrix.

File Format

The Parquet project has an in-depth description of the format, including motivations and diagrams.

Hive QL Syntax

Hive 0.10 - 0.12

CREATE TABLE parquet_test (
 id int,
 str string,
 mp MAP<STRING,STRING>,
 lst ARRAY<STRING>,
 strct STRUCT<A:STRING,B:STRING>) 
PARTITIONED BY (part string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
 STORED AS
 INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
 OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Hive 0.13

CREATE TABLE parquet_test (
 id int,
 str string,
 mp MAP<STRING,STRING>,
 lst ARRAY<STRING>,
 strct STRUCT<A:STRING,B:STRING>) 
PARTITIONED BY (part string)
STORED AS PARQUET;
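Once the table exists, it is queried like any other Hive table. The following sketch reads the complex-typed columns of parquet_test; the map key 'some_key' and the partition value 'example' are hypothetical and serve only as illustration.

```sql
-- Illustrative query; 'some_key' and 'example' are made-up values.
SELECT
  id,
  str,
  mp['some_key'] AS mp_value,    -- look up one key in the MAP column
  lst[0]         AS first_item,  -- first element of the ARRAY column
  strct.a        AS strct_a      -- field A of the STRUCT column
FROM parquet_test
WHERE part = 'example';          -- restrict the scan to one partition
```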

Limitations

  • Support for the binary, timestamp, date, char, varchar, and decimal types is pending (HIVE-6384)
  • CREATE TABLE AS SELECT (CTAS) and column rename support are pending (HIVE-6375)
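While CTAS is pending, one possible alternative is to create the Parquet table explicitly and then populate it with a separate INSERT ... SELECT statement. The sketch below uses the Hive 0.13 syntax and assumes that INSERT ... SELECT into a Parquet table works in your Hive version; the table name parquet_copy is hypothetical.

```sql
-- Step 1: declare the target table explicitly (hypothetical name).
CREATE TABLE parquet_copy (
  id int,
  str string)
STORED AS PARQUET;

-- Step 2: populate it from the existing table instead of using CTAS.
INSERT OVERWRITE TABLE parquet_copy
SELECT id, str
FROM parquet_test;
```

For Hive 0.10 - 0.12, the same two steps would apply with the explicit SERDE, INPUTFORMAT, and OUTPUTFORMAT clauses in place of STORED AS PARQUET.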

 
