Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 4.0

...

In INSERT ... SELECT ... queries, the dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause.

  • all DP columns – only allowed in nonstrict mode. In strict mode, we should throw an error. e.g.,
Code Block
   INSERT OVERWRITE TABLE T PARTITION (ds, hr) 
   SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10;
  • mixed SP & DP columns. e.g.,
Code Block
   INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr) 
   SELECT key, value, /*ds,*/ hr FROM srcpart WHERE ds is not null and hr>10;
  • SP is a subpartition of a DP: should throw an error because partition column order determins directory hierarchy. We cannot change the hierarchy in DML. e.g.,
Code Block
   -- throw an exception
   INSERT OVERWRITE TABLE T PARTITION (ds, hr = 11) 
   SELECT key, value, ds/*, hr*/ FROM srcpart WHERE ds is not null and hr=11;
  • multi-table insert. e.g.,
Code Block
   FROM S
   INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr) 
   SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10
   INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03, hr=12)
   SELECT key, value, ds, hr from srcpart where ds is not null and hr = 12;
  • CTAS – syntax is a little bit different from CTAS on non-partitioned tables, since the schema of the target table is not totally derived from the select-clause. We need to specify the schema including partitioning columns in the create-clause. e.g.,
Code Block
   CREATE TABLE T (key int, value string) PARTITIONED BY (ds string, hr int) AS 
   SELECT key, value, ds, hr+1 hr1 FROM srcpart WHERE ds is not null and hr>10;

...

2) Partitioning column value to directory name conversion:
After converting column value to string, we still need to convert the string value to a valid directory name. Some reasons are:

  • string length is unlimited in theory, but HDFS/local FS directory name length is limited.
  • string value could contains special characters that is reserved in FS path names (such as '/' or '..').
  • what should we do for partition column ObjectInspector?

We need to define a UDF (say hive_qname_partition(T.part_col)) to take a primitive typed value and convert it to a qualified partition name.

...