Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: added bullet re: UNION ALL restriction

...

There are multiple ways to modify data in Hive:

EXPORT and IMPORT commands are also available (as of Hive 0.8).

...

  • Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns. If hive.typecheck.on.insert is set to true, these values are validated, converted and normalized to conform to their column types (Hive 0.12.0 onward). 
  • Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query.
  • The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table or partition are replaced with the output of corresponding select statement.
  • The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table).
  • As of Hive 0.14, if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a transaction manager that implements ACID, then INSERT OVERWRITE will be disabled for that table.  This is to avoid users unintentionally overwriting transaction history.  The same functionality can be achieved by using TRUNCATE TABLE (for non-partitioned tables) or DROP PARTITION followed by INSERT INTO.
  • As of Hive 1.1.0 the TABLE keyword is optional.
  • As of Hive 1.2.0 each INSERT INTO T can take a column list like INSERT INTO T (z, x, c1).  See Description of HIVE-9481 for examples.
  • As of Hive 3.1.0 INSERT OVERWRITE from a source with UNION ALL on full CRUD ACID tables is not allowed.
Notes
  • Multi Table Inserts minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once (and applying different query operators) to the input data.
  • Starting with Hive 0.13.0, the select statement can include one or more common table expressions (CTEs) as shown in the SELECT syntax. For an example, see Common Table Expression.

...

Dynamic partition inserts are disabled by default prior to Hive 0.9.0 and enabled by default in Hive 0.9.0 and later. These are the relevant configuration properties for dynamic partition inserts:

Configuration property

Default

Note

hive.exec.dynamic.partition

true

Needs to be set to true to enable dynamic partition inserts

hive.exec.dynamic.partition.mode

strict

In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict mode all partitions are allowed to be dynamic

hive.exec.max.dynamic.partitions.pernode

100

Maximum number of dynamic partitions allowed to be created in each mapper/reducer node

hive.exec.max.dynamic.partitions

1000

Maximum number of dynamic partitions allowed to be created in total

hive.exec.max.created.files

100000

Maximum number of HDFS files created by all mappers/reducers in a MapReduce job

hive.error.on.empty.partition

false

Whether to throw an exception if dynamic partition insert generates empty results

Example
Code Block
sql
sql
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt

...

  • Each row listed in the VALUES clause is inserted into table tablename.
  • Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
  • Dynamic partitioning is supported in the same way as for INSERT...SELECT.
  • If the table being inserted into supports ACID and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion.
  • Hive does not support literals for complex types (array, map, struct, union), so it is not possible to use them in INSERT INTO...VALUES clauses. This means that the user cannot insert data into a complex datatype column using the INSERT INTO...VALUES clause.

...

  • 1, 2, or 3 WHEN clauses may be present; at most 1 of each type:  UPDATE/DELETE/INSERT.
  • WHEN NOT MATCHED must be the last WHEN clause.
  • If both UPDATE and DELETE clauses are present, the first one in the statement must include [AND <boolean expression>].
  • Vectorization will be turned off for merge operations.  This is automatic and requires no action on the part of the user.  Non-delete operations are not affected.  Tables with deleted data can still be queried using vectorization.
Examples

 

...