Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: top-level UNION ALL supported in 0.13.0 (HIVE-6189); minor edits

Table of Contents

Union Syntax

Code Block

select_statement UNION ALL select_statement UNION ALL select_statement ...

UNION is used to combine the result from multiple SELECT statements into a single result set. We Hive currently only support supports UNION ALL (bag union) i.e. , in which duplicates are not eliminated. The number and names of columns returned by each select_statement has have to be the same. Otherwise, a schema error is thrown.

If some additional processing has to be done on the result of the UNION, the entire statement expression can be embedded in a FROM clause like below:

Code Block

SELECT *
FROM (
  select_statement
  UNION ALL
  select_statement
) unionResult

For example, if we suppose there are two different tables that track which user has published a video and which user has published a comment, the following query joins the results of a union all UNION ALL with the user table to create a single annotated stream for all the video publishing and comment publishing events:

Code Block

    SELECT u.id, actions.date
    FROM (
        SELECT av.uid AS uid 
        FROM action_video av 
        WHERE av.date = '2008-06-03' 
        UNION ALL 
        SELECT ac.uid AS uid 
        FROM action_comment ac 
        WHERE ac.date = '2008-06-03' 
     ) actions JOIN users u ON (u.id = actions.uid) 

Unions can be used in views, inserts, and CTAS (create table as select) statements. A query can contain multiple UNION ALL clauses, as shown in the syntax above.

Info
titleVersion information

In Hive 0.12.0 and earlier releases, unions can only be used within a subquery such as "SELECT * FROM (select_statement UNION ALL select_statement UNION ALL ...) unionResult".

As of Hive 0.13.0, unions can also be used in a top-level query: "select_statement UNION ALL select_statement UNION ALL ...". (See HIVE-6189.)