Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
join_table:
    table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)

table_reference:
    table_factor
  | join_table

table_factor:
    tbl_name [alias]
  | table_subquery alias
  | ( table_references )

join_condition:
    ON equality_expression ( AND equality_expression )*

equality_expression:
    expression = expression

Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job. Also, more than two tables can be joined in Hive.

...


See Select Syntax for the context of this join syntax.

Info
titleVersion 0.13.0+: Implicit join notation

Implicit join notation is supported starting with Hive 0.13.0 (see HIVE-5558). This allows the FROM clause to join a comma-separated list of tables, omitting the JOIN keyword. For example:

SELECT *
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';

Info
titleVersion 0.13.0+: Implicit join notationUnqualified column references

Unqualified column references are supported in join conditions, Implicit join notation is supported starting with Hive 0.13.0 (see HIVE-55586393). This allows the FROM clause to join a comma-separated list of tables, omitting the JOIN keyword. For example:Hive attempts to resolve these against the inputs to a Join. If an unqualified column reference resolves to more than one table, Hive will flag it as an ambiguous reference.

For example:

CREATE TABLE a (k1 string, v1 string);
CREATE TABLE b (k2 string, v2 string);

SELECT k1, v1, k2, v2
FROM a JOIN b ON k1 = k2; SELECT *
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';

Info
titleVersion 02.132.0+: Unqualified column referencesComplex expressions in ON clause

Complex expressions in ON clause are supportedUnqualified column references are supported in join conditions, starting with Hive 02.132.0 (see HIVE-15211, HIVE-639315251). Hive attempts to resolve these against the inputs to a Join. If an unqualified column reference resolves to more than one table, Hive will flag it as an ambiguous reference.

For example:

CREATE TABLE a (k1 string, v1 string);
CREATE TABLE b (k2 string, v2 string);

SELECT k1, v1, k2, v2
FROM a JOIN b ON k1 = k2; 

Prior to that, Hive did not support join conditions that are not equality conditions.

In particular, syntax for join conditions was restricted as follows:

join_condition:
ON equality_expression ( AND equality_expression )*
equality_expression:
expression = expression

Examples

Some salient points to consider when writing join queries are as follows:

  • Only equality joins Complex join expressions are allowed e.g.

    Code Block
      SELECT a.* FROM a JOIN b ON (a.id = b.id)
    
    Code Block
      SELECT a.* FROM a JOIN b ON (a.id = b.id AND a.department = b.department)
    

    are both valid joins, however

    Code Block
      SELECT a.* FROM a LEFT OUTER JOIN b ON (a.id <> b.id)
    

    are valid joinsis NOT allowed.

  • More than 2 tables can be joined in the same query e.g.

    Code Block
      SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
    

    is a valid join.

...