Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

EXPLAIN Syntax

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

Code Block

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query
AUTHORIZATION is supported from HIVE-0.14.0 via HIVE-5961

The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.

...

As an example, consider the following EXPLAIN query:

Code Block

EXPLAIN
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

The output of this statement contains the following parts:

  • The Abstract Syntax Tree

    Code Block
    
    ABSTRACT SYNTAX TREE:
      (TOK_QUERY (TOK_FROM (TOK_TABREF src)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB dest_g1)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF src key)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_FUNCTION substr (TOK_COLREF src value) 4)))) (TOK_GROUPBY (TOK_COLREF src key))))
    
    
  • The Dependency Graph

    Code Block
    
    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-2 depends on stages: Stage-1
      Stage-0 depends on stages: Stage-2
    

    This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done.

  • The plans of each Stage

    Code Block
    
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Alias -> Map Operator Tree:
            src
                Reduce Output Operator
                  key expressions:
                        expr: key
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: rand()
                        type: double
                  tag: -1
                  value expressions:
                        expr: substr(value, 4)
                        type: string
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(UDFToDouble(VALUE.0))
              keys:
                    expr: KEY.0
                    type: string
              mode: partial1
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                    name: binary_table
    
      Stage: Stage-2
        Map Reduce
          Alias -> Map Operator Tree:
            /tmp/hive-zshao/67494501/106593589.10001
              Reduce Output Operator
                key expressions:
                      expr: 0
                      type: string
                sort order: +
                Map-reduce partition columns:
                      expr: 0
                      type: string
                tag: -1
                value expressions:
                      expr: 1
                      type: double
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(VALUE.0)
              keys:
                    expr: KEY.0
                    type: string
              mode: final
              Select Operator
                expressions:
                      expr: 0
                      type: string
                      expr: 1
                      type: double
                Select Operator
                  expressions:
                        expr: UDFToInteger(0)
                        type: int
                        expr: 1
                        type: double
                  File Output Operator
                    compressed: false
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                        name: dest_g1
    
      Stage: Stage-0
        Move Operator
          tables:
                replace: true
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1
    

    In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.

A map/reduce stage itself comprises of 2 parts:

...

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

Code Block

EXPLAIN DEPENDENCY
  SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key

the following output is produced:

Code Block

{"input_partitions":[{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}

...

The dependencies show the parents in case a table is accessed via a view. Consider the following queries:

Code Block

CREATE VIEW V1 AS SELECT key, value from src;
EXPLAIN DEPENDENCY SELECT * FROM V1;

The following output is produced:

Code Block

{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}

...

All the outputs are shown if a table is being accessed via multiple parents.

Code Block

CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS
  SELECT src1.key, src2.value as value1, src3.value as value2
  FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;

The following output is produced.

Code Block

{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}

As can be seen, src is being accessed via parents v1 and v4.

The use of AUTHORIZATION in the EXPLAIN statement shows all entities needed to be authorized to execute the query and authorization failures if exists. For example, for a query like:

Code Block
EXPLAIN AUTHORIZATION
  SELECT * FROM src JOIN srcpart;

the following output is produced:

Code Block
INPUTS: 
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS: 
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER: 
  navis
OPERATION: 
  QUERY
AUTHORIZATION_FAILURES: 
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]

With FORMATTED keyword, it will be returned in JSON format.

 

Code Block
"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}