EXPLAIN Syntax

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION] query
AUTHORIZATION is supported from HIVE 0.14.0 via HIVE-5961.

The use of EXTENDED in the EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.

A Hive query gets converted into a sequence (it is more a Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output has three parts:

The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator.

Example

As an example, consider the following EXPLAIN query:

EXPLAIN
FROM src INSERT OVERWRITE TABLE dest_g1 SELECT src.key, sum(substr(src.value,4)) GROUP BY src.key;

The output of this statement contains the following parts:

A map/reduce stage itself has 2 parts:

The DEPENDENCY Clause

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

EXPLAIN DEPENDENCY
  SELECT key, count(1) FROM srcpart WHERE ds IS NOT NULL GROUP BY key

the following output is produced:

{"input_partitions":[{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE"}]}

The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.

The dependencies show the parents in case a table is accessed via a view. Consider the following queries:

CREATE VIEW V1 AS SELECT key, value from src;
EXPLAIN DEPENDENCY SELECT * FROM V1;

The following output is produced:

{"input_partitions":[],"input_tables":[{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v1]"}]}

As above, the inputs contain the view V1 and the table 'src' that the view V1 refers to.

All the outputs are shown if a table is being accessed via multiple parents.

CREATE VIEW V2 AS SELECT ds, key, value FROM srcpart WHERE ds IS NOT NULL;
CREATE VIEW V4 AS
  SELECT src1.key, src2.value as value1, src3.value as value2
  FROM V1 src1 JOIN V2 src2 on src1.key = src2.key JOIN src src3 ON src2.key = src3.key;
EXPLAIN DEPENDENCY SELECT * FROM V4;

The following output is produced.

{"input_partitions":[{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-08/hr=12"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=11"},{"partitionParents":"[default@v2]","partitionName":"default<at:var at:name="srcpart" />ds=2008-04-09/hr=12"}],"input_tables":[{"tablename":"default@v4","tabletype":"VIRTUAL_VIEW"},{"tablename":"default@v2","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@v1","tabletype":"VIRTUAL_VIEW","tableParents":"[default@v4]"},{"tablename":"default@src","tabletype":"MANAGED_TABLE","tableParents":"[default@v4, default@v1]"},{"tablename":"default@srcpart","tabletype":"MANAGED_TABLE","tableParents":"[default@v2]"}]}

As can be seen, src is being accessed via parents v1 and v4.

The AUTHORIZATION Clause

The use of AUTHORIZATION in the EXPLAIN statement shows all entities needed to be authorized to execute the query and authorization failures if any exist. For example, for a query like:

EXPLAIN AUTHORIZATION
  SELECT * FROM src JOIN srcpart;

the following output is produced:

INPUTS: 
  default@srcpart
  default@src
  default@srcpart@ds=2008-04-08/hr=11
  default@srcpart@ds=2008-04-08/hr=12
  default@srcpart@ds=2008-04-09/hr=11
  default@srcpart@ds=2008-04-09/hr=12
OUTPUTS: 
  hdfs://localhost:9000/tmp/.../-mr-10000
CURRENT_USER: 
  navis
OPERATION: 
  QUERY
AUTHORIZATION_FAILURES: 
  Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]

With the FORMATTED keyword, it will be returned in JSON format.

 
"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}

The LOCKS Clause

This is useful to understand what locks the system will acquire to run the specified query.  Since HIVE-17683

For example

EXPLAIN LOCKS UPDATE target SET b = 1 WHERE p IN (SELECT t.q1 FROM source t WHERE t.a1=5)

Will produce output like this.

LOCK INFORMATION:
default.source -> SHARED_READ
default.target.p=1/q=2 -> SHARED_READ
default.target.p=1/q=3 -> SHARED_READ
default.target.p=2/q=2 -> SHARED_READ
default.target.p=2/q=2 -> SHARED_WRITE
default.target.p=1/q=3 -> SHARED_WRITE
default.target.p=1/q=2 -> SHARED_WRITE

EXPLAIN FORMATTED LOCKS <sql>

is also supported which will produce JSON encoded output.

The VECTORIZATION Clause

Adds detail to the EXPLAIN output showing why Map and Reduce work is not vectorized.

Syntax: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL]

The optional clause defaults are not ONLY and SUMMARY.

See HIVE-11394 for more details and examples.

User-level Explain Output

Since HIVE-8600 in Hive 1.1.0, we support a user-level explain extended output for any query at the log4j INFO level after set hive.log.explain.output=true (default is false).

Since HIVE-18469 in Hive 3.1.0, the user-level explain extended output for any query will be shown in the WebUI / Drilldown / Query Plan after set hive.server2.webui.explain.output=true (default is false).

Since HIVE-9780 in Hive 1.2.0, we support a user-level explain for Hive on Tez users. After set hive.explain.user=true (default is false) if the following query is sent, the user can see a much more clearly readable tree of operations.

Since HIVE-11133 in Hive 3.0.0, we support a user-level explain for Hive on Spark users. A separate configuration is used for Hive-on-Spark, hive.spark.explain.user which is set to false by default.

EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31'


Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 2
         File Output Operator [FS_8]
            compressed:false
            Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}
            Group By Operator [GBY_6]
            |  aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
            |  outputColumnNames:["_col0","_col1"]
            |  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            |<-Map 1 [SIMPLE_EDGE]
               Reduce Output Operator [RS_5]
                  sort order:
                  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                  value expressions:_col0 (type: bigint), _col1 (type: bigint)
                  Group By Operator [GBY_4]
                     aggregations:["sum(_col0)","sum(_col1)"]
                     outputColumnNames:["_col0","_col1"]
                     Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                     Select Operator [SEL_2]
                        outputColumnNames:["_col0","_col1"]
                        Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                        TableScan [TS_0]
                           alias:src_orc_merge_test_part
                           Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE