Hive provides an
EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
The use of
EXTENDED in the
EXPLAIN statement produces extra information about the operators in the plan. This is typically physical information like file names.
A Hive query gets converted into a sequence (it is more a Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output has three parts:
- The Abstract Syntax Tree for the query
- The dependencies between the different stages of the plan
- The description of each of the stages
The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator.
As an example, consider the following
The output of this statement contains the following parts:
The Dependency Graph
This shows that Stage-1 is the root stage, Stage-2 is executed after Stage-1 is done and Stage-0 is executed after Stage-2 is done.
The plans of each Stage
In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.
Sort orderindicates the number of columns in key expressions that are used for sorting. Each "
+" represents one column sorted in ascending order, and each "
-" represents a column sorted in descending order.
A map/reduce stage itself has 2 parts:
- A mapping from table alias to Map Operator Tree – This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
- A Reduce Operator Tree – This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation whereas the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.
The CBO Clause
The CBO clause outputs the plan generated by Calcite optimizer. It can optionally include information about the cost of the plan using Calcite default cost model and cost model used for join reordering. Since Hive release 4.0.0 (HIVE-17503 / HIVE-21184).
Syntax: EXPLAIN [FORMATTED] CBO [COST|JOINCOST]
- COST option prints the plan and cost calculated using Calcite default cost model.
- JOINCOST option prints the plan and cost calculated using the cost model used for join reordering.
For example, we can execute the following statement:
The query will be optimized and Hive produces the following output:
In turn, we can execute the following command:
It will produce a similar plan, but the cost for each operator will be embedded next to the operator descriptors:
The AST Clause
Outputs the query's Abstract Syntax Tree.
The DEPENDENCY Clause
The use of
DEPENDENCY in the
EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:
the following output is produced:
The inputs contain both the tables and the partitions. Note that the table is present even if none of the partitions is accessed in the query.
The dependencies show the parents in case a table is accessed via a view. Consider the following queries:
The following output is produced:
As above, the inputs contain the view V1 and the table 'src' that the view V1 refers to.
All the outputs are shown if a table is being accessed via multiple parents.
The following output is produced.
As can be seen, src is being accessed via parents v1 and v4.
The AUTHORIZATION Clause
The use of
AUTHORIZATION in the
EXPLAIN statement shows all entities needed to be authorized to execute the query and authorization failures if any exist. For example, for a query like:
the following output is produced:
FORMATTED keyword, it will be returned in JSON format.
The LOCKS Clause
This is useful to understand what locks the system will acquire to run the specified query. Since Hive release 3.2.0 (HIVE-17683).
Will produce output like this.
is also supported which will produce JSON encoded output.
The VECTORIZATION Clause
Adds detail to the EXPLAIN output showing why Map and Reduce work is not vectorized. Since Hive release 2.3.0 (HIVE-11394).
Syntax: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL]
- ONLY option suppresses most non-vectorization elements.
- SUMMARY (default) shows vectorization information for the PLAN (is vectorization enabled) and a summary of Map and Reduce work.
- OPERATOR shows vectorization information for operators. E.g. Filter Vectorization. Includes all information of SUMMARY.
- EXPRESSION shows vectorization information for expressions. E.g. predicateExpression. Includes all information of SUMMARY and OPERATOR.
- DETAIL shows detail-level vectorization information. It includes all information of SUMMARY, OPERATOR, and EXPRESSION.
The optional clause defaults are not ONLY and SUMMARY.
See HIVE-11394 for more details and examples.
The ANALYZE Clause
Annotates the plan with actual row counts. Since in Hive 2.2.0 (HIVE-14362)
Format is: (estimated row count) / (actual row count)
For the below tablescan; the estimation was 500 rows; but actually the scan only yielded 13 rows.
User-level Explain Output
Since HIVE-18469 in Hive 3.1.0, the user-level explain extended output for any query will be shown in the WebUI / Drilldown / Query Plan after set hive.server2.webui.explain.output=true (default is false).
Since HIVE-9780 in Hive 1.2.0, we support a user-level explain for Hive on Tez users. After set hive.explain.user=true (default is false) if the following query is sent, the user can see a much more clearly readable tree of operations.