Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: link to HIVE-9780, add more headings, minor edits

...

A Hive query gets converted into a sequence (it is more an a Directed Acyclic Graph) of stages. These stages may be map/reduce stages or they may even be stages that do metastore or file system operations like move and rename. The explain output comprises of has three parts:

  • The Abstract Syntax Tree for the query
  • The dependencies between the different stages of the plan
  • The description of each of the stages

The description of the stages itself shows a sequence of operators with the metadata associated with the operators. The metadata may comprise of things like filter expressions for the FilterOperator or the select expressions for the SelectOperator or the output file names for the FileSinkOperator.

Example

As an example, consider the following EXPLAIN query:

...

  • The plans of each Stage

    Code Block
    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Alias -> Map Operator Tree:
            src
                Reduce Output Operator
                  key expressions:
                        expr: key
                        type: string
                  sort order: +
                  Map-reduce partition columns:
                        expr: rand()
                        type: double
                  tag: -1
                  value expressions:
                        expr: substr(value, 4)
                        type: string
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(UDFToDouble(VALUE.0))
              keys:
                    expr: KEY.0
                    type: string
              mode: partial1
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                    name: binary_table
    
      Stage: Stage-2
        Map Reduce
          Alias -> Map Operator Tree:
            /tmp/hive-zshao/67494501/106593589.10001
              Reduce Output Operator
                key expressions:
                      expr: 0
                      type: string
                sort order: +
                Map-reduce partition columns:
                      expr: 0
                      type: string
                tag: -1
                value expressions:
                      expr: 1
                      type: double
          Reduce Operator Tree:
            Group By Operator
              aggregations:
                    expr: sum(VALUE.0)
              keys:
                    expr: KEY.0
                    type: string
              mode: final
              Select Operator
                expressions:
                      expr: 0
                      type: string
                      expr: 1
                      type: double
                Select Operator
                  expressions:
                        expr: UDFToInteger(0)
                        type: int
                        expr: 1
                        type: double
                  File Output Operator
                    compressed: false
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                        name: dest_g1
    
      Stage: Stage-0
        Move Operator
          tables:
                replace: true
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe
                    name: dest_g1
    

    In this example there are 2 map/reduce stages (Stage-1 and Stage-2) and 1 File System related stage (Stage-0). Stage-0 basically moves the results from a temporary directory to the directory corresponding to the table dest_g1.
    Sort order indicates the number of columns in key expressions that are used for sorting. Each "+" represents one column sorted in ascending order, and each "-" represents a column sorted in descending order.

...

  • A mapping from table alias to Map Operator Tree -- This Tree  This mapping tells the mappers which operator tree to call in order to process the rows from a particular table or result of a previous map/reduce stage. In Stage-1 in the above example, the rows from src table are processed by the operator tree rooted at a Reduce Output Operator. Similarly, in Stage-2 the rows of the results of Stage-1 are processed by another operator tree rooted at another Reduce Output Operator. Each of these Reduce Output Operators partitions the data to the reducers according to the criteria shown in the metadata.
  • A Reduce Operator Tree -- Tree – This is the operator tree which processes all the rows on the reducer of the map/reduce job. In Stage-1 for example, the Reducer Operator Tree is carrying out a partial aggregation where as whereas the Reducer Operator Tree in Stage-2 computes the final aggregation from the partial aggregates computed in Stage-1.

The DEPENDENCY Clause

The use of DEPENDENCY in the EXPLAIN statement produces extra information about the inputs in the plan. It shows various attributes for the inputs. For example, for a query like:

...

As can be seen, src is being accessed via parents v1 and v4.

The AUTHORIZATION Clause

The use of AUTHORIZATION in the EXPLAIN statement shows all entities needed to be authorized to execute the query and authorization failures if existsany exist. For example, for a query like:

...

 

Code Block
"OUTPUTS":["hdfs://localhost:9000/tmp/.../-mr-10000"],"INPUTS":["default@srcpart","default@src","default@srcpart@ds=2008-04-08/hr=11","default@srcpart@ds=2008-04-08/hr=12","default@srcpart@ds=2008-04-09/hr=11","default@srcpart@ds=2008-04-09/hr=12"],"OPERATION":"QUERY","CURRENT_USER":"navis","AUTHORIZATION_FAILURES":["Permission denied: Principal [name=navis, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=default.src], [SELECT] on Object [type=TABLE_OR_VIEW, name=default.srcpart]]"]}
 

User-level Explain Output

Since HIVE-9780 in Hive 1.2.0, we support a user-level explain for Hive on Tez users (It which can be easily extended to Hive on MR or Spark users).  After set hive.explain.user=true (default is false) , if the following query is sent, the user can see a much more clearly readable tree of operations.  

Code Block
EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31'
Code Block
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 2
         File Output Operator [FS_8]
            compressed:false
            Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}
            Group By Operator [GBY_6]
            |  aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
            |  outputColumnNames:["_col0","_col1"]
            |  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
            |<-Map 1 [SIMPLE_EDGE]
               Reduce Output Operator [RS_5]
                  sort order:
                  Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                  value expressions:_col0 (type: bigint), _col1 (type: bigint)
                  Group By Operator [GBY_4]
                     aggregations:["sum(_col0)","sum(_col1)"]
                     outputColumnNames:["_col0","_col1"]
                     Statistics:Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
                     Select Operator [SEL_2]
                        outputColumnNames:["_col0","_col1"]
                        Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE
                        TableScan [TS_0]
                           alias:src_orc_merge_test_part
                           Statistics:Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE