Versions Compared

Comment: edits to "Seeing whether vectorization is used for a query"

...

You can verify which parts of your query are being vectorized using the explain feature. For example, when Fetch is used in the plan instead of Map, that part of the plan is not vectorized and the explain output will not include the "Vectorized execution: true" notation:

create table vectorizedtable(state string,id int) stored as orc;

insert into vectorizedtable values('haryana',1);
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;

The explain output contains this:

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
            Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

The notation Vectorized execution: true shows that the operator containing that notation is vectorized. Absence of this notation means the operator is not vectorized, and uses the standard row-at-a-time execution path.

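Note: If you want vectorized execution to apply to a query that would otherwise be answered by a fetch task, disable fetch task conversion so that the query runs as a regular job:

set hive.fetch.task.conversion=none;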
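The effect of the note above can be checked with a before-and-after comparison. The following is a minimal sketch, assuming the vectorizedtable example table from this page and a session where hive.fetch.task.conversion has not been changed from its default. The projection query is an illustrative choice, and the exact plan text may differ between Hive versions and execution engines.

```sql
-- Assumption: same example table as earlier on this page.
create table if not exists vectorizedtable(state string, id int) stored as orc;

set hive.vectorized.execution.enabled = true;

-- With fetch task conversion left at its default, a simple projection
-- is typically served by a Fetch Operator alone, so no vectorization
-- notation is expected in the plan.
explain select state, id from vectorizedtable;

-- Disable fetch task conversion so the same query is planned as a job
-- with a Map Operator Tree, which is eligible for vectorization.
set hive.fetch.task.conversion = none;
explain select state, id from vectorizedtable;
```

Comparing the two plans should show whether the second one gained a map stage carrying the vectorization notation. Because the setting forces even trivial queries to launch a job, it is probably best applied per session rather than globally.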

Limitations

  • Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. This limitation is due to the fact that a vectorized timestamp value is stored as a long value representing nanoseconds before/after the Unix Epoch time of 1970-01-01 00:00:00 UTC. Also see HIVE-9862.
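The date range in this limitation follows from the size of a signed 64-bit long. As a rough worked estimate, assuming an average year of 365.2425 days:

```latex
\begin{align*}
2^{63}\ \text{ns} &\approx 9.223 \times 10^{18}\ \text{ns} = 9.223 \times 10^{9}\ \text{s} \\
&\approx \frac{9.223 \times 10^{9}\ \text{s}}{86400\ \text{s/day}} \approx 106752\ \text{days} \\
&\approx \frac{106752\ \text{days}}{365.2425\ \text{days/year}} \approx 292.28\ \text{years}
\end{align*}
```

Counting about 292.28 years back and forward from 1970-01-01 gives roughly late September 1677 and mid April 2262, which is consistent with the bounds stated above.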

...