The following data types are currently supported for vectorized execution:

  • tinyint
  • smallint
  • int
  • bigint
  • boolean
  • float
  • double
  • decimal
  • timestamp
  • string

Using other data types will cause your query to execute using standard, row-at-a-time execution.

You can verify which parts of your query are being vectorized using the explain feature. When Fetch is used in the plan instead of Map, the query is not vectorized and the explain output will not include a vectorization notation such as "Vectorized execution: true". For example, with vectorization enabled and a table stored in ORC format:

create table vectorizedtable(state string,id int) stored as orc;
insert into vectorizedtable values('haryana',1);
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;

The explain output contains this:

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
            Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

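The notation Vectorized execution: true shows that the operator containing that notation is vectorized. Depending on the Hive version, the plan may instead mark a whole map or reduce task with Execution mode: vectorized, as in the output above. Absence of either notation means that part of the plan is not vectorized and uses the standard row-at-a-time execution path.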

Note: To use vectorized execution for a query that would otherwise be converted to a Fetch task, disable fetch task conversion:

set hive.fetch.task.conversion=none;
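As a rough illustration of the note above, the following sketch reuses the vectorizedtable table from the earlier example. The specific select statement is only an assumption about a query that Hive would typically turn into a Fetch task; whether a given query is converted depends on your Hive version and its default for hive.fetch.task.conversion.

```sql
-- Sketch only: reuses vectorizedtable from the earlier example.
set hive.vectorized.execution.enabled = true;

-- Prevent simple queries from being converted to a Fetch task,
-- so that they run as a Map task and become eligible for vectorization.
set hive.fetch.task.conversion = none;

-- A simple select with a limit is a typical candidate for fetch conversion.
-- Compare the plan with and without the setting above.
explain select * from vectorizedtable limit 10;
```

Running the same explain with and without the setting, and looking for the vectorization notation in each plan, is one way to confirm the effect on your installation.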

Limitations

  • Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. This limitation is due to the fact that a vectorized timestamp value is stored as a long value representing nanoseconds before/after the Unix Epoch time of 1970-01-01 00:00:00 UTC. Also see HIVE-9862.
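One way to check whether a table is affected by the timestamp limitation is to count the rows that fall outside the stated range. The sketch below is not from the Hive documentation: the table name events_orc and the column event_ts are hypothetical, and the bounds are simply the two dates quoted above applied at day granularity, so the check is approximate near the edges of the range. Vectorization is turned off for the check itself so that the comparison does not depend on the behavior being tested.

```sql
-- Sketch only: events_orc and event_ts are hypothetical names.
-- Run the check on the row-at-a-time path.
set hive.vectorized.execution.enabled = false;

-- Count rows whose timestamp lies outside the range quoted in the limitation.
select count(*) as out_of_range_rows
from events_orc
where event_ts < cast('1677-09-20 00:00:00' as timestamp)
   or event_ts > cast('2262-04-11 00:00:00' as timestamp);
```

A nonzero count suggests that queries touching this column are safer with vectorization disabled.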

Version Information

Vectorized execution is available in Hive 0.13.0 and later (HIVE-5283).