...

The following data types are currently supported for vectorized execution:

tinyint
smallint
int
bigint
boolean
float
double
decimal
date
timestamp (see Limitations below)
string

Using other data types will cause your query to execute using standard, row-at-a-time execution.

...

You can verify which parts of your query are being vectorized using the explain feature. When Fetch is used in the plan instead of Map, the query is not vectorized and the explain output will not include the "Vectorized execution: true" notation. For example, with vectorization enabled and the table vectorizedtable stored in ORC format:

create table vectorizedtable(state string,id int) stored as orc;

insert into vectorizedtable values('haryana',1);
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;

The explain output contains this:

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
            Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

The notation Vectorized execution: true (or, as in the plan above, Execution mode: vectorized) shows that the part of the plan containing that notation is vectorized. Absence of this notation means that part of the plan is not vectorized and uses the standard row-at-a-time execution path.
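As a rough way to see the difference, the same query can be explained with vectorization switched off and then on. This is a sketch that reuses the vectorizedtable example from above; the exact plan text depends on your Hive version and execution engine.

```sql
-- Vectorization off: the plan should carry no vectorization notation.
set hive.vectorized.execution.enabled = false;
explain select count(*) from vectorizedtable;

-- Vectorization on: look for the vectorization notation in the plan.
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;
```

Comparing the two plans side by side makes it easy to spot which parts of the plan changed.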

Note: If you want vectorized execution for a query that would otherwise run as a Fetch task, disable fetch task conversion so that the query runs as a Map task:

set hive.fetch.task.conversion=none;
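A minimal sketch of that setting in use, assuming the vectorizedtable example from above. The filter query is only an illustration of a simple select that Hive might otherwise answer with a Fetch task; whether it does depends on your configuration.

```sql
set hive.vectorized.execution.enabled = true;
-- Stop Hive from converting simple queries into a Fetch task.
set hive.fetch.task.conversion=none;
-- The plan should now contain a Map stage that can carry the vectorization notation.
explain select state from vectorizedtable where id > 0;
```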

Limitations

Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. This limitation is due to the fact that a vectorized timestamp value is stored as a long value representing nanoseconds before/after the Unix Epoch time of 1970-01-01 00:00:00 UTC. Also see HIVE-9862.
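The size of that range follows from the storage format. As a back-of-the-envelope check (an illustration, not taken from the Hive source), the largest signed 64-bit value, read as nanoseconds, can be converted to years:

```sql
-- 9223372036854775807 is the maximum value of a signed 64-bit long.
-- nanoseconds -> seconds -> days -> years (365.25 days per year is an approximation)
select 9223372036854775807 / 1000000000 / 86400 / 365.25;
```

The result is roughly 292 years, so the representable range extends about 292 years on either side of 1970, which is consistent with the 1677 and 2262 bounds stated above.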

Version Information

Vectorized execution is available in Hive 0.13.0 and later (HIVE-5283).

