...

tinyint
smallint
int
bigint
boolean
float
double
decimal
date
timestamp (see Limitations below)
string

Using other data types will cause your query to execute using standard, row-at-a-time execution.
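As a rough illustration of the rule above, the following Python sketch checks a table schema against the listed types before you rely on vectorization. The helper name `unsupported_columns` and the dictionary-based schema are hypothetical and not part of Hive. The type set simply mirrors the list above, so it may not match the Hive release you are running.

```python
# Types copied from the list above; treat this as an assumption about
# your Hive release rather than an authoritative list.
SUPPORTED_TYPES = {
    "tinyint", "smallint", "int", "bigint", "boolean", "float",
    "double", "decimal", "date", "timestamp", "string",
}

def unsupported_columns(schema):
    """Return the column names whose base type is not in SUPPORTED_TYPES.

    `schema` maps column name -> declared type, e.g. {"id": "int"}.
    Parameters such as the "(10,2)" in "decimal(10,2)" are ignored.
    """
    result = []
    for name, type_name in schema.items():
        base = type_name.strip().lower().split("(", 1)[0].strip()
        if base not in SUPPORTED_TYPES:
            result.append(name)
    return result

print(unsupported_columns({"state": "string", "id": "int"}))
print(unsupported_columns({"price": "DECIMAL(10,2)", "tags": "array<string>"}))
```

An empty result suggests that every column uses a listed type. Any name returned points at a column that would push the query back to row-at-a-time execution.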

...

You can verify which parts of your query are being vectorized using the explain feature. For example, when Fetch is used in the plan instead of Map, the query is not vectorized and the explain output will not include the "Vectorized execution: true" notation:

create table vectorizedtable(state string, id int) stored as orc;

insert into vectorizedtable values('haryana', 1);
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;

The explain output contains this:

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
            Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

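The notation "Vectorized execution: true" shows that the operator containing that notation is vectorized. Absence of this notation means the operator is not vectorized and uses the standard row-at-a-time execution path. In plans formatted like the one above, the line "Execution mode: vectorized" plays the same role for the map stage as a whole.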
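If you save explain output to a file or capture it in a script, a small text scan can report whether any vectorization marker is present. The Python sketch below is only an illustration: `find_vectorization_markers` is a hypothetical helper, and the two marker strings are taken from the text and from the example plan on this page, so other Hive releases may format the plan differently.

```python
# Marker strings taken from this page; they are an assumption about how
# your Hive release formats explain output.
VECTORIZATION_MARKERS = (
    "Vectorized execution: true",
    "Execution mode: vectorized",
)

def find_vectorization_markers(plan_text):
    """Return (line_number, line) pairs for plan lines that are a marker."""
    hits = []
    for number, line in enumerate(plan_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped in VECTORIZATION_MARKERS:
            hits.append((number, stripped))
    return hits

# Abbreviated plan text, shortened from the example on this page.
sample_plan = """\
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
      Execution mode: vectorized
  Stage: Stage-0
    Fetch Operator
      limit: -1
"""

print(find_vectorization_markers(sample_plan))
```

An empty list means that no marker was found, which is what you would expect for a plan that consists only of a Fetch Operator.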

Note: To use vectorized execution for a query that would otherwise run as a fetch task, disable fetch task conversion:

set hive.fetch.task.conversion=none;

Limitations

Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. This limitation exists because a vectorized timestamp value is stored as a long value representing nanoseconds before or after the Unix Epoch time of 1970-01-01 00:00:00 UTC. Also see HIVE-9862.
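The range follows from the size of a signed 64-bit long. The Python sketch below works through the arithmetic, assuming the long counts nanoseconds from the epoch in UTC as described above. The names `EPOCH` and `from_epoch_nanos` are illustrative only. In UTC the lower bound falls shortly after midnight on 1677-09-21, so the 1677-09-20 date quoted above may reflect a local time zone; that is an inference, not something this page states.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, interpreted here as UTC

INT64_MAX = 2**63 - 1   # largest value a signed 64-bit long can hold
INT64_MIN = -(2**63)    # smallest value a signed 64-bit long can hold

def from_epoch_nanos(nanos):
    """Convert nanoseconds since the epoch to a datetime.

    Python datetimes have microsecond resolution, so the value is
    floor-divided by 1000 and the sub-microsecond part is dropped.
    """
    return EPOCH + timedelta(microseconds=nanos // 1000)

earliest = from_epoch_nanos(INT64_MIN)
latest = from_epoch_nanos(INT64_MAX)

print("earliest representable:", earliest.isoformat())
print("latest representable:  ", latest.isoformat())
```

Values outside these two bounds cannot be represented as a 64-bit nanosecond count, which is why such timestamps are not handled correctly by vectorized execution.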

Version Information

Vectorized execution is available in Hive 0.13.0 and later (HIVE-5283).
