...
To use vectorized query execution, you must store your data in ORC format, and set the following variable as shown in Hive SQL (see Configuring Hive):
set hive.vectorized.execution.enabled = true;
...
set hive.vectorized.execution.enabled = false;
Additional configuration variables for vectorized execution are documented in Configuration Properties – Vectorization.
Supported data types and operations
The following data types are currently supported for vectorized execution:
tinyint
...
smallint
...
int
...
bigint
...
boolean
...
float
double
decimal
date
timestamp (see Limitations below)
string
Using other data types will cause your query to execute using standard, row-at-a-time execution.
...
Using a built-in operator or function that is not supported for vectorization will cause your query to run in standard row-at-a-time mode. If a compile-time or run-time error occurs that appears related to vectorization, please file a Hive JIRA. To work around such an error, disable vectorization by setting hive.vectorized.execution.enabled to false for the specific query that is failing, so that it runs in standard mode.
...
You can verify which parts of your query are being vectorized using the explain feature. For example, when Fetch is used in the plan instead of Map, that part of the query is not vectorized and the explain output for it will not include the "Vectorized execution: true" notation:
create table vectorizedtable(state string,id int) stored as orc;
insert into vectorizedtable values('haryana',1);
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;
The explain output contains this:
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
            Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
The notation Vectorized execution: true (printed as Execution mode: vectorized in the output above) shows that the part of the plan containing that notation is vectorized. Absence of this notation means that part of the plan is not vectorized and uses the standard row-at-a-time execution path.
Note: If you want to use vectorized execution for fetch, set the following:
set hive.fetch.task.conversion=none;
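The check described above can also be scripted. The following is a minimal Python sketch, not part of Hive: it scans explain output that has been saved as text and reports, per stage, whether a vectorization marker appears. The function name vectorized_stages and the trimmed sample plan are hypothetical. The two marker strings are taken from the notation and the sample output in this section, and other Hive versions may print something different.

```python
import re

# Marker strings taken from this page; treat them as assumptions for other Hive versions.
VECTORIZED_MARKERS = ("Execution mode: vectorized", "Vectorized execution: true")


def vectorized_stages(explain_text: str) -> dict:
    """Map each stage name (e.g. 'Stage-1') to True if a marker appears in its section."""
    result = {}
    current = None
    for line in explain_text.splitlines():
        stripped = line.strip()
        match = re.match(r"Stage: (Stage-\d+)$", stripped)
        if match:
            # A new stage section starts; assume not vectorized until a marker is seen.
            current = match.group(1)
            result[current] = False
        elif current is not None and any(m in stripped for m in VECTORIZED_MARKERS):
            result[current] = True
    return result


# Hypothetical, trimmed explain output in the shape shown in this section.
sample = """\
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: vectorizedtable
      Execution mode: vectorized
  Stage: Stage-0
    Fetch Operator
      limit: -1
"""

print(vectorized_stages(sample))  # {'Stage-1': True, 'Stage-0': False}
```

A plain substring match is used on purpose: it reports only whether a marker is present in a stage, not which individual operators inside that stage were vectorized.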
Limitations
- Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. This limitation is due to the fact that a vectorized timestamp value is stored as a long value representing nanoseconds before/after the Unix Epoch time of 1970-01-01 00:00:00 UTC. Also see HIVE-9862.
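The date range follows from the width of a signed 64-bit long. Below is a minimal Python sketch of that arithmetic, assuming the value counts nanoseconds from the epoch in UTC. The helper name nanos_to_datetime is hypothetical, and Python's datetime keeps only microseconds, so the last three digits are dropped.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN = -(2**63)      # smallest value a signed 64-bit long can hold
INT64_MAX = 2**63 - 1     # largest value a signed 64-bit long can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_datetime(nanos: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime (microsecond precision)."""
    # Floor-divide to microseconds because datetime cannot represent nanoseconds.
    return EPOCH + timedelta(microseconds=nanos // 1000)


earliest = nanos_to_datetime(INT64_MIN)
latest = nanos_to_datetime(INT64_MAX)

print(earliest.isoformat())  # 1677-09-21T00:12:43.145224+00:00
print(latest.isoformat())    # 2262-04-11T23:47:16.854775+00:00
```

In UTC the lower bound falls early on 1677-09-21; the calendar date can differ by a day when the same instant is rendered in another time zone.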
Version Information
Vectorized execution is available in Hive 0.13.0 and later (HIVE-5283).