Versions Compared

Comment: edits to "Seeing whether vectorization is used for a query"

...

To use vectorized query execution, you must store your data in ORC format, and set the following variable as shown in Hive SQL (see Configuring Hive):

set hive.vectorized.execution.enabled = true;

...

set hive.vectorized.execution.enabled = false;

Additional configuration variables for vectorized execution are documented in Configuration Properties – Vectorization.
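The two prerequisites above (ORC storage and the configuration variable) can be sketched together as follows. This is only an illustration: sales_text and sales_orc are hypothetical table names that do not appear elsewhere on this page, and sales_text is assumed to be an existing table in another storage format.

```sql
-- sales_text is a hypothetical existing table (for example, stored as text).
-- Copy it into a new ORC-backed table; sales_orc is also a hypothetical name.
create table sales_orc stored as orc as select * from sales_text;

-- Turn on vectorized execution for the current session.
set hive.vectorized.execution.enabled = true;

-- Queries against the ORC copy are now candidates for vectorized execution.
select count(*) from sales_orc;
```

Whether a given query is actually vectorized still depends on the data types and operators it uses.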

Supported data types and operations

The following data types are currently supported for vectorized execution:

  • tinyint
  • smallint
  • int
  • bigint
  • boolean
  • float
  • double
  • decimal
  • date
  • timestamp (see Limitations below)
  • string

Using other data types will cause your query to execute using standard, row-at-a-time execution.

...

Using a built-in operator or function that is not supported for vectorization will cause your query to run in standard row-at-a-time mode. If a compile-time or run-time error occurs that appears to be related to vectorization, please file a Hive JIRA. To work around such an error, set hive.vectorized.execution.enabled to false for the specific query that is failing so that it runs in standard mode.
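The per-query workaround might look like the sketch below. The table events_orc and the query itself are hypothetical stand-ins for whatever statement is failing; only the set commands come from this page.

```sql
-- Disable vectorization just before the query that fails.
set hive.vectorized.execution.enabled = false;

-- Hypothetical failing query; events_orc is an assumed table name.
select state, count(*) from events_orc group by state;

-- Re-enable vectorization for the rest of the session.
set hive.vectorized.execution.enabled = true;
```

Restoring the setting afterwards keeps the rest of the session on the vectorized path.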

...

You can verify which parts of your query are being vectorized using the explain feature. When Fetch is used in the plan instead of Map, the query does not vectorize and the explain output will not include the "Vectorized execution: true" notation. For example, create a table stored in ORC format, enable vectorization, and run explain on a query:

create table vectorizedtable(state string, id int) stored as orc;

insert into vectorizedtable values('haryana', 1);
set hive.vectorized.execution.enabled = true;
explain select count(*) from vectorizedtable;

The explain output contains this:

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        vectorizedtable
          TableScan
            alias: vectorizedtable
            Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              Statistics: Num rows: 1 Data size: 95 Basic stats: COMPLETE Column stats: COMPLETE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

The notation Execution mode: vectorized shows that the map side of the stage containing it is vectorized. Some Hive versions instead mark each vectorized operator with the notation Vectorized execution: true. If neither notation is present, the operator is not vectorized and uses the standard row-at-a-time execution path.

Note: To use vectorized execution for a query that would otherwise be converted to a fetch task, disable fetch task conversion:

set hive.fetch.task.conversion=none;
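As a rough illustration of the note above, the following sketch reuses the vectorizedtable table from the earlier example. The simple select is an assumed example of a query that Hive might otherwise serve with a fetch task; the exact plan depends on the Hive version and settings.

```sql
-- Enable vectorization and prevent simple queries from being converted to fetch tasks.
set hive.vectorized.execution.enabled = true;
set hive.fetch.task.conversion = none;

-- With conversion disabled, the plan is expected to contain a Map stage,
-- which is where the vectorization notation can appear.
explain select state from vectorizedtable;
```

Comparing the explain output with and without the second setting shows whether the fetch conversion was what prevented vectorization.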

Limitations

  • Timestamps only work correctly with vectorized execution if the timestamp value is between 1677-09-20 and 2262-04-11. This limitation is due to the fact that a vectorized timestamp value is stored as a long value representing nanoseconds before/after the Unix Epoch time of 1970-01-01 00:00:00 UTC. Also see HIVE-9862.
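The date bounds above follow from the range of a signed 64-bit long counting nanoseconds. A rough check, assuming a mean year of 365.2425 days (an approximation introduced here, not taken from the Hive source):

```latex
\[
2^{63}\,\text{ns} \approx 9.223 \times 10^{18}\,\text{ns}
  = 9.223 \times 10^{9}\,\text{s}
\]
\[
\frac{9.223 \times 10^{9}\,\text{s}}{86400 \times 365.2425\,\text{s/year}}
  \approx 292.3\,\text{years}
\]
\[
1970 - 292.3 \approx 1677.7, \qquad 1970 + 292.3 \approx 2262.3
\]
```

These values correspond to late 1677 and early 2262, matching the stated limits.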

Version Information

Vectorized execution is available in Hive 0.13.0 and later (HIVE-5283).