Query reexecution provides a facility to re-run the query multiple times in case of an unfortunate event happens.

Introduced in Hive 3.0 (HIVE-17626)

ReExecition strategies

Overlay

Enables to change the hive settings for all reexecutions which will be happening. It works by adding a configuration subtree as an overlay to the actual hive settings(reexec.overlay.*)

Example

set zzz=1;
set reexec.overlay.zzz=2;

set hive.query.reexecution.enabled=true;
set hive.query.reexecution.strategies=overlay;

create table t(a int);
insert into t values (1);
select assert_true(${hiveconf:zzz} > a) from t group by a;

Every hive setting which has a prefix of "reexec.overlay" will be set for all reexecutions.

A more real life example would be to disable join auto conversion for all reexecutions:

set reexec.overlay.hive.auto.convert.join=false;

Reoptimize

During query execution; the actual number passing rows in every operator is tracked. This information is reused during re-planning which could result in a better plan.

Situation in which this would be needed:

missing statististics
incorrect statistics
many joins

It's not that easy to craft queries which will lead to OOM situations; but to enable it:

set hive.query.reexecution.strategies=overlay,reoptimize;

Operator Matching

Operator level statistics are matched to the new plan using operator subtree matching this also enables to match the information to a query which have "similar" parts.

Configuration

Configuration	default
hive.query.reexecution.enabled	true	Feature enabler
hive.query.reexecution.strategies	overlay,reoptimize	reexecution plugins; currently overlay and reoptimize is supported
hive.query.reexecution.stats.persist.scope	query	runtime statistics can be persisted: query: - only used during the reexecution hiveserver: persisted during the lifetime of the hiveserver metastore: persisted in the metastore; and loaded on hiveserver startu[
hive.query.reexecution.max.count	1	number of reexecution that may happen
hive.query.reexecution.always.collect.operator.stats	false	Enable to gather runtime statistics on all queries.
hive.query.reexecution.stats.cache.batch.size	-1	If runtime stats are stored in metastore; the maximal batch size per round during load.
hive.query.reexecution.stats.cache.size	100 000	Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100.
runtime.stats.clean.frequency	3600s	Frequency at which timer task runs to remove outdated runtime stat entries.
runtime.stats.max.age	3days	Stat entries which are older than this are removed.

Space shortcuts

Child pages

ReExecition strategies

Overlay

Reoptimize

Operator Matching

Configuration

Space shortcuts

Child pages

Query ReExecution

ReExecition strategies

Overlay

Reoptimize

Operator Matching

Configuration