Hive is used both for interactive queries and as part of scripts. The Hive variable substitution mechanism was designed to avoid some of the code that was getting baked into the scripting language on top of Hive. Scripts that assign a value in the scripting language and then splice it into the text of a hive -e command are becoming commonplace. This is frustrating as Hive becomes closely coupled with scripting languages. The Hive startup time of a couple of seconds is non-trivial when doing thousands of manipulations such as multiple hive -e invocations.
Hive variables combine the set capability you know and love with some limited yet powerful (evil laugh) substitution ability.
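As a minimal sketch of that combination, the session below sets a variable and then lets Hive substitute it into a statement. The variable name a and the table name b are hypothetical and chosen only for illustration.

```sql
-- Assumption: run inside one Hive session (CLI or a script passed to hive -f).
-- A plain set stores the variable in the hiveconf namespace.
set a=b;
-- With no value given, set displays the current setting.
set a;
set hiveconf:a;
-- Hypothetical table so that the describe below has something to describe.
create table if not exists b (col int);
-- ${hiveconf:a} is replaced by its value first, so this runs as: describe b
describe ${hiveconf:a};
```

Everything happens in a single Hive process, so the startup cost is paid once rather than once per statement.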
For general information about Hive command line options, see Hive CLI.
There are three namespaces for variables: hiveconf, system, and env. (Custom variables can also be created in a separate namespace with the --hivevar option in Hive 0.8.0 and later releases.)
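A custom variable can also be created from inside a session by prefixing its name with hivevar:. The sketch below assumes Hive 0.8.0 or later; the variable name tbl and the table name src are hypothetical.

```sql
-- Assumption: Hive 0.8.0 or later, and a table named src already exists.
-- Store a custom variable in the hivevar namespace.
set hivevar:tbl=src;
-- The reference is replaced by the stored value before the query is compiled.
select * from ${hivevar:tbl} limit 10;
```

Keeping user-defined values in hivevar separates them from Hive's own configuration properties, which live in hiveconf.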
The hiveconf variables are set as normal, for example with set x=myvalue; however, they are retrieved using the namespace-qualified form ${hiveconf:x}.
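A short sketch of setting a hiveconf variable and then retrieving it inside a query follows. The names x, myvalue, some_table, and some_col are placeholders, not names taken from any real schema.

```sql
-- Setting uses the ordinary set syntax; no namespace prefix is needed.
set x=myvalue;
-- With no value given, set displays the current setting of x.
set x;
-- Retrieval needs the hiveconf: prefix. The replacement is textual, so the
-- statement below runs as:
--   select * from some_table where some_col = 'myvalue'
select * from some_table where some_col = '${hiveconf:x}';
```

Because the replacement is purely textual, string values need their own quotes in the query text, as shown.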
Annotated examples of usage from the test case ql/src/test/queries/clientpositive/set_processor_namespaces.q:
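The annotated listing itself is not reproduced here. The following sketch is written in the same spirit, not copied from the test file, and walks through the namespaces in turn. All variable names are illustrative.

```sql
-- hiveconf: the default namespace when setting, so no prefix is required.
set zzz=5;
set zzz;

-- system: Java system properties; the prefix is required.
set system:xxx=5;
set system:xxx;

-- A value can be copied from one variable to another, within a namespace
-- or across namespaces, by substituting it into the set command.
set system:yyy=${system:xxx};
set system:yyy;
set go=${hiveconf:zzz};
set go;

-- env: environment variables of the Hive process, read here rather than set.
-- Assumption: HOME is defined in the environment Hive was started from.
set home_dir=${env:HOME};
set home_dir;
```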
Substitution During Query Construction
Hive substitutes the value for a variable when a query is constructed with the variable. If you run two different Hive sessions, variable values will not be mixed across sessions. But if you set variables with the same name in the same Hive session, a query uses the last set value.
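The same-session behavior can be sketched as follows, assuming a hypothetical table some_table and a hypothetical variable max_rows. Each query picks up whatever value the variable holds at the moment the query text is built.

```sql
set max_rows=10;
-- Built while max_rows is 10, so this runs with: limit 10
select * from some_table limit ${hiveconf:max_rows};

set max_rows=20;
-- The earlier value is gone; this runs with: limit 20
select * from some_table limit ${hiveconf:max_rows};
```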
Disabling Variable Substitution
Variable substitution is on by default (hive.variable.substitute=true). If this causes an issue with an already existing script, disable it by setting hive.variable.substitute to false.
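A minimal sketch of turning substitution off and back on within a session is shown below. The variable names zzz and raw are illustrative.

```sql
set zzz=5;

-- Turn substitution off for the statements that follow.
set hive.variable.substitute=false;
-- With substitution off, the reference is stored as literal text,
-- so raw is expected to hold the string ${hiveconf:zzz} rather than 5.
set raw=${hiveconf:zzz};
set raw;

-- Restore the default behavior.
set hive.variable.substitute=true;
```

Placing the first set command at the top of a legacy script is one way to keep literal ${...} text in that script from being rewritten.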