...
- As Hive on Spark is still in development, currently only a Hive assembly built from the Hive/Spark development branch supports Spark execution. The development branch is located here: https://github.com/apache/hive/tree/spark. Check out the branch and build the Hive assembly as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ; a build sketch follows this list.
- If you download Spark, make sure you use a 1.2.x assembly: http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
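For reference, here is a minimal sketch of checking out and building the development branch. The Maven flags shown (-DskipTests, -Phadoop-2,dist) are assumptions based on typical Hive builds of that era, not taken from this page; the authoritative build steps are in the Hive Developer FAQ linked above.

    # Clone the Hive repository and switch to the Spark development branch
    git clone https://github.com/apache/hive.git
    cd hive
    git checkout spark
    # Build the Hive distribution, skipping tests
    # (the profile names are an assumption; see the Hive Developer FAQ)
    mvn clean package -DskipTests -Phadoop-2,dist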
There are several ways to add the Spark dependency to Hive:
- Set the property 'spark.home' to point to the Spark installation:

    hive> set spark.home=/location/to/sparkHome;

- Define the SPARK_HOME environment variable before starting the Hive CLI/HiveServer2:

    export SPARK_HOME=/usr/lib/spark....

- Set the spark-assembly jar on the Hive auxpath:

    hive --auxpath /location/to/spark-assembly-*.jar

- Add the spark-assembly jar for the current user session:

    hive> add jar /location/to/spark-assembly-*.jar;

- Link the spark-assembly jar into HIVE_HOME/lib, for example:
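A sketch of that symlink, where both paths are placeholders for your own locations:

    # Symlink the Spark assembly into Hive's lib directory (paths are placeholders)
    ln -s /location/to/spark-assembly-*.jar $HIVE_HOME/lib/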
Configure Hive to use Spark as its execution engine:

    hive> set hive.execution.engine=spark;
Configure the required Spark application properties for Hive; see http://spark.apache.org/docs/latest/configuration.html. This can be done either by adding a file "spark-defaults.conf" containing these properties to the Hive classpath, or by setting them in the Hive configuration:

    hive> set spark.master=<Spark Master URL>;
    hive> set spark.eventLog.enabled=true;
    hive> set spark.executor.memory=512m;
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
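Equivalently, the same settings in spark-defaults.conf form (the values are the placeholders from above):

    spark.master             <Spark Master URL>
    spark.eventLog.enabled   true
    spark.executor.memory    512m
    spark.serializer         org.apache.spark.serializer.KryoSerializer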
...