...
Follow the instructions at https://spark.apache.org/docs/latest/spark-standalone.html. In particular, make sure the following steps are done:
- Install Spark (either download a pre-built Spark distribution, or build the assembly from source). Note that Spark has different distributions for different versions of Hadoop. Keep note of the spark-assembly-*.jar location on the node Hive will run from.
- Start the Spark cluster (master and workers). Keep note of the Spark master URL, which is shown in the Spark master web UI.
...
- As of now, only the hive/spark branch works against Spark: https://github.com/apache/hive/tree/spark. Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ.
Start Hive with the spark-assembly jar added to the Hive auxpath:
hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
Configure the Hive execution engine to run on Spark:
hive> set hive.execution.engine=spark;
Configure the required Spark properties; see the configuration guide at http://spark.apache.org/docs/latest/configuration.html. This can be done either by adding a spark-defaults.conf file to the Hive classpath, or by setting them as regular Hive properties:
hive> set spark.master=<spark master URL>;
hive> set spark.eventLog.enabled=true;
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
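For the spark-defaults.conf alternative, the same four properties could be written as follows. This is a minimal sketch that mirrors the set commands above. It assumes the file is placed somewhere on the Hive classpath, and the master URL is left as a placeholder to be filled in from the Spark master web UI. Each line is a property name and its value separated by whitespace.

```text
# spark-defaults.conf (sketch; values mirror the Hive set commands above)
# Replace the placeholder with the master URL shown in the Spark master web UI.
spark.master             <spark master URL>
spark.eventLog.enabled   true
spark.executor.memory    512m
spark.serializer         org.apache.spark.serializer.KryoSerializer
```

Keeping these settings in a file avoids retyping them in every Hive session. The hive.execution.engine property is a Hive setting rather than a Spark one, so it still needs to be set on the Hive side.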
...