...
- Install Spark (either download a pre-built Spark, or build the assembly from source).
- Install/build a compatible version. To find out which version of Spark your particular Hive build was built and tested against, check the <spark.version> property in your Hive's root pom.xml.
- Install/build a compatible distribution. Each version of Spark has several distributions, corresponding to different versions of Hadoop. Choose the one that matches your Hadoop installation.
- Once Spark is installed, find and keep note of the <spark-assembly-*.jar> location.
- Start Spark cluster (Master and workers).
- Keep note of the <Spark Master URL>. This can be found in the Spark master WebUI.
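The two lookups in the steps above (the Spark version Hive expects, and the assembly jar location) can be scripted. The following is a minimal sketch, not part of the official instructions: the function names `hive_spark_version` and `find_spark_assembly` are hypothetical helpers, and it assumes <spark.version> appears on a single line in the pom.xml.

```shell
# Hypothetical helper: print the value of <spark.version> from a Hive root pom.xml.
# Assumes the opening and closing tags are on the same line.
# Usage: hive_spark_version /path/to/hive/pom.xml
hive_spark_version() {
  sed -n 's:.*<spark\.version>\(.*\)</spark\.version>.*:\1:p' "$1" | head -n 1
}

# Hypothetical helper: print the first spark-assembly-*.jar found under a
# Spark installation directory.
# Usage: find_spark_assembly /path/to/spark
find_spark_assembly() {
  find "$1" -type f -name 'spark-assembly-*.jar' 2>/dev/null | head -n 1
}
```

For example, `hive_spark_version ./pom.xml` run from the Hive source root prints the version to install, and `find_spark_assembly "$SPARK_HOME"` prints the jar path to note down (`SPARK_HOME` here is assumed to point at your Spark installation).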
...
- As Hive on Spark is still in development, only a Hive assembly built from the Hive/Spark development branch works against Spark: https://github.com/apache/hive/tree/spark. Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ.
Start Hive and add the <spark-assembly-*.jar> to the Hive auxpath.

```shell
hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
```
Configure the Hive execution engine to run on Spark:

```shell
hive> set hive.execution.engine=spark;
```
Configure the required Spark properties. See: http://spark.apache.org/docs/latest/configuration.html. This can be done either by adding a "spark-defaults.conf" file to the Hive classpath, or by setting them as normal properties from Hive.

```shell
hive> set spark.master=<Spark Master URL>;
hive> set spark.eventLog.enabled=true;
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
```
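For the file-based alternative, the same four properties can be written to a "spark-defaults.conf" file, which uses one whitespace-separated key and value per line. This is a minimal sketch: `CONF_DIR` is a hypothetical variable, and the directory it names is assumed to be one you have placed on the Hive classpath. The <Spark Master URL> placeholder must be replaced with the URL noted earlier.

```shell
# CONF_DIR is a hypothetical location; it must be on the Hive classpath.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

# Same properties as the "set" commands, in spark-defaults.conf format.
# Replace <Spark Master URL> with your actual master URL before use.
cat > "$CONF_DIR/spark-defaults.conf" <<'EOF'
spark.master             <Spark Master URL>
spark.eventLog.enabled   true
spark.executor.memory    512m
spark.serializer         org.apache.spark.serializer.KryoSerializer
EOF
```

Keeping the properties in a file avoids retyping them in every Hive session; setting them from the Hive prompt is more convenient for one-off experiments.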
...