Hive on Spark: Getting Started
Spark Installation
Follow the instructions to install the latest Spark: https://spark.apache.org/docs/latest/spark-standalone.html. In particular:
- Install Spark (either download a pre-built Spark release, or build the assembly from source). Note that Spark distributions are built against different versions of Hadoop, so pick one that matches your cluster. Keep note of the spark-assembly-*.jar location.
- Start the Spark cluster (master and workers). Keep note of the Spark master URL, which can be found in the Spark master WebUI. A minimal startup sketch follows this list.
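A minimal startup sketch for a standalone cluster, assuming $SPARK_HOME points at your Spark installation and using a placeholder master host and port (see the standalone-mode guide linked above for details):

# Start the standalone master; its URL (e.g. spark://master-host:7077) is shown in its log and WebUI
$SPARK_HOME/sbin/start-master.sh
# Start a worker and register it with the master
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077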
Configuring Hive
- As of now, only the Hive spark branch works against Spark: https://github.com/apache/hive/tree/spark. Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ. A hedged build sketch follows.
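A minimal build sketch, assuming a Maven-based build as described in the Developer FAQ (the exact profile names and flags may differ for your checkout):

# Check out the spark branch and build a distribution, skipping tests
git clone https://github.com/apache/hive.git
cd hive
git checkout spark
mvn clean package -DskipTests -Phadoop-2,dist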
Start Hive and add the spark-assembly JAR to the Hive auxpath:
hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
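For illustration, a concrete invocation might look like the following; the path and assembly file name are hypothetical and depend on the Spark and Hadoop versions you built against:

hive --auxpath /opt/spark/lib/spark-assembly-1.2.0-hadoop2.4.0.jar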
Configure Hive's execution engine to run on Spark:
hive> set hive.execution.engine=spark;
Configure the required Spark properties; the configuration guide is at http://spark.apache.org/docs/latest/configuration.html. These can be set either by adding a spark-defaults.conf file to the Hive classpath (a sample sketch follows the commands below), or as regular Hive properties:
hive> set spark.master=<spark master URL>;
hive> set spark.eventLog.enabled=true;
hive> set spark.executor.memory=512m;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
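Equivalently, a minimal spark-defaults.conf sketch (placed on the Hive classpath, e.g. in the Hive conf directory; the master URL is a placeholder):

spark.master            spark://master-host:7077
spark.eventLog.enabled  true
spark.executor.memory   512m
spark.serializer        org.apache.spark.serializer.KryoSerializer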
Known Issues
| Issue | Cause | Resolution |
| --- | --- | --- |
| java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; | Guava library version conflict between Spark and Hadoop. See HIVE-7387 and SPARK-2420 for details. | Temporarily remove the Guava JARs from HADOOP_HOME (see the sketch below). |
| org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above. |
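A hedged sketch of the Guava workaround, assuming the conflicting JARs live under $HADOOP_HOME; moving them aside (rather than deleting them) makes the change easy to revert once the conflict is resolved upstream:

# Move conflicting Guava JARs out of the Hadoop tree; restore them afterwards
mkdir -p /tmp/guava-backup
find $HADOOP_HOME -name 'guava-*.jar' -exec mv {} /tmp/guava-backup/ \;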