
  1. As Hive on Spark is still in development, currently only a Hive assembly built from the Hive/Spark development branch supports Spark execution.  The development branch is located at https://github.com/apache/hive/tree/spark.  Check out the branch and build the Hive assembly as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ (a sketch of the checkout and build follows).
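
    A minimal sketch of the checkout and build, assuming Maven is installed; the profile flags shown here are illustrative, so follow the HiveDeveloperFAQ for the authoritative build steps:

    Code Block
    git clone https://github.com/apache/hive.git
    cd hive
    git checkout spark
    # Build the assembly; profiles/flags per the HiveDeveloperFAQ (illustrative here)
    mvn clean package -DskipTests -Phadoop-2,dist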
  2. If you download Spark, make sure you use a 1.2.x assembly: http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
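
    For example, the assembly can be fetched with wget (any HTTP client will do):

    Code Block
    wget http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar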
  3. There are several ways to add the Spark dependency to Hive:

    1. Set the property 'spark.home' to point to the Spark installation:

      Code Block
      hive> set spark.home=/location/to/spark;
    2. Set the spark-assembly jar on the Hive auxpath:

      Code Block
      hive --auxpath /location/to/spark-assembly-*.jar
    3. Add the spark-assembly jar for the current user session:

      Code Block
      hive> add jar /location/to/spark-assembly-*.jar;
  4. Configure Hive's execution engine to use Spark:

    Code Block
    hive> set hive.execution.engine=spark;
  5. Configure Spark application properties for Hive.  See: http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a file "spark-defaults.conf" with these properties to the Hive classpath, or by setting them in the Hive configuration (an equivalent spark-defaults.conf is sketched after the code block):

    Code Block
    hive> set spark.master=<Spark Master URL>;

    hive> set spark.eventLog.enabled=true;

    hive> set spark.executor.memory=512m;

    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
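
    An equivalent spark-defaults.conf placed on the Hive classpath would look like this (the master URL is an illustrative placeholder):

    Code Block
    spark.master              spark://master-host:7077
    spark.eventLog.enabled    true
    spark.executor.memory     512m
    spark.serializer          org.apache.spark.serializer.KryoSerializer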

Common Issues (resolved issues will be removed from this list)

Issue: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode
Cause: Guava library version conflict between Spark and Hadoop.  See HIVE-7387 and SPARK-2420 for details.
Resolution (alternatives until this is fixed):
  • Remove Guava 11 from HADOOP_HOME and replace it with Guava 14 (a sketch follows).
  • If you build the Spark assembly manually, apply HIVE-7387-spark.patch to the Spark branch before building.
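
A sketch of the Guava swap; the jar versions and the lib path under HADOOP_HOME vary by distribution, so both are illustrative here:

  Code Block
  # Remove Guava 11 and drop in Guava 14 (paths and versions illustrative)
  rm $HADOOP_HOME/share/hadoop/common/lib/guava-11.0.2.jar
  cp guava-14.0.1.jar $HADOOP_HOME/share/hadoop/common/lib/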
Issue: Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit
Cause: Spark dependency not correctly set.
Resolution: Add the Spark dependency to Hive; see Step 3 above.

Issue: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable
Cause: Spark serializer not set to Kryo.
Resolution: Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above.

Issue: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:257)
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:224)
Cause: Hive is included in the Spark assembly.
Resolution: Either build a version of Spark without the "hive" profile, or unjar the Spark assembly, rm -rf org/apache/hive org/apache/hadoop/hive, and rejar (a sketch follows).  The fix is in SPARK-2741; see Step 5 above.
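
A sketch of the unjar/rejar workaround using the JDK jar tool (the jar file names are illustrative):

  Code Block
  mkdir assembly-unpacked && cd assembly-unpacked
  jar -xf ../spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
  rm -rf org/apache/hive org/apache/hadoop/hive
  jar -cf ../spark-assembly-nohive.jar .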

Issue: [ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Cause: Hive has upgraded to JLine2, but jline 0.9.94 exists in the Hadoop lib.
Resolution:
  1. Delete jline from the Hadoop lib directory (it is only pulled in transitively from ZooKeeper).
  2. export HADOOP_USER_CLASSPATH_FIRST=true
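
For example; the exact location of the jline jar under HADOOP_HOME varies by distribution, so the path here is illustrative:

  Code Block
  rm $HADOOP_HOME/share/hadoop/yarn/lib/jline-*.jar
  export HADOOP_USER_CLASSPATH_FIRST=true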

Issue: java.lang.SecurityException: class "javax.servlet.DispatcherType"'s signer information does not match signer information of other classes in the same package
  at java.lang.ClassLoader.checkCerts(ClassLoader.java:952)
Cause: Two versions of the servlet-api are on the classpath.
Resolution:
  1. This should be fixed by HIVE-8905.
  2. Remove servlet-api-2.5.jar under hive/lib (for example, as shown below).
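
Assuming Hive is installed at $HIVE_HOME:

  Code Block
  rm $HIVE_HOME/lib/servlet-api-2.5.jar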