
  1. As Hive on Spark is still in development, currently only a Hive assembly built from the Hive/Spark development branch supports Spark execution.  The development branch is located at https://github.com/apache/hive/tree/spark.  Check out the branch and build the Hive assembly as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ (a sketch of the checkout and build follows).
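
    A minimal sketch of the checkout and build, assuming Maven is installed; the profile flags shown here are illustrative, so follow the HiveDeveloperFAQ for the authoritative build steps:

    Code Block
    git clone https://github.com/apache/hive.git
    cd hive
    git checkout spark
    # Build the assembly; profiles/flags per the HiveDeveloperFAQ (illustrative here)
    mvn clean package -DskipTests -Phadoop-2,dist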
  2. If you download Spark, make sure you use a 1.2.x assembly: http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
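
    For example, the assembly can be fetched with wget (any HTTP client will do):

    Code Block
    wget http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar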
  3. There are several ways to add the Spark dependency to Hive:

    1. Set the property 'spark.home' to point to the Spark installation:

      Code Block
      hive> set spark.home=/location/to/spark;
    2. Set the spark-assembly jar on the Hive auxpath:

      Code Block
      hive --auxpath /location/to/spark-assembly-*.jar
    3. Add the spark-assembly jar for the current user session:

      Code Block
      hive> add jar /location/to/spark-assembly-*.jar;
  4. Configure Hive's execution engine to use Spark:

    Code Block
    hive> set hive.execution.engine=spark;
  5. Configure Spark application properties for Hive.  See: http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a file "spark-defaults.conf" with these properties to the Hive classpath, or by setting them in the Hive configuration (an equivalent spark-defaults.conf is sketched after the code block):

    Code Block
    hive> set spark.master=<Spark Master URL>;

    hive> set spark.eventLog.enabled=true;

    hive> set spark.executor.memory=512m;

    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
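
    An equivalent spark-defaults.conf placed on the Hive classpath would look like this (the master URL is an illustrative placeholder):

    Code Block
    spark.master              spark://master-host:7077
    spark.eventLog.enabled    true
    spark.executor.memory     512m
    spark.serializer          org.apache.spark.serializer.KryoSerializer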

Common Issues (resolved issues will be removed from this list)

Issue: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode
Cause: Guava library version conflict between Spark and Hadoop.  See HIVE-7387 and SPARK-2420 for details.
Resolution (alternatives until this is fixed):
  • Remove Guava 11 from HADOOP_HOME and replace it with Guava 14 (a sketch follows).
  • If you build the Spark assembly manually, apply HIVE-7387-spark.patch to the Spark branch before building.
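
A sketch of the Guava swap; the jar versions and the lib path under HADOOP_HOME vary by distribution, so both are illustrative here:

  Code Block
  # Remove Guava 11 and drop in Guava 14 (paths and versions illustrative)
  rm $HADOOP_HOME/share/hadoop/common/lib/guava-11.0.2.jar
  cp guava-14.0.1.jar $HADOOP_HOME/share/hadoop/common/lib/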
Issue: Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit
Cause: Spark dependency not correctly set.
Resolution: Add the Spark dependency to Hive; see Step 3 above.

Issue: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable
Cause: Spark serializer not set to Kryo.
Resolution: Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above.

Issue: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:257)
  at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:224)
Cause: Hive is included in the Spark assembly.
Resolution: Either build a version of Spark without the "hive" profile, or unjar the Spark assembly, rm -rf org/apache/hive org/apache/hadoop/hive, and rejar (a sketch follows).  The fix is in SPARK-2741; see Step 5 above.
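
A sketch of the unjar/rejar workaround using the JDK jar tool (the jar file names are illustrative):

  Code Block
  mkdir assembly-unpacked && cd assembly-unpacked
  jar -xf ../spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
  rm -rf org/apache/hive org/apache/hadoop/hive
  jar -cf ../spark-assembly-nohive.jar .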

Issue: [ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Cause: Hive has upgraded to JLine2, but jline 0.9.94 exists in the Hadoop lib.
Resolution:
  1. Delete jline from the Hadoop lib directory (it is only pulled in transitively from ZooKeeper).
  2. export HADOOP_USER_CLASSPATH_FIRST=true
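
For example; the exact location of the jline jar under HADOOP_HOME varies by distribution, so the path here is illustrative:

  Code Block
  rm $HADOOP_HOME/share/hadoop/yarn/lib/jline-*.jar
  export HADOOP_USER_CLASSPATH_FIRST=true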

Issue: java.lang.SecurityException: class "javax.servlet.DispatcherType"'s signer information does not match signer information of other classes in the same package
  at java.lang.ClassLoader.checkCerts(ClassLoader.java:952)
Cause: Two versions of the servlet-api are on the classpath.
Resolution:
  1. This should be fixed by HIVE-8905.
  2. Remove servlet-api-2.5.jar under hive/lib (for example, as shown below).
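
Assuming Hive is installed at $HIVE_HOME:

  Code Block
  rm $HIVE_HOME/lib/servlet-api-2.5.jar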