...
Issue | Cause | Resolution |
---|---|---|
Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit | Spark dependency not correctly set. | Add Spark dependency to Hive, see Step 3 above. |
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to be org.apache.spark.serializer.KryoSerializer, see Step 5 above. |
[ERROR] Terminal initialization failed; falling back to unsupported | Hive has upgraded to Jline2 but jline 0.94 exists in the Hadoop lib. | |
java.lang.SecurityException: class "javax.servlet.DispatcherType"'s | Two versions of the servlet-api are in the classpath. | |
Spark executors are killed repeatedly and Spark keeps retrying the failed stage; the YARN NodeManager log may show a message similar to: WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=217989,containerID=container_1421717252700_0716_01_50767235] is running beyond physical memory limits. Current usage: 43.1 GB of 43 GB physical memory used; 43.9 GB of 90.3 GB virtual memory used. Killing container. | For Spark on YARN, the NodeManager kills a Spark executor if it uses more memory than the configured size of "spark.executor.memory" + "spark.yarn.executor.memoryOverhead". | Increase "spark.yarn.executor.memoryOverhead" so that it covers the executor's off-heap memory usage. |
Running a query fails with an error like: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. The Hive logs show: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79) | Happens on Mac (not officially supported). This is a general Snappy issue with Mac and is not unique to Hive on Spark, but the workaround is noted here because it is needed for startup of the Spark client. | Run this command before starting Hive or HiveServer2: export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS" |
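
The memory-limit rule in the executor-killed row can be checked with simple arithmetic. The following is a minimal sketch, not Spark or YARN code: the function names are hypothetical, and the 40 GB + 3 GB split of the 43 GB limit is an assumption made for illustration (the log only shows the total). It also ignores any rounding YARN may apply to container sizes.

```python
GB = 1024  # megabytes per gigabyte


def container_limit_mb(executor_memory_mb, memory_overhead_mb):
    """Limit enforced per executor container:
    spark.executor.memory + spark.yarn.executor.memoryOverhead."""
    return executor_memory_mb + memory_overhead_mb


def would_be_killed(usage_mb, executor_memory_mb, memory_overhead_mb):
    """True if physical memory usage exceeds the container limit."""
    return usage_mb > container_limit_mb(executor_memory_mb, memory_overhead_mb)


# Usage taken from the log message above: 43.1 GB used against a 43 GB limit.
usage_mb = 43.1 * GB

# Assumed split: 40 GB executor memory + 3 GB overhead = 43 GB limit.
killed_before = would_be_killed(usage_mb, 40 * GB, 3 * GB)

# Raising the overhead to 4 GB lifts the limit to 44 GB.
killed_after = would_be_killed(usage_mb, 40 * GB, 4 * GB)

print(killed_before, killed_after)
```

Under these assumed numbers the container exceeds its limit with 3 GB of overhead and fits with 4 GB, which is why raising "spark.yarn.executor.memoryOverhead" rather than the heap size addresses off-heap growth.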
...