...
Issue | Cause | Resolution |
---|---|---|
Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit | Spark dependency not correctly set. | Add Spark dependency to Hive, see Step 3 above. |
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to org.apache.spark.serializer.KryoSerializer, see Step 5 above and the example after this table. |
[ERROR] Terminal initialization failed; falling back to unsupported | Hive has upgraded to Jline2, but jline 0.94 exists in the Hadoop lib. | Delete the old jline JAR from the Hadoop lib directory, or export HADOOP_USER_CLASSPATH_FIRST=true so that Hive's Jline2 takes precedence. |
java.lang.SecurityException: class "javax.servlet.DispatcherType"'s signer information does not match signer information of other classes in the same package | Two versions of the servlet-api are on the classpath. | Remove the duplicate servlet-api JAR (e.g., servlet-api-2.5.jar under Hive's lib directory) so that only one version remains on the classpath. |
Spark executors are repeatedly killed and Spark keeps retrying the failed stage; you may find messages like the following in the YARN NodeManager log: WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=217989,containerID=container_1421717252700_0716_01_50767235] is running beyond physical memory limits. Current usage: 43.1 GB of 43 GB physical memory used; 43.9 GB of 90.3 GB virtual memory used. Killing container. | With Spark on YARN, the NodeManager kills a Spark executor if its memory usage exceeds the configured size of "spark.executor.memory" plus "spark.yarn.executor.memoryOverhead". | Increase "spark.yarn.executor.memoryOverhead" so that it covers the executor's off-heap memory usage (see the sketch after this table). |
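
For the serialization failure above, the Kryo setting can be applied per session from the Hive CLI or Beeline, as in this minimal sketch (the same property can instead be made permanent in hive-site.xml or spark-defaults.conf):

```
-- Hive CLI / Beeline: use Kryo for Spark serialization in this session
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
```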
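For the container-kill issue, YARN enforces a physical limit of spark.executor.memory plus spark.yarn.executor.memoryOverhead; a 43 GB limit as in the log above could correspond, for example, to a 40g heap plus about 3 GB of overhead (the exact split is an assumption, since the original configuration isn't shown). Raising only the overhead is usually enough; the values below are illustrative, not recommendations:

```
-- Hive CLI / Beeline: keep the executor heap, raise the off-heap allowance.
-- YARN kills the container once usage exceeds
--   spark.executor.memory + spark.yarn.executor.memoryOverhead.
set spark.executor.memory=40g;                -- illustrative heap size
set spark.yarn.executor.memoryOverhead=4096;  -- in MB; raised above the default
```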
Recommended Configuration
...