A Collection of Spark Problems [Continuously Updated]

阿新 • Published: 2019-02-09

1、Initial job has not accepted any resources

16/08/13 17:05:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/08/13 17:05:57 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

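This message tells us that the initial job could not obtain any resources. Spark only looks for two kinds of resources: cores and memory. So when this warning appears and the workers are registered, the cluster is short of one of these two. Open the Spark UI to see what is going on: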
[Figure: screenshot of the Spark UI]

The figure shows that all cores are already in use: another application, possibly a spark-shell session, is holding them, which is why the warning above appears.
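
One way to keep a single application from taking every core, assuming a standalone cluster (the `--total-executor-cores` flag applies only to standalone and Mesos masters), is to cap what each submission asks for. The sketch below is illustrative only: the function name `submit_capped`, the `SPARK_SUBMIT` override, and all example values are assumptions, not part of the original setup.

```shell
# Sketch: submit an application with explicit resource caps so that it fits
# into whatever the cluster still has free. All values are placeholders.
# Set SPARK_SUBMIT=echo to print the arguments instead of submitting.
submit_capped() {
  master_url=$1   # e.g. spark://master:7077 (assumed standalone master URL)
  max_cores=$2    # total cores this application may take
  exec_mem=$3     # memory per executor, e.g. 1g
  shift 3         # everything left is passed through unchanged
  "${SPARK_SUBMIT:-spark-submit}" \
    --master "$master_url" \
    --total-executor-cores "$max_cores" \
    --executor-memory "$exec_mem" \
    "$@"
}
```

For example, `submit_capped spark://master:7077 2 1g --class Main app.jar` would ask for at most 2 cores in total and 1g per executor; the master URL and JAR name are placeholders. Stopping the idle spark-shell that holds the cores is the other obvious option.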

2、Exception in thread “main” java.lang.ClassNotFoundException

Exception in thread "main" java.lang.ClassNotFoundException: Main
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)

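This exception comes up often when submitting with spark-submit, and it has many possible causes. The simplest one is that the class really is missing from your JAR.

Note how this case differs from other class-not-found errors: here it is the main class that cannot be found, not a class that it references. If a referenced class were missing, the likely cause would be a classpath problem or a library that was not included.

What puzzled me at first was that, within the same JAR, some main classes could be found and others could not. I eventually noticed that a main class stopped being found as soon as I placed it in a package with a package declaration, and was found again once I moved it back to the root of the source directory. So if the main class cannot be found, try moving it to the source root. That is what fixed it in my case; your situation may differ, so good luck to you!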

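Solution:
Put the main class in the root of the source directory, i.e. directly under src. If the class has to stay in a package, pass its fully qualified name (package plus class name) to --class instead, because spark-submit loads the main class by exactly the name it is given.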
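
The behaviour described above is consistent with `--class` receiving only the simple name `Main` while the class actually lives in a package. A minimal sketch for looking up the name to pass, assuming the JDK `jar` tool is on the PATH; the helper names `entry_to_class` and `find_main_class` and the package `com.example` are made up for illustration:

```shell
# Convert a JAR entry path such as com/example/Main.class into the
# fully qualified name that --class expects (com.example.Main).
entry_to_class() {
  printf '%s\n' "$1" | sed -e 's/\.class$//' -e 's#/#.#g'
}

# Print the fully qualified name of every class in the JAR whose
# file name is exactly <SimpleName>.class.
find_main_class() {
  jar_file=$1
  simple_name=$2
  jar tf "$jar_file" |
    grep -E "(^|/)${simple_name}\.class\$" |
    while read -r entry; do
      entry_to_class "$entry"
    done
}
```

Running `find_main_class /path/to/app.jar Main` and passing the printed name to `--class` avoids having to move the source file. If nothing is printed, the class really is missing from the JAR.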

3、When running with master ‘yarn’ either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment

hadoop@master:~$ ./shell/spark-submit.sh
16/09/03 10:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/03 10:35:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /jar/edu-cloud-assembly-1.0.jar
16/09/03 10:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
    at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
    at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Solution:
Edit the $SPARK_HOME/conf/spark-env.sh file:

hadoop@master:~$ vi spark-1.6.0-bin-hadoop2.4/conf/spark-env.sh

Add the following line:

HADOOP_CONF_DIR=/home/hadoop/hadoop-2.4.0/etc/hadoop/

Then update this file on every node in the cluster.
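
Since the setting has to be present on every machine, a quick check before submitting can save a failed run. A minimal sketch, assuming a POSIX shell; the function name `yarn_conf_ok` is hypothetical, and testing that the directory exists is a stricter check than the error message itself implies (it only asks for one of the two variables to be set):

```shell
# Return 0 if HADOOP_CONF_DIR or YARN_CONF_DIR names an existing directory,
# otherwise print a hint to stderr and return 1.
yarn_conf_ok() {
  for dir in "${HADOOP_CONF_DIR:-}" "${YARN_CONF_DIR:-}"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      return 0
    fi
  done
  echo "Neither HADOOP_CONF_DIR nor YARN_CONF_DIR points to a directory" >&2
  return 1
}
```

To test the values that Spark would see, source the file first, for example `. "$SPARK_HOME/conf/spark-env.sh" && yarn_conf_ok`.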

4、awaitResult Exception

Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult

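Cause: most likely the broadcast side of a join did not finish within spark.sql.broadcastTimeout, which is the setting the fix below raises.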

Solution:
Increase the timeout, which defaults to 300 s, for example:

spark.conf.set("spark.sql.broadcastTimeout", 1200)
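
The line above changes the setting for one running session. To make a larger timeout the default for every application submitted from an installation, the property can instead go into `spark-defaults.conf`. A rough sketch, assuming the whitespace-separated `key value` form of that file; the helper name `set_spark_default` is hypothetical:

```shell
# Set (or replace) one property in a spark-defaults.conf style file.
# Only lines written as "key value" are recognised as existing entries.
set_spark_default() {
  conf_file=$1
  key=$2
  value=$3
  touch "$conf_file"
  # Keep every line whose first field is not this key ...
  awk -v k="$key" '$1 != k' "$conf_file" > "$conf_file.tmp"
  # ... then append the new value and swap the file into place.
  printf '%s %s\n' "$key" "$value" >> "$conf_file.tmp"
  mv "$conf_file.tmp" "$conf_file"
}
```

A call such as `set_spark_default "$SPARK_HOME/conf/spark-defaults.conf" spark.sql.broadcastTimeout 1200` would then apply to later submissions. Calling it again with another value replaces the earlier entry rather than adding a second one.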

5、Exception in thread “main” org.apache.spark.sql.AnalysisException: Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true

18/01/09 20:25:33 INFO FileSourceStrategy: Planning scan with bin packing, max size: 134217728 bytes, open cost is considered as scanning 4194304 bytes.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true;
    at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.doPrepare(BroadcastNestedLoopJoinExec.scala:345)
    at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:240)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:323)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2193)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2546)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2192)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2197)
    at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2197)
    at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2559)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2197)
	at org.apache.spark.sql.Dataset.collect(Dataset.scala:2173)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/09 20:25:34 INFO SparkContext: Invoking stop() from shutdown hook

Solution:

set spark.sql.crossJoin.enabled = true;
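
The statement above only works where SQL can be typed in, such as a spark-sql session or a `spark.sql(...)` call. For a packaged job launched through spark-submit, as in the stack trace, the same property can be passed on the command line with `--conf`. A small sketch; the function name `submit_allow_cross_join` and the `SPARK_SUBMIT` override (handy for a dry run with `echo`) are illustrative assumptions:

```shell
# Submit with cross joins explicitly allowed for this one application.
# Set SPARK_SUBMIT=echo to print the arguments instead of submitting.
submit_allow_cross_join() {
  "${SPARK_SUBMIT:-spark-submit}" \
    --conf spark.sql.crossJoin.enabled=true \
    "$@"
}
```

Because a cross join can be very expensive, as the error message warns, it is worth checking first whether the join condition was simply left out of the query.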
