Spark performance testing with the HiBench WordCount workload: run-time error
Background
- Spark 2.3.1; the same applies to the Spark 2.2.x series
- CentOS 7 x86_64, Java 1.8.0
- HiBench master (7.0)
Steps
- Download and build HiBench (Maven 3.3.9):
mvn -Dspark=2.2 -Dscala=2.11 clean package
- Configure each setting as described in the official SparkBench configuration guide.
- Run the data generation script with the data scale set to large:
bin/workloads/micro/wordcount/prepare/prepare.sh
- Run the Spark wordcount workload:
bin/workloads/micro/wordcount/spark/run.sh
Error
ERROR: Spark job com.intel.hibench.sparkbench.micro.ScalaWordCount failed to run successfully.
Error log
org.apache.spark.SparkException: Exception thrown in awaitResult:
at ……
Caused by: java.io.IOException: Failed to send RPC 7038938719505164344 to /hostname:port: java.nio.channels.ClosedChannelException
at ……
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
19/09/12 17:33:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
……
java.lang.IllegalStateException: Spark context stopped while waiting for backend
Exception in thread "main" java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
……
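Analysis
The log contains a lot of error output, which makes the real failure hard to pin down. The most visible message is Caused by: java.nio.channels.ClosedChannelException. Searching on that clue turns up two commonly suggested fixes (neither of which solved this case):
First: increase virtual memory
Total virtual memory = yarn.scheduler.minimum-allocation-mb * yarn.nodemanager.vmem-pmem-ratio (more generally, the container's allocated physical memory times the ratio). If a container needs more virtual memory than this computed value, YARN logs "Killing container" and kills it.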
vim yarn-site.xml
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8096</value>
<description>Maximum memory available to each task, in MB; the default is 8192 MB</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
<description>Minimum memory available to each task, in MB</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4.1</value>
</property>
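If these settings are already reasonable (at or near their maximum values), this approach will not help.
Second: disable the memory checks (not recommended)
This is a bit like burying your head in the sand. It is also a change to yarn-site.xml: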
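As a quick check of the formula in this section, the following sketch plugs in the two values from the yarn-site.xml fragment above (2048 MB and a ratio of 4.1). The function name vmem_limit_mb is hypothetical and only illustrates the arithmetic; it is not part of YARN.

```python
def vmem_limit_mb(container_pmem_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual memory ceiling, in MB, for a container with the given physical memory."""
    return container_pmem_mb * vmem_pmem_ratio


# Values taken from the yarn-site.xml fragment in this section.
min_allocation_mb = 2048   # yarn.scheduler.minimum-allocation-mb
ratio = 4.1                # yarn.nodemanager.vmem-pmem-ratio

limit = vmem_limit_mb(min_allocation_mb, ratio)
print(f"vmem limit for a {min_allocation_mb} MB container: {limit:.1f} MB")
# prints: vmem limit for a 2048 MB container: 8396.8 MB
```

With these settings a container allocated the minimum 2048 MB can use roughly 8.2 GB of virtual memory before the check kills it, so raising the values further only helps if the job really exceeds that ceiling.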
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
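These two parameters control whether a thread is started to monitor how much physical and virtual memory each task is using; if a task exceeds its allocation, it is killed outright. Both default to true. I tried this here and it had no effect: the same error appeared.
Solution
Key log
Pay attention to the INFO messages in the log: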
INFO ApplicationMaster: Final app status: FAILED,exitCode: 13,(reason: Uncaught exception:
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request,requested resource type=[vcores] < 0 or greater than maximum allowed allocation. Requested resource=<memory:4505,vCores:4>,maximum allowed allocation=<memory:24576,vCores:3>,please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers,which might be less than configured maximum allocation=<memory:24576,vCores:3>
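Note the phrase "Invalid resource request": the resource request itself was rejected. My environment consists of virtual machines with modest specs, and the number of vcores requested (4) exceeded the maximum that can be allocated (3), which caused the failure. The java.nio.channels.ClosedChannelException seen earlier is misleading and makes the real cause hard to spot.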
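The comparison hidden in that long log line can be pulled out mechanically. The sketch below is a minimal example, not part of HiBench or YARN: it parses the "Requested resource" and "maximum allowed allocation" tuples from an excerpt of the message above and reports which resource exceeds its limit. The helper names parse_request and exceeded are hypothetical.

```python
import re

# Excerpt copied from the InvalidResourceRequestException message above.
LOG_LINE = (
    "Requested resource=<memory:4505,vCores:4>,"
    "maximum allowed allocation=<memory:24576,vCores:3>"
)

# Matches tuples such as <memory:4505,vCores:4>.
RESOURCE_RE = re.compile(r"<memory:(\d+),\s*vCores:(\d+)>")


def parse_request(line):
    """Return (requested, maximum) dicts from the first two resource tuples."""
    matches = RESOURCE_RE.findall(line)
    if len(matches) < 2:
        raise ValueError("expected a requested and a maximum resource tuple")
    requested, maximum = (
        {"memory": int(mem), "vcores": int(cores)} for mem, cores in matches[:2]
    )
    return requested, maximum


def exceeded(requested, maximum):
    """List the resource names whose request is above the allowed maximum."""
    return [name for name in requested if requested[name] > maximum[name]]


requested, maximum = parse_request(LOG_LINE)
print(exceeded(requested, maximum))  # prints ['vcores']
```

Memory (4505 MB against 24576 MB) is within bounds; only the vcore count is over the limit, which is why the memory-related fixes tried earlier could not help.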
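The fix is to configure a valid resource request for this HiBench job. Edit spark.conf and lower the number of requested cores to 2 (the default is 4, while on my machines the maximum number of vcores per container is 3).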
vim /{HiBench-home}/conf/spark.conf
Adjust the following settings (as appropriate for your cluster):
hibench.yarn.executor.num 4
hibench.yarn.executor.cores 2
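Before rerunning, the setting can be sanity-checked against the cluster limit. The following sketch assumes the whitespace-separated key/value layout shown in the two lines above and takes the limit of 3 vcores from the log message in this post; parse_hibench_conf and cores_request_is_valid are hypothetical helpers, not HiBench functions.

```python
MAX_VCORES_PER_CONTAINER = 3  # "maximum allowed allocation" reported in the log


def parse_hibench_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def cores_request_is_valid(conf, max_vcores):
    """True if the per-executor core request fits in one container."""
    cores = int(conf["hibench.yarn.executor.cores"])
    return 0 < cores <= max_vcores


before = parse_hibench_conf("hibench.yarn.executor.num 4\nhibench.yarn.executor.cores 4\n")
after = parse_hibench_conf("hibench.yarn.executor.num 4\nhibench.yarn.executor.cores 2\n")

print(cores_request_is_valid(before, MAX_VCORES_PER_CONTAINER))  # prints False
print(cores_request_is_valid(after, MAX_VCORES_PER_CONTAINER))   # prints True
```

On a different cluster, replace the constant with the vCores value that YARN reports as the maximum allowed allocation.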
Save the file, then run the Spark wordcount workload again:
bin/workloads/micro/wordcount/spark/run.sh
start ScalaSparkWordcount bench
hdfs rm -r: …… -rm -r -skipTrash hdfs://hostname:8020/HiBench/xxx/Wordcount/Output
rm: `hdfs://hostname:8020/HiBench/xxx/Wordcount/Output': No such file or directory
hdfs du -s: ……
Export env: SPARKBENCH_PROPERTIES_FILES=……
Submit Spark job: /usr/hdp/xxx/spark2/bin/spark-submit ……
19/09/12 18:00:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-2bf5c456-70f1-4b7a-81c6-xxx
finish ScalaSparkWordcount bench
OK, the run succeeded.
View the report
cat hibench.report
Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
ScalaSparkWordcount  2019-09-11  17:00:03  3258327393       58.865       55352542             18450847
ScalaSparkWordcount  2019-09-12  18:00:32  3258311659       76.810       42420409             14140136
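To read the report columns, it helps to see how they relate. In the two rows above, the numbers are consistent with Throughput = Input_data_size / Duration (truncated to an integer) and Throughput/node = Throughput / 3. The node count of 3 is inferred from the figures, not stated in the report, and check_row is a hypothetical helper written for this illustration.

```python
# Data rows copied from hibench.report above.
REPORT_ROWS = """\
ScalaSparkWordcount 2019-09-11 17:00:03 3258327393 58.865 55352542 18450847
ScalaSparkWordcount 2019-09-12 18:00:32 3258311659 76.810 42420409 14140136
"""

NODES = 3  # assumption: inferred from Throughput divided by Throughput/node


def check_row(row, nodes):
    """Recompute throughput from size and duration and compare with the row."""
    _name, _date, _time, size, duration, throughput, per_node = row.split()
    computed = int(int(size) / float(duration))  # bytes per second, truncated
    return computed == int(throughput) and computed // nodes == int(per_node)


results = [check_row(row, NODES) for row in REPORT_ROWS.splitlines()]
print(results)  # prints [True, True]
```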
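Other Spark workloads that fail in the same way can be handled similarly. If this helped, a like would be appreciated! Thanks, and feel free to leave a comment with any questions.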