Integrating a newer Spark 2.x with Ambari without upgrading
I. Background
As is well known, Spark 2.x brought a series of convenience and performance improvements, and at the time of writing the community has released up to 2.3.1; since 2.2.x in particular, spark-ml has been noticeably strengthened. My big-data test environment, however, is built on Ambari 2.2.2 with HDP 2.4.x, and I was unhappy to discover that this combination does not fully support Spark 2.x, so none of the improvements that came with Spark 2.x were available. The question, then, is how to integrate a newer Spark release into Ambari.
II. Integration approaches
1. Option 1: upgrade both Ambari and HDP
One option is naturally to upgrade Ambari itself to 2.4 or later (the latest release is already in the 2.6.x line). An upgrade of this scale needs to be done carefully: follow the official Hortonworks documentation to the letter, otherwise the cluster may be left in a broken state. Because pulling the dependencies requires unrestricted internet access, it is also advisable to set up local repositories for the upgrade. Users on particularly old versions may need a two-hop upgrade (first to 2.2, then to a higher version). That is the full-upgrade route in brief; it is not the focus here. The rest of this article describes the safer approach of integrating Spark 2.x with Ambari without upgrading anything.
2. Option 2: integrate Spark 2.x with Ambari without upgrading
1. Rebuild Spark against the Hadoop version used by the Ambari cluster, then unpack it on one or more cluster nodes (only the nodes that will actually submit jobs need it, so a few are enough). The build I used is spark-2.2.0-bin-hadoop2.7.
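For reference, a distribution like this can be produced with the make-distribution.sh script that ships with the Spark sources. This is only a sketch: the source tree location and the exact Hadoop version are assumptions to adapt to your environment (check what hadoop version reports on the cluster; HDP 2.4.x ships a Hadoop 2.7.x line).
cd spark-2.2.0
./dev/make-distribution.sh --name hadoop2.7 --tgz \
  -Phadoop-2.7 -Dhadoop.version=2.7.1 \
  -Pyarn -Phive -Phive-thriftserver
# The prebuilt spark-2.2.0-bin-hadoop2.7.tgz from the Apache archive also works against Hadoop 2.7-based clusters.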
2. Edit spark-env.sh and add the lines below. Note that on HDP, HADOOP_CONF_DIR differs from the Apache and CDH distributions: it is the hadoop/conf folder under the corresponding version directory, not /etc/hadoop.
export JAVA_HOME=/usr/local/app/jdk1.8.0_144
export HADOOP_CONF_DIR=/usr/hdp/2.4.2.0-258/hadoop/conf
3. Copy core-site.xml and hdfs-site.xml into Spark's conf directory. If Spark needs to integrate with an existing Hive installation, or Hive will be used as an external data source via spark-sql, also copy hive-site.xml into conf.
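In shell terms this is just a few copies. The sketch below assumes Spark was unpacked to /usr/local/app/spark-2.2.0-bin-hadoop2.7 (the path used in the submit command later) and that the Hive client config lives under /etc/hive/conf; both paths may differ on your cluster.
SPARK_CONF=/usr/local/app/spark-2.2.0-bin-hadoop2.7/conf
cp /usr/hdp/2.4.2.0-258/hadoop/conf/core-site.xml "$SPARK_CONF"/
cp /usr/hdp/2.4.2.0-258/hadoop/conf/hdfs-site.xml "$SPARK_CONF"/
# Only needed when integrating with an existing Hive (path assumed):
cp /etc/hive/conf/hive-site.xml "$SPARK_CONF"/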
4. Resolve the Jersey version conflict: the Jersey jars shipped with the cluster's hadoop-yarn are older than the ones bundled with Spark, so Spark's copies must be replaced with the Hadoop versions. Concretely, copy jersey-core-1.9.jar and jersey-client-1.9.jar from /usr/hdp/2.4.2.0-258/hadoop-yarn/lib into Spark's jar directory, and remove the corresponding newer Jersey jars that Spark ships with; a command sketch follows below.
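A sketch of the jar swap, assuming the standard Spark 2.x binary layout in which the bundled jars live under jars/ (moving the newer jar aside rather than deleting it is just a precaution):
SPARK_HOME=/usr/local/app/spark-2.2.0-bin-hadoop2.7
YARN_LIB=/usr/hdp/2.4.2.0-258/hadoop-yarn/lib
cp "$YARN_LIB"/jersey-core-1.9.jar   "$SPARK_HOME"/jars/
cp "$YARN_LIB"/jersey-client-1.9.jar "$SPARK_HOME"/jars/
# Park Spark's own Jersey 2.x client jar outside the classpath instead of deleting it.
mkdir -p "$SPARK_HOME"/jars.bak
mv "$SPARK_HOME"/jars/jersey-client-2.*.jar "$SPARK_HOME"/jars.bak/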
5. After the four steps above, Spark can already run jobs on the Ambari cluster in local, standalone, and yarn-client modes; submitting in yarn-cluster mode, however, fails with the following error:
18/07/21 20:55:40 INFO yarn.Client: Application report for application_1531149311507_0180 (state: ACCEPTED)
18/07/21 20:55:41 INFO yarn.Client: Application report for application_1531149311507_0180 (state: ACCEPTED)
18/07/21 20:55:42 INFO yarn.Client: Application report for application_1531149311507_0180 (state: ACCEPTED)
18/07/21 20:55:43 INFO yarn.Client: Application report for application_1531149311507_0180 (state: FAILED)
18/07/21 20:55:43 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1531149311507_0180 failed 2 times due to AM Container for appattempt_1531149311507_0180_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://bdp03.szmg.com.cn:8088/cluster/app/application_1531149311507_0180Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e22_1531149311507_0180_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/root/appcache/application_1531149311507_0180/container_e22_1531149311507_0180_02_000001/launch_container.sh: line 22: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/root/appcache/application_1531149311507_0180/container_e22_1531149311507_0180_02_000001/launch_container.sh: line 22: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Checking the YARN logs pinpoints the problem to the following line: a class cannot be found.
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
Solution:
1) In Spark's conf directory, add HDP version support to spark-defaults.conf:
spark.driver.extraJavaOptions -Dhdp.version=2.4.2.0-258
spark.yarn.am.extraJavaOptions -Dhdp.version=2.4.2.0-258
Note that the version string 2.4.2.0-258 here is the name of the corresponding folder under /usr/hdp, not the Hadoop version reported by running hadoop version!
2) In the mapred-site.xml file under HADOOP_CONF_DIR, replace the hdp.version placeholders with the actual version, using sed for a bulk substitution:
sed -i 's/${hdp.version}/2.4.2.0-258/g' mapred-site.xml
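As a quick sanity check (just a sketch), confirm that no unresolved placeholders remain after the substitution:
grep -c 'hdp.version' /usr/hdp/2.4.2.0-258/hadoop/conf/mapred-site.xml   # should print 0 once every placeholder is replaced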
III. Summary
To test Spark on YARN in cluster mode, I wrote a WordCount program and submitted it as follows:
/usr/local/app/spark-2.2.0-bin-hadoop2.7/bin/spark-submit \
--class com.szmg.SparkWordCount \
--master yarn \
--deploy-mode cluster \
--driver-memory 2G \
--num-executors 4 \
--executor-memory 4G \
--executor-cores 5 \
--conf spark.app.coalesce=1 \
/usr/local/app/spark_test_projects/word_count/jar/scalaProject.jar \
/testdata/README.md \
/testdata/output2
Submitted to YARN, it runs successfully.
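To double-check the run, the output can be inspected directly; the sketch below assumes the paths passed to the jar are HDFS paths and that the job writes plain-text part files.
hdfs dfs -ls /testdata/output2                 # a _SUCCESS marker means the job finished cleanly
hdfs dfs -cat /testdata/output2/part-* | head  # peek at a few of the word counts
# In cluster mode the driver log lives on YARN; fetch it with:
#   yarn logs -applicationId <application id printed by spark-submit>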
With that, Ambari works with newer Spark releases without any upgrade. I tested both Spark 2.2.x and Spark 2.3.x, and both ran successfully in spark-on-yarn cluster mode, so the Ambari cluster now supports multiple Spark versions.