Spark on YARN in Detail
1. References:
spark-1.3.0: http://spark.apache.org/docs/1.3.0/running-on-yarn.html
spark-1.6.0: http://spark.apache.org/docs/1.6.0/running-on-yarn.html
Note: starting with spark-1.6.0, the Spark-on-YARN command line changed slightly (see the official documentation for details); this article is based on a Spark 1.3.0 cluster.
2. Prerequisites
Building Spark: http://www.cnblogs.com/wcwen1990/p/7688027.html
Installing and deploying Spark (local and standalone modes): http://www.cnblogs.com/wcwen1990/p/6889521.html
3. Spark on YARN setup:
1) Start the Hadoop cluster:
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
sbin/mr-jobhistory-daemon.sh start historyserver
2) Start the Spark history server:
sbin/start-history-server.sh
3) Check the running processes:
$ jps
3182 DataNode
3734 JobHistoryServer
3949 Jps
3555 NodeManager
3295 ResourceManager
3857 HistoryServer
3094 NameNode
4. Submitting an application to YARN with spark-submit (applications can be submitted in either client mode or cluster mode):
1) spark-1.3.0:
$ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]
For example:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
2) spark-1.6.0:
$ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]
For example:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
5. Running spark-shell on YARN (spark-shell can only run in client mode):
1) spark-1.3.0:
$ ./bin/spark-shell --master yarn-client
2) spark-1.6.0:
$ ./bin/spark-shell --master yarn --deploy-mode client
6. Testing, using spark-1.3.0 as an example:
$ ./bin/spark-shell --master yarn-client
Running a word-count program in Spark-on-YARN mode:
scala> sc.textFile("/user/hadoop/mapreduce/wordcount/input/wc.input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _).map(x => (x._2,x._1)).sortByKey(false).map(x => (x._2,x._1)).collect
... ...
res0: Array[(String, Int)] = Array((scala,1), (hive,1), (oozie,1), (mapreduce,1), (zookeeper,1), (hue,1), (yarn,1), (sqoop,1), (kafka,1), (spark,1), (hadoop,1), (flume,1), (hdfs,1), (storm,1), (hbase,1))
scala> sc.stop
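The chain above is the classic word count: flatMap splits each line into words, map pairs every word with 1, and reduceByKey sums the counts; since sortByKey orders by key, the two surrounding map steps swap each (word, count) pair to (count, word) for a descending sort, then swap back. As a rough analogy (no Spark involved), the same count-then-sort logic can be sketched with plain coreutils; the sample input below is made up for illustration:

```shell
# One word per line (like flatMap), group and count duplicates
# (like reduceByKey), then sort by count descending (like sortByKey(false)).
printf 'hadoop spark hive\nspark yarn spark\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn
# Top line shows count 3 for "spark", which appears three times.
```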
The run above can be monitored in detail through the following web UIs:
YARN: http://chavin.king:8088
Spark application UI: http://chavin.king:4040
History server: http://chavin.king:18080