Spark Cluster HA Setup and Deployment Based on ZooKeeper
Official documentation
Download: http://spark.apache.org/downloads.html
Hadoop HA installation steps:
http://blog.csdn.net/haoxiaoyan/article/details/52623393
ZooKeeper installation steps:
http://blog.csdn.net/haoxiaoyan/article/details/52523866
Install Scala (on the master)
[root@masternode1 hadoop]# tar -xvzf scala-2.11.8.tgz
[root@masternode1 hadoop]# mv scala-2.11.8 scala
1. Set the environment variables
[root@masternode1 hadoop]# vi /etc/profile
#set scala
export SCALA_HOME=/opt/hadoop/scala
export PATH=$PATH:$SCALA_HOME/bin
"/etc/profile" 96L, 2426C written
[root@masternode1 hadoop]# source /etc/profile
[root@masternode1 hadoop]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
2. Copy /opt/hadoop/scala from the master to the other nodes.
[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39}; do scp -r scala hadoop@192.168.231.2$i:/opt/hadoop/ ; done
3. Copy the environment variable file to the other machines
[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39}; do scp ~/.bash_profile hadoop@192.168.231.2$i:~/.bash_profile ; done
4. Install Spark
① Unpack the software
[root@masternode1 hadoop]# tar -xzvf spark-1.6.1-bin-hadoop2.6.tgz
[root@masternode1 hadoop]# mv spark-1.6.1-bin-hadoop2.6 spark
[root@masternode1 conf]# pwd
/opt/hadoop/spark/conf
[root@masternode1 conf]# cp spark-env.sh.template spark-env.sh
[root@masternode1 conf]# cp spark-defaults.conf.template spark-defaults.conf
② Configure spark-env.sh
vi spark-env.sh
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export SCALA_HOME=/opt/hadoop/scala
export JAVA_HOME=/usr/java/jdk1.7.0_79
###the max memory size of worker
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_WORKER_DIR=/opt/hadoop/spark/work
export SPARK_LOCAL_DIRS=/opt/hadoop/spark/tmp
export SPARK_DAEMON_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181 -Dspark.deploy.zookeeper.dir=/spark"
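Before relying on these recovery settings, it is worth confirming that the ZooKeeper quorum is actually reachable from the master. A quick check from masternode1, assuming nc (netcat) is installed; a healthy ZooKeeper server answers the four-letter command ruok with imok:
[root@masternode1 ~]# for h in slavenode1 slavenode2 slavenode3 slavenode4 slavenode5 slavenode6 slavenode7; do echo -n "$h "; echo ruok | nc $h 2181; echo; done
Any node that does not answer imok should be fixed before starting Spark, otherwise master recovery through ZooKeeper will not work.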
③ Edit the slaves file
[root@masternode1 conf]# cp slaves.template slaves
vi slaves
Add the following:
slavenode1
slavenode2
slavenode3
slavenode4
slavenode5
slavenode6
slavenode7
slavenode8
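start-all.sh connects to every host listed in slaves over SSH, so passwordless SSH from the master to each slavenode must already be in place (it normally is after the Hadoop HA setup linked above). A quick sanity check from the master; BatchMode makes ssh fail instead of prompting for a password:
[root@masternode1 conf]# for h in slavenode1 slavenode2 slavenode3 slavenode4 slavenode5 slavenode6 slavenode7 slavenode8; do ssh -o BatchMode=yes $h hostname; done
Each line of output should be the hostname of the corresponding slavenode.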
④ Edit spark-defaults.conf
vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://cluster-ha/spark/logs
[hadoop@masternode1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark
[hadoop@masternode1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark/logs
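The event logs written to hdfs://cluster-ha/spark/logs only become useful if something reads them. If you also want the Spark history server to show finished applications, a minimal sketch is to point it at the same directory in spark-defaults.conf and start the daemon (18080 is its default port):
spark.history.fs.logDirectory hdfs://cluster-ha/spark/logs
[hadoop@masternode1 spark]$ ./sbin/start-history-server.sh
Completed applications can then be browsed at http://masternode1:18080.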
⑤ Copy the spark directory and the environment variable file to the other machines
[hadoop@masternode1 conf]$ for i in {31,32,33,34,35,36,37,38,39}; do scp -r spark 192.168.231.2$i:/opt/hadoop/ ; done
[root@masternode1 hadoop]# for i in {31,32,33,34,35,36,37,38,39}; do scp ~/.bash_profile hadoop@192.168.231.2$i:~/.bash_profile ; done
⑥ Start Spark
[root@masternode1 hadoop]# cd spark
[root@masternode1 spark]# cd sbin/
[hadoop@masternode1 sbin]$ ./start-all.sh
[hadoop@masternode1 sbin]$ jps
9836 RunJar
3428 DFSZKFailoverController
2693 NameNode
5851 RunJar
3635 ResourceManager
14061 Jps
13859 Master
3924 HMaster
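At this point the running master should have written its recovery data under the /spark znode configured in spark-env.sh. One way to confirm this is with the ZooKeeper client; the install path shown here is only an assumption, so adjust it to wherever ZooKeeper lives on your nodes:
[root@slavenode1 ~]# /opt/hadoop/zookeeper/bin/zkCli.sh -server slavenode1:2181
[zk: slavenode1:2181(CONNECTED) 0] ls /spark
If /spark exists and has children, the masters are coordinating through ZooKeeper as intended.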
Go to the masternode2 (192.168.237.231) node and run start-master.sh there, so that when masternode1 (192.168.237.230) goes down, masternode2 takes over as the master.
[hadoop@masternode2 sbin]$ ./start-master.sh
[hadoop@masternode2 sbin]$ jps
11867 Master
3704 DFSZKFailoverController
2683 NameNode
3923 ResourceManager
12064 Jps
4500 HMaster
[root@masternode1 bin]# ./spark-shell
SQL context available as sqlContext.
scala>
When the scala> prompt appears, the shell has started successfully.
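To confirm that the shell can actually run jobs on the cluster rather than merely start, a trivial job typed at the scala> prompt is enough; sc is the SparkContext the shell creates automatically, and the sum of 1 to 1000 should come back as 500500.0:
scala> sc.parallelize(1 to 1000).sum()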
⑦ Test whether HA failover works
(1) First check the state of both nodes: masternode1 is currently running the master and masternode2 is standing by.
Open the Spark web management page (port 8080 on each master):
(2) Stop the master service on masternode1
[hadoop@masternode1 sbin]$ ./stop-master.sh
Use a browser to visit port 8080 on masternode1 to see whether that master is still alive; the page shows that the master has gone down.
Then check masternode2's status in the browser: masternode2 has been switched over and is now the active master.
Note that if port 8080 is already occupied by another process, Spark automatically moves up to the next port.
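The same check can be done from the command line: the standalone master exposes a JSON view of its state on the web UI port (adjust the port if Spark moved to the next one as described above, and verify the /json endpoint exists on your Spark version). The status field should read ALIVE on the active master and STANDBY on the other:
[hadoop@masternode1 sbin]$ curl http://masternode1:8080/json/
[hadoop@masternode1 sbin]$ curl http://masternode2:8080/json/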
Start spark-shell against the cluster
[hadoop@masternode1 sbin]$ spark-shell --master spark://masternode1:7077 &
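Since the cluster has two masters, it is safer to give spark-shell both of them; the standalone master URL accepts a comma-separated list of hosts, and the driver will register with whichever master is currently active:
[hadoop@masternode1 sbin]$ spark-shell --master spark://masternode1:7077,masternode2:7077 &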
http://spark.apache.org/docs/latest/configuration.html