
Spark Cluster HA Deployment Based on Zookeeper

Official documentation

Download: http://spark.apache.org/downloads.html

Hadoop HA installation steps:

http://blog.csdn.net/haoxiaoyan/article/details/52623393

Zookeeper installation steps:

http://blog.csdn.net/haoxiaoyan/article/details/52523866

Install Scala (on the master node)

[[email protected] hadoop]# tar -xvzf scala-2.11.8.tgz 

[[email protected] hadoop]# mv scala-2.11.8 scala

1. Set the environment variables

[[email protected] hadoop]# vi /etc/profile

#set scala

export SCALA_HOME=/opt/hadoop/scala

export PATH=$PATH:$SCALA_HOME/bin

"/etc/profile" 96L, 2426C written

[[email protected] hadoop]# source /etc/profile

[[email protected] hadoop]# scala -version

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

2. Copy /opt/hadoop/scala from the master to the other nodes.

[[email protected] hadoop]$ for i in {31,32,33,34,35,36,37,38,39}; do scp -r scala 192.168.231.2$i:/opt/hadoop/ ; done

3. Copy the environment variable file to the other machines

[[email protected] hadoop]$ for i in {31,32,33,34,35,36,37,38,39}; do scp ~/.bash_profile 192.168.231.2$i:~/.bash_profile ; done
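
The two loops above push Scala and the environment file to every node. As an optional sanity check, the following sketch (assuming passwordless SSH is already configured from the Hadoop HA setup, and that the Scala PATH entries live in ~/.bash_profile on the target nodes, as implied by step 3) verifies that scala is usable everywhere:

# confirm the Scala copy and PATH on every node
for i in {31,32,33,34,35,36,37,38,39}; do
  echo "== 192.168.231.2$i =="
  ssh 192.168.231.2$i "source ~/.bash_profile; scala -version"
done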

4. Install Spark

① Extract the package

[[email protected] hadoop]# tar -xzvf spark-1.6.1-bin-hadoop2.6.tgz

[[email protected] hadoop]# mv spark-1.6.1-bin-hadoop2.6 spark

[[email protected] conf]# pwd

/opt/hadoop/spark/conf

[[email protected] conf]# cp spark-env.sh.template spark-env.sh

[[email protected] conf]# cp spark-defaults.conf.template spark-defaults.conf

② Configure spark-env.sh

vi spark-env.sh

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2

export SCALA_HOME=/opt/hadoop/scala

export JAVA_HOME=/usr/java/jdk1.7.0_79

### total amount of memory Spark applications may use on each worker

export SPARK_WORKER_MEMORY=1g

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_WORKER_DIR=/opt/hadoop/spark/work

export SPARK_LOCAL_DIRS=/opt/hadoop/spark/tmp

export SPARK_DAEMON_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181 -Dspark.deploy.zookeeper.dir=/spark"
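
SPARK_DAEMON_JAVA_OPTS above makes the master daemons store their recovery state under the /spark znode in the Zookeeper quorum. Once the masters are running (step ⑥ below), you can optionally confirm the registration from any node; a minimal sketch, assuming Zookeeper is installed under /opt/hadoop/zookeeper (adjust the path to your install):

# connect to any server in the quorum
/opt/hadoop/zookeeper/bin/zkCli.sh -server slavenode1:2181
# inside the zkCli prompt: the znode from spark.deploy.zookeeper.dir should exist,
# with children such as leader_election and master_status created by the masters
ls /spark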

③ Edit the slaves file

[[email protected] conf]# cp slaves.template slaves

vi slaves 

Add the following hostnames (an SSH reachability check for these hosts follows the list):

slavenode1

slavenode2

slavenode3

slavenode4

slavenode5

slavenode6

slavenode7

slavenode8
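
start-all.sh on the master logs in to every host in this file over SSH, so it is worth confirming passwordless SSH to all of them first. A quick check, assuming the slavenode hostnames resolve from the master:

# verify passwordless SSH to every host listed in conf/slaves
for h in slavenode{1..8}; do
  ssh -o BatchMode=yes $h hostname || echo "passwordless ssh to $h failed"
done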

④ Edit spark-defaults.conf

vi spark-defaults.conf

spark.eventLog.enabled           true

spark.eventLog.dir               hdfs://cluster-ha/spark/logs

[[email protected]1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark

[[email protected]1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark/logs
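
With spark.eventLog.enabled and spark.eventLog.dir set, finished applications write their event logs into hdfs://cluster-ha/spark/logs. To browse them you can also run the Spark history server; a minimal sketch, assuming you add spark.history.fs.logDirectory (a property this guide does not set) to spark-defaults.conf and keep the default UI port 18080:

# in conf/spark-defaults.conf, point the history server at the same directory:
#   spark.history.fs.logDirectory   hdfs://cluster-ha/spark/logs
/opt/hadoop/spark/sbin/start-history-server.sh
# the history UI is then served on port 18080 of the node where it was started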

⑤ Copy the spark directory and the environment variable file to the other machines

[[email protected] conf]$ for i in {31,32,33,34,35,36,37,38,39};do scp -r spark 192.168.231.2$i:/opt/hadoop/ ; done

[[email protected] hadoop]# for i in {31,32,33,34,35,36,37,38,39}; do scp ~/.bash_profile 192.168.231.2$i:~/.bash_profile ; done

⑥ Start Spark

[[email protected] hadoop]# cd spark

[[email protected] spark]# cd sbin/

[[email protected] sbin]$ ./start-all.sh 

[[email protected] sbin]$ jps

9836 RunJar

3428 DFSZKFailoverController

2693 NameNode

5851 RunJar

3635 ResourceManager

14061 Jps

13859 Master

3924 HMaster
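
jps on masternode1 shows the Master process; the Worker processes run on the slave nodes. An optional check that every slave started a Worker, assuming passwordless SSH and that jps is on the PATH in non-interactive shells:

# count Worker processes on each slave (expect 1 per node)
for h in slavenode{1..8}; do
  echo -n "$h: "; ssh $h "jps | grep -c Worker"
done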

On masternode2 (192.168.237.231), also run start-master.sh, so that when masternode1 (192.168.237.230) goes down, masternode2 takes over as the master.

[[email protected] sbin]$ ./start-master.sh

[[email protected] sbin]$ jps

11867 Master

3704 DFSZKFailoverController

2683 NameNode

3923 ResourceManager

12064 Jps

4500 HMaster

[[email protected] bin]# ./spark-shell 

SQL context available as sqlContext.

scala> 

When the scala> prompt appears, the shell has started successfully.
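
Beyond the prompt check, you can submit the bundled SparkPi example to the standalone cluster to confirm that jobs actually run on the workers; a sketch, assuming the examples jar shipped with spark-1.6.1-bin-hadoop2.6 under lib/ (the exact file name may differ):

# run SparkPi on the cluster; the driver output should end with "Pi is roughly 3.14..."
/opt/hadoop/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://masternode1:7077 \
  /opt/hadoop/spark/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100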

⑦ Test whether HA works

(1) First check the state of the two nodes: masternode1 is currently running the active master and masternode2 is on standby.

Confirm this in the Spark web management page (port 8080 on each master).

(2) Stop the master service on masternode1

[[email protected] sbin]$ ./stop-master.sh 

Visit port 8080 on masternode1 in a browser to see whether it is still alive; the page no longer responds, so that master is down.

Then check masternode2's web page: masternode2 has been switched over and is now the active master.

Note: if port 8080 is already taken by another process, Spark automatically moves the web UI up one port.
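
The active/standby roles can also be checked from the command line: the standalone master's web UI serves a JSON summary of its state at /json, including a status field (ALIVE or STANDBY). A quick check, assuming both UIs stayed on port 8080:

# one master should report ALIVE, the other STANDBY
for m in masternode1 masternode2; do
  echo -n "$m: "
  curl -s http://$m:8080/json/ | grep -o '"status" *: *"[A-Z]*"'
done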

Start spark-shell against the cluster

[[email protected] sbin]$ spark-shell --master spark://masternode1:7077 &
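
For HA it is better to list both masters in the --master URL, so the driver can follow a failover while it is running; this is the standard form for standalone HA described in the Spark documentation:

# register against both masters; the shell talks to whichever one is currently active
spark-shell --master spark://masternode1:7077,masternode2:7077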

Spark configuration reference: http://spark.apache.org/docs/latest/configuration.html