
Spark Cluster Installation and Configuration: Spark 2.4.5 on CentOS 7

1. Environment

2. Download and Install

3. Core Configuration Files

4. Startup

----------------------------------------------------------

1. Environment

You can complete the following environment setup first, or install Spark directly:
1.1 Hadoop 2.7 cluster installed and configured
1.2 Anaconda3 installed and configured
1.3 OS: CentOS 7, running as the hadoop user (the same user as the Hadoop cluster)

2. Download and Install

2.1 Download: spark-2.4.5-bin-hadoop2.7.tgz
2.2 Change to the directory containing the archive and extract it:

$ sudo tar -zxvf ./spark-2.4.5-bin-hadoop2.7.tgz -C /usr/local/hdfs/
$ cd /usr/local/hdfs/
$ sudo mv ./spark-2.4.5-bin-hadoop2.7 ./spark2.4.5
$ sudo chown -R hadoop ./spark2.4.5
$ sudo ln -s /usr/local/hdfs/spark2.4.5 ~/hdfs/spark

2.3 Configure environment variables

$ vi ~/.bash_profile

SPARK_HOME=/home/hadoop/hdfs/spark
export SPARK_HOME
PATH=$SPARK_HOME/bin:$PATH
export PATH

$ source ~/.bash_profile

In any shell, type spark and press the Tab key twice; if the following list appears, the setup succeeded:

$ spark
spark  spark-class   sparkR   spark-shell   spark-sql   spark-submit
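Beyond tab completion, you can sanity-check the environment from 2.3 directly; a minimal sketch (the paths are the ones used in this guide):

```shell
# Sanity checks for the environment set up in 2.3.
# SPARK_HOME and PATH should already be exported by ~/.bash_profile.
echo "SPARK_HOME=$SPARK_HOME"
if command -v spark-submit >/dev/null 2>&1; then
  echo "spark-submit found on PATH"
else
  echo "spark-submit NOT on PATH - re-check ~/.bash_profile and re-run source"
fi
```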

3. Core Configuration Files

$ cd ~/hdfs/spark/conf
$ sudo cp ./slaves.template  ./slaves
$ sudo cp ./spark-env.sh.template  ./spark-env.sh
$ sudo cp ./spark-defaults.conf.template ./spark-defaults.conf
$ sudo chown -R hadoop /usr/local/hdfs/spark2.4.5

3.1 slaves

$ vi ./slaves

Add the hostname of every machine that will run Spark executors:

Master
Slave2
Slave3
....
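The list above can also be generated from a script; a minimal sketch that writes the worker hostnames to a scratch file for review (the hostnames are the example names used in this guide, so adjust them to your cluster):

```shell
# Generate the slaves file from a list of worker hostnames.
# WORKERS uses the example hostnames from this guide - adjust to your cluster.
WORKERS="Master Slave2 Slave3"
: > slaves.generated             # write to a scratch file first for review
for h in $WORKERS; do
  echo "$h" >> slaves.generated
done
cat slaves.generated             # inspect, then copy over conf/slaves
```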

3.2 spark-config.sh

$ vi $SPARK_HOME/sbin/spark-config.sh

Add the JAVA_HOME path in a blank spot:

export JAVA_HOME=/usr/jvm/jdk1.8

3.3 spark-env.sh

$ vi ./spark-env.sh

Append the following line at the end:

export HADOOP_CONF_DIR=/usr/local/hdfs/hadoop/etc/hadoop

3.4 spark-defaults.conf

$ start-all.sh                                 # start Hadoop first
$ hdfs dfs -mkdir /spark_lib
$ hdfs dfs -mkdir /spark-logs
$ hdfs dfs -put ~/hdfs/spark/jars/* /spark_lib
$ # stop-all.sh can be run afterwards to stop Hadoop if desired
$ vi ./spark-defaults.conf

Append the following in the blank space at the end:

spark.master    yarn                                 # tells Spark to run in YARN mode
#spark.yarn.jars hdfs://Master:9000/spark_lib/*.jar  # directory holding the Spark jars
#spark.yarn.stagingDir   hdfs://Master:9000/tmp      # staging directory for temporary files at runtime

spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
#spark.history.fs.logDirectory     hdfs://Master:9000/spark-logs
spark.history.fs.update.interval  10s
spark.history.ui.port             18080
spark.eventLog.enabled true
#spark.eventLog.dir hdfs://Master:9000/spark-logs

The “#” marks the lines you need to modify (uncomment them after editing); “Master” is the NameNode hostname.
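Every hdfs:// URI in the commented lines must point at your NameNode's RPC address (port 9000 in this guide). A small sketch that renders those lines for a given hostname (NAMENODE is a placeholder, not part of the original config):

```shell
# Render the cluster-specific spark-defaults lines for one NameNode host.
# NAMENODE is a placeholder - set it to your own NameNode hostname.
NAMENODE=Master
cat <<EOF
spark.yarn.jars               hdfs://$NAMENODE:9000/spark_lib/*.jar
spark.yarn.stagingDir         hdfs://$NAMENODE:9000/tmp
spark.history.fs.logDirectory hdfs://$NAMENODE:9000/spark-logs
spark.eventLog.dir            hdfs://$NAMENODE:9000/spark-logs
EOF
```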

3.5 yarn-site.xml

Disable YARN's physical and virtual memory checks:

$ sudo vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following to the existing Hadoop configuration:

<property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
</property>

<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
</property>

<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
</property>

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>                       
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property> 

<property>
    <name>yarn.acl.enable</name>
    <value>0</value>
</property>

3.6 mapred-site.xml

$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following to the existing Hadoop configuration:

<property>
	<name>mapreduce.jobtracker.address</name>
	<value>Master:54311</value>
	<description>MapReduce job tracker runs at this host and port.
	</description>
</property>

Replace each “Master” with your own NameNode hostname.
Then copy the configured directory to every node:

$ for i in Slave2 Slave3; do scp -r /usr/local/hdfs/spark2.4.5 $i:/usr/local/hdfs/; done
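Since a whole directory tree is being copied, scp needs -r; it can help to print the per-node commands first and review them before running. A sketch (hostnames are the example names from this guide):

```shell
# Print the per-node copy commands for review before running them.
# -r is required because spark2.4.5 is a directory tree.
NODES="Slave2 Slave3"            # example hostnames - adjust to your cluster
SRC=/usr/local/hdfs/spark2.4.5
for n in $NODES; do
  echo "scp -r $SRC $n:/usr/local/hdfs/"
done
```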

4. Startup

$ start-all.sh                     # start Hadoop first, if it is not already running
$ $SPARK_HOME/sbin/start-all.sh    # then start the Spark Master and Workers

Check with jps; if Master and Worker processes appear, the startup succeeded:

$ jps
71601 SecondaryNameNode
71347 DataNode
71827 ResourceManager
72405 Master      
71212 NameNode
71964 NodeManager
72508 Worker
72734 Jps
$ spark-shell

After a successful start there will be a “scala>” prompt, and “master = yarn” shows Spark is running on YARN:

Spark context available as 'sc' (master = yarn, app id = application_1628143668230_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
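Beyond the interactive shell, a batch submission confirms the YARN setup end to end. A hedged smoke-test sketch using the SparkPi example bundled with the 2.4.5 tarball (printed via echo so you can review the full command before running it on the cluster):

```shell
# Build and print the SparkPi smoke-test command; remove the echo to run it.
# The jar name matches the spark-2.4.5-bin-hadoop2.7 tarball layout.
SPARK_HOME=${SPARK_HOME:-/home/hadoop/hdfs/spark}
echo spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar" 100
```

A successful run logs a line like "Pi is roughly 3.14..." and the application appears in the YARN ResourceManager UI.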