Spark Cluster Installation and Configuration: Spark 2.4.5 on CentOS 7
Posted by 阿新 on 2021-08-05
1. Environment
2. Download and Installation
3. Core Configuration Files
4. Starting Spark
----------------------------------------------------------
1. Environment
You can set up the following prerequisites first, or proceed straight to the installation:
1.1 Hadoop 2.7 cluster installation and configuration
1.2 Anaconda3 installation and configuration
1.3 OS: CentOS 7, user hadoop (the same user as the Hadoop cluster)
2. Download and Installation
2.1 Download: spark-2.4.5-bin-hadoop2.7.tgz
2.2 Enter the directory containing the archive and extract it:
$ sudo tar -zxvf ./spark-2.4.5-bin-hadoop2.7.tgz -C /usr/local/hdfs/
$ cd /usr/local/hdfs/
$ sudo mv ./spark-2.4.5-bin-hadoop2.7 ./spark2.4.5
$ sudo chown -R hadoop ./spark2.4.5
$ sudo ln -s /usr/local/hdfs/spark2.4.5 ~/hdfs/spark
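The final ln -s line gives a version-free path, so later upgrades only need the symlink repointed. The same pattern, sketched in /tmp so it is safe to try (the /tmp paths are illustrative only):

```shell
# demonstrate the version-free symlink pattern used above, using throwaway /tmp paths
mkdir -p /tmp/hdfs/spark2.4.5
ln -sfn /tmp/hdfs/spark2.4.5 /tmp/spark   # -n: replace an existing link in place
readlink /tmp/spark                        # prints: /tmp/hdfs/spark2.4.5
```

To upgrade, you would extract the new version alongside and rerun only the `ln -sfn` line.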
2.3 Configure environment variables
$ vi ~/.bash_profile
SPARK_HOME=/home/hadoop/hdfs/spark
export SPARK_HOME
PATH=$SPARK_HOME/bin:$PATH
export PATH
$ source ~/.bash_profile
Type spark in any shell and press the Tab key twice; if the following completions appear, the configuration succeeded:
$ spark
spark spark-class sparkR spark-shell spark-sql spark-submit
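The two PATH lines in .bash_profile simply prepend Spark's bin directory so the spark-* tools resolve; a minimal sketch of their effect, using the path from this guide:

```shell
# same effect as the .bash_profile lines above (path taken from this guide)
SPARK_HOME=/home/hadoop/hdfs/spark
PATH="$SPARK_HOME/bin:$PATH"

# confirm Spark's bin directory is now the first PATH entry
case "$PATH" in
  "$SPARK_HOME/bin:"*) echo "spark bin on PATH" ;;   # prints: spark bin on PATH
  *)                   echo "spark bin missing" ;;
esac
```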
3. Core Configuration Files
$ cd ~/hdfs/spark/conf
$ sudo cp ./slaves.template ./slaves
$ sudo cp ./spark-env.sh.template ./spark-env.sh
$ sudo cp ./spark-defaults.conf.template ./spark-defaults.conf
$ sudo chown -R hadoop /usr/local/hdfs/spark2.4.5
3.1 slaves
$ vi ./slaves
Add every machine that will run a Spark executor, one hostname per line:
Master
Slave2
Slave3
....
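For a longer cluster, the slaves file can also be generated from a hostname list instead of being typed by hand; a sketch using the example hostnames above and a scratch path in /tmp:

```shell
# write one worker hostname per line (hostnames are the examples above;
# substitute your own, and write to ~/hdfs/spark/conf/slaves for real use)
printf '%s\n' Master Slave2 Slave3 > /tmp/slaves
cat /tmp/slaves
```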
3.2 spark-config.sh
$ vi $SPARK_HOME/sbin/spark-config.sh
Add the JAVA_HOME path in a blank area:
export JAVA_HOME=/usr/jvm/jdk1.8
3.3 spark-env.sh
$ vi ./spark-env.sh
Append the following line at the end:
export HADOOP_CONF_DIR=/usr/local/hdfs/hadoop/etc/hadoop
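Besides HADOOP_CONF_DIR, a few other spark-env.sh settings are commonly tuned per node; the values below are illustrative assumptions, not part of the original setup:

```shell
# illustrative spark-env.sh extras (values are assumptions; tune for your nodes)
export SPARK_MASTER_HOST=Master    # hostname the standalone master binds to
export SPARK_WORKER_MEMORY=2g      # memory each worker may hand to executors
export SPARK_WORKER_CORES=2        # cores each worker may hand to executors
```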
3.4 spark-defaults.conf
$ start-all.sh                 # start Hadoop (HDFS and YARN) if it is not already running
$ hdfs dfs -mkdir /spark_lib
$ hdfs dfs -mkdir /spark-logs
$ hdfs dfs -put ~/hdfs/spark/jars/* /spark_lib
$ # stop-all.sh                # optional: stop Hadoop again when done
$ vi ./spark-defaults.conf
Append the following at the end:
spark.master                       yarn
#spark.yarn.jars                   hdfs://Master:9000/spark_lib/*.jar
#spark.yarn.stagingDir             hdfs://Master:9000/tmp
spark.history.provider             org.apache.spark.deploy.history.FsHistoryProvider
#spark.history.fs.logDirectory     hdfs://Master:9000/spark-logs
spark.history.fs.update.interval   10s
spark.history.ui.port              18080
spark.eventLog.enabled             true
#spark.eventLog.dir                hdfs://Master:9000/spark-logs
spark.master yarn tells Spark to run on YARN; spark.yarn.jars is the HDFS directory holding the Spark jars; spark.yarn.stagingDir is the temporary staging directory Spark uses at run time. The lines starting with “#” mark the settings you must edit: uncomment them and replace “Master” with your NameNode hostname.
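Uncommenting the marked lines and substituting the NameNode hostname can also be done with sed; a sketch against a scratch copy in /tmp (the hostname my-nn is a placeholder for your NameNode):

```shell
# scratch copy containing two of the commented-out settings
cat > /tmp/spark-defaults.conf <<'EOF'
#spark.eventLog.dir              hdfs://Master:9000/spark-logs
#spark.history.fs.logDirectory   hdfs://Master:9000/spark-logs
EOF

# drop the leading '#' and swap in the real NameNode hostname (placeholder: my-nn)
sed -i -e 's/^#spark/spark/' -e 's/Master:9000/my-nn:9000/' /tmp/spark-defaults.conf
cat /tmp/spark-defaults.conf
```

For the real file, run the same sed against ~/hdfs/spark/conf/spark-defaults.conf after checking the result here.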
3.5 yarn-site.xml
Disable YARN's physical and virtual memory checks and set the ResourceManager addresses:
$ sudo vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following to the existing Hadoop configuration:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>Master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>Master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>Master:8030</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>0</value>
</property>
3.6 mapred-site.xml
$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following to the existing Hadoop configuration:
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>Master:54311</value>
  <description>MapReduce job tracker runs at this host and port.</description>
</property>
Replace “Master” with your own NameNode hostname.
Copy the Spark directory to every node:
$ for i in Slave2 Slave3; do scp -r /usr/local/hdfs/spark2.4.5 "$i":/usr/local/hdfs/; done
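Before copying to every node it can help to preview exactly what would run; a dry-run sketch that only prints the commands (hostnames as in the slaves file; delete the echo to execute for real):

```shell
# dry run: print each scp command instead of executing it
for i in Slave2 Slave3; do
  echo scp -r /usr/local/hdfs/spark2.4.5 "$i":/usr/local/hdfs/
done
```

Remember that the yarn-site.xml and mapred-site.xml edits from 3.5 and 3.6 also need to reach every node's Hadoop configuration.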
4. Starting Spark
$ start-all.sh                     # start Hadoop first if it is not already running
$ $SPARK_HOME/sbin/start-all.sh    # start the Spark Master and Workers
Check with jps; if Master and Worker processes appear, Spark started successfully:
$ jps
71601 SecondaryNameNode
71347 DataNode
71827 ResourceManager
72405 Master
71212 NameNode
71964 NodeManager
72508 Worker
72734 Jps
$ spark-shell
After a successful start you will see the “scala>” command prompt, and “master = yarn” confirms Spark is running on YARN:
Spark context available as 'sc' (master = yarn, app id = application_1628143668230_0003).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.5
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
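Beyond the interactive shell, a quick end-to-end check is to submit the SparkPi example that ships with the 2.4.5 distribution. The sketch below only builds and prints the spark-submit command so you can inspect it first; run it by hand once the cluster is up:

```shell
# build the spark-submit command for the bundled SparkPi example;
# echo it for inspection, then run it manually if it looks right
SPARK_HOME=${SPARK_HOME:-/home/hadoop/hdfs/spark}
JAR="$SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar"
CMD="spark-submit --master yarn --class org.apache.spark.examples.SparkPi $JAR 10"
echo "$CMD"
```

On success the job's driver log contains a line beginning with "Pi is roughly", and the application also appears in the YARN ResourceManager UI.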