Installing a 3-node Hadoop cluster and a Spark cluster
Part 1: Change the machine names
1. Change the hostnames of all 3 machines; note that the names must not contain underscores.
Command to change the hostname:
hostnamectl set-hostname xxxx
Then exit the shell and log in again.
2. Edit the hosts file on all 3 machines:
vim /etc/hosts
Add the following entries:
192.107.53.157 hadoop-master
192.107.53.158 hadoop-slave1
192.107.53.159 hadoop-slave2
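Once the entries are in place, a quick loop can confirm that every cluster name resolves (a sketch; `getent` is standard on Linux):

```shell
# Check that each cluster hostname resolves via /etc/hosts (or DNS).
for h in hadoop-master hadoop-slave1 hadoop-slave2; do
  if getent hosts "$h" > /dev/null; then
    echo "$h: resolves"
  else
    echo "$h: NOT found, check /etc/hosts"
  fi
done
```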
Part 2: Passwordless login between the master and slave nodes
1. Passwordless login to the local machine
1) Disable the firewall
# check the firewall status
service iptables status
# stop the firewall and disable it at boot
service iptables stop
chkconfig iptables off
2) Generate a key pair
ssh-keygen -t rsa
3) Append the public key to the "authorized_keys" file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
4) Set the permissions
chmod 600 ~/.ssh/authorized_keys
5) Verify that the local machine can be reached without a password
ssh hadoop-master
Repeat the same steps on hadoop-slave1 and hadoop-slave2 so that each can log in to itself without a password.
2. Passwordless login from hadoop-master to hadoop-slave1 and hadoop-slave2 (master node to slave nodes), using hadoop-master to hadoop-slave1 as the example:
1) Log in to hadoop-slave1 and copy the hadoop-master server's public key "id_rsa.pub" to hadoop-slave1's "/root/" directory:
scp [email protected]:/root/.ssh/id_rsa.pub /root/
# or
scp [email protected]:/root/.ssh/id_rsa.pub /root/
2) Append hadoop-master's public key (id_rsa.pub) to hadoop-slave1's authorized_keys:
cat /root/id_rsa.pub >> ~/.ssh/authorized_keys
rm -f /root/id_rsa.pub
3) Test from hadoop-master:
ssh hadoop-slave1
Log in to hadoop-slave2 and carry out the same steps.
3. Passwordless login from hadoop-slave1 and hadoop-slave2 to hadoop-master (slave nodes to master node)
1) Log in to hadoop-master and copy the hadoop-slave1 server's public key "id_rsa.pub" to hadoop-master's "/root/" directory:
scp [email protected]:/root/.ssh/id_rsa.pub /root/
2) Append hadoop-slave1's public key (id_rsa.pub) to hadoop-master's authorized_keys:
cat /root/id_rsa.pub >> ~/.ssh/authorized_keys
rm -f /root/id_rsa.pub    # delete id_rsa.pub
3) Test from hadoop-slave1:
ssh hadoop-master
Repeat the steps above so that hadoop-slave2 can also log in to the master node without a password.
At this point, passwordless login between the master and slave nodes is complete.
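With all the keys exchanged, the whole mesh can be verified non-interactively from any node (a sketch; `BatchMode=yes` makes ssh fail instead of prompting for a password when a key is missing):

```shell
# Verify passwordless ssh to every node; prints one status line per host.
for h in hadoop-master hadoop-slave1 hadoop-slave2; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
    echo "$h: passwordless ssh OK"
  else
    echo "$h: passwordless ssh FAILED"
  fi
done
```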
Part 3: Hadoop installation
1. Installing and configuring hadoop-master
1) Install the JDK
# download
jdk-8u171-linux-x64.tar.gz
# extract
tar -xzvf jdk-8u171-linux-x64.tar.gz -C /usr/local
# rename (the tarball extracts to a jdk1.8.0_171 directory)
mv /usr/local/jdk1.8.0_171 /usr/local/java
2) Install Hadoop
# download
hadoop-3.1.0.tar.gz
# extract
tar -xzvf hadoop-3.1.0.tar.gz -C /usr/local
# rename the extracted directory (not the tarball)
mv /usr/local/hadoop-3.1.0 /usr/local/hadoop
3) Configure environment variables
vim /etc/profile
export JAVA_HOME=/usr/local/java
export PATH="$JAVA_HOME/bin:$PATH"
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
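A quick sanity check (a sketch) that /etc/profile took effect: confirm both bin directories landed on PATH, then try `java -version` and `hadoop version` themselves.

```shell
# After `source /etc/profile`, both directories should be on PATH.
export JAVA_HOME=/usr/local/java
export PATH="$JAVA_HOME/bin:$PATH"
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin"

case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "java bin on PATH" ;;
  *) echo "java bin MISSING" ;;
esac
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin on PATH" ;;
  *) echo "hadoop bin MISSING" ;;
esac
```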
4) Hadoop configuration files
cd /usr/local/hadoop/etc/hadoop
a) Configure core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.107.53.157:9000</value>
</property>
</configuration>
b) Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>192.107.53.157:9000</value>
</property>
</configuration>
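The NameNode/DataNode storage directories and the temp directory referenced in core-site.xml and hdfs-site.xml can be created up front (a sketch using the paths from the configs above; the daemons create most of these on demand, but creating them explicitly makes the layout visible):

```shell
# Create the NameNode/DataNode storage dirs and hadoop.tmp.dir.
HADOOP_DIR=${HADOOP_DIR:-/usr/local/hadoop}
mkdir -p "$HADOOP_DIR/hdfs/name" "$HADOOP_DIR/hdfs/data" "$HADOOP_DIR/tmp"
ls -d "$HADOOP_DIR/hdfs/name" "$HADOOP_DIR/hdfs/data" "$HADOOP_DIR/tmp"
```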
c) Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/usr/local/hadoop/etc/hadoop,
/usr/local/hadoop/share/hadoop/common/*,
/usr/local/hadoop/share/hadoop/common/lib/*,
/usr/local/hadoop/share/hadoop/hdfs/*,
/usr/local/hadoop/share/hadoop/hdfs/lib/*,
/usr/local/hadoop/share/hadoop/mapreduce/*,
/usr/local/hadoop/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop/share/hadoop/yarn/*,
/usr/local/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
d) Configure yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
</configuration>
e) Configure the workers file
hadoop-slave1
hadoop-slave2
f) Configure hadoop-env.sh
export JAVA_HOME=/usr/local/java
2. Installing and configuring hadoop-slave1 (the same steps apply to the other slave node)
1) Copy hadoop and java to the hadoop-slave1 node:
scp -r /usr/local/hadoop hadoop-slave1:/usr/local/
scp -r /usr/local/java hadoop-slave1:/usr/local/
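With more slave nodes, the copy is easier as a loop. The sketch below only echoes each scp command so it can be previewed safely; drop the leading `echo` to actually run the copies.

```shell
# Print the distribution command for every slave node (dry run).
for node in hadoop-slave1 hadoop-slave2; do
  echo scp -r /usr/local/hadoop "$node:/usr/local/"
  echo scp -r /usr/local/java "$node:/usr/local/"
done
```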
2) Log in to the hadoop-slave1 server and delete the workers file:
rm -rf /usr/local/hadoop/etc/hadoop/workers
3) Configure environment variables
vim /etc/profile
export JAVA_HOME=/usr/local/java
export PATH="$JAVA_HOME/bin:$PATH"
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
Part 4: Start the Hadoop cluster
Set the run-as users in the start/stop scripts first, otherwise startup fails with an error.
Edit start-dfs.sh and stop-dfs.sh and add the following 4 lines to each:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Edit start-yarn.sh and stop-yarn.sh and add the following 3 lines to each:
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
1) The first start requires formatting the namenode:
hdfs namenode -format
2) Start Hadoop:
sbin/start-all.sh
3) Use the jps command to check the running processes
# run jps on the master
25928 SecondaryNameNode
25742 NameNode
26387 Jps
26078 ResourceManager
# run jps on a slave
24002 NodeManager
23899 DataNode
24179 Jps
4) Run the pi-estimation example program to confirm that Hadoop works:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar pi 5 10
5) Open http://192.107.53.157:8088/cluster/apps in a browser
Part 5: Troubleshooting
1. The browser cannot access the cluster web UI after configuration
https://blog.csdn.net/csdn_chuxuezhe/article/details/73322068
Set the hostname:
vi /etc/sysconfig/network
Edit it to contain:
NETWORKING=yes
HOSTNAME=hadoop-master
Also edit the hosts file: vi /etc/hosts
192.107.53.157 hadoop-master
192.107.53.158 hadoop-slave1
192.107.53.159 hadoop-slave2
Then reboot!
References
http://www.ityouknow.com/hadoop/2017/07/24/hadoop-cluster-setup.html
Part 6: Spark cluster installation
1. Using the hadoop-master node as the example
1) Install Scala
2) Install Spark
3) Configure environment variables
#scala
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
#spark
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
4) Spark configuration
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_HOME=/usr/local/spark
export SCALA_HOME=/usr/local/scala
export SPARK_MASTER_IP=hadoop-master
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
JAVA_HOME: Java installation directory
SCALA_HOME: Scala installation directory
HADOOP_HOME: Hadoop installation directory
HADOOP_CONF_DIR: directory holding the Hadoop cluster's configuration files
SPARK_MASTER_IP: IP address (or hostname) of the Spark cluster's master node
SPARK_EXECUTOR_MEMORY: amount of memory allocated to each executor
SPARK_WORKER_CORES: number of CPU cores available to each worker node
SPARK_WORKER_INSTANCES: number of worker instances started on each machine
5) Edit the slaves file
cp slaves.template slaves
vi slaves and add the worker nodes:
hadoop-slave1
hadoop-slave2
2. For hadoop-slave1 and hadoop-slave2, simply copy over the scala and spark directories
1) Log in to hadoop-slave1:
scp -r [email protected]:/usr/local/scala /usr/local
scp -r [email protected]:/usr/local/spark /usr/local
2) Configure environment variables:
#scala
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
#spark
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Log in to hadoop-slave2 and carry out the same steps.
3. Start the Spark cluster
cd /usr/local/spark/sbin
./start-all.sh
4. Run an example
The link below covers the various submit modes and is worth consulting:
http://zhenggm.iteye.com/blog/2358324
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client examples/jars/spark-examples_2.11-2.1.0.jar
Part 7: Problem solving
Problem 1: Container xxx is running beyond physical memory limits
Log:
Container [pid=134663,containerID=container_1430287094897_0049_02_067966] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.5 GB of 10 GB virtual memory used. Killing container. Dump of the process-tree for
Analysis: the log shows the container was killed for exceeding its memory limit; the virtual-to-physical memory ratio defaults to 2.1.
This is a NodeManager-side setting, similar to an OS-level overcommit knob. Adjust the yarn.nodemanager.vmem-pmem-ratio parameter in yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>10</value>
</property>
Alternatively, set yarn.nodemanager.vmem-check-enabled to false:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Problem 2: the datanode process dies
Restart the process:
sbin/hadoop-daemon.sh start datanode
Problem 3: configure spark-history and start the process
Log files alone are often not enough; to look back at past applications we need to start the History Server on the driver node.
In the spark-defaults.conf file under $SPARK_CONF_DIR, add the EventLog and History Server settings:
# EventLog
spark.eventLog.enabled true
spark.eventLog.dir file:///opt/spark/current/spark-events
# History Server
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory file:///opt/spark/current/spark-events
Note that the /opt/spark/current/spark-events path has to be created; application run history is stored there.
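The directory can be created like this (path taken from the config above; do this on the driver node before the first application runs):

```shell
# Event-log directory for spark.eventLog.dir / spark.history.fs.logDirectory.
EVENT_DIR=${EVENT_DIR:-/opt/spark/current/spark-events}
mkdir -p "$EVENT_DIR"
ls -d "$EVENT_DIR"
```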
Run the start command:
./sbin/start-history-server.sh
Reference:
http://www.leonlu.cc/profession/14-spark-log-and-history/
Problem 4: log configuration
https://blog.csdn.net/stark_summer/article/details/46929481
Spark currently prints its logs to the console and also writes them to /home/hadoop/spark.log; this comes from log4j appender inheritance and can be improved later. For now it is enough to change log4j.rootCategory=INFO, console,FILE to log4j.rootCategory=INFO, FILE.
cd /usr/local/spark/conf
vim log4j.properties
log4j.rootCategory=INFO, console,FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.Threshold=DEBUG
log4j.appender.FILE.file=/usr/local/spark/logs/spark.log
log4j.appender.FILE.DatePattern='.'yyyy-MM-dd
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=[%-5p] [%d{yyyy-MM-dd HH:mm:ss}] [%C{1}:%M:%L] %m%n
# spark
log4j.logger.org.apache.spark=INFO
Problem 5: Spark connecting to Kafka over SSL
The keys must be generated on each of the 3 machines in the Spark cluster.
Put the 4 files in one directory and run the 2_ServerGenKey.sh script directly.
Part 8: Configure a scheduled script
vim /etc/crontab
*/1 * * * * root /bin/sh /usr/local/jars/run.sh
To discard the job's output, write it as: */1 * * * * root /bin/sh /usr/local/jars/run.sh > /dev/null 2>&1
To check the cron log: vim /var/log/cron
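If the output of run.sh still matters when cron discards it, a small logging wrapper helps. A hypothetical sketch (the /tmp/run.log path and the job body are placeholders, not from the original setup):

```shell
# Hypothetical cron wrapper: timestamps each run into its own log file.
LOG=${LOG:-/tmp/run.log}
echo "$(date '+%Y-%m-%d %H:%M:%S') job started" >> "$LOG"
# ... the real work would go here ...
echo "$(date '+%Y-%m-%d %H:%M:%S') job finished" >> "$LOG"
```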