Hadoop Cluster Setup
1. Create three virtual machines (CentOS 7 is used here) and turn off the firewall on all of them.
- Turn off the firewall:
[hadoop@localhost ~]$ systemctl stop firewalld.service
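Note that stop only lasts until the next reboot, and the machines will be rebooted later; to keep the firewall off permanently, the service can also be disabled:
[hadoop@localhost ~]$ systemctl disable firewalld.service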
- Change the hostnames to make the virtual machines easy to tell apart.
Set the master node's hostname to master and the other two nodes' to slave1 and slave2.
Check the hostname, then change it:
[hadoop@localhost ~]$ hostname
localhost.localdomain
[hadoop@localhost ~]$ hostnamectl set-hostname master
[hadoop@localhost ~]$ hostname
master
Reboot after the change:
[hadoop@localhost ~]$ reboot
2. Edit the IP-to-hostname mapping table /etc/hosts
Add the IP-hostname mapping of every machine to the hosts file on every node; this acts as a local DNS:
172.16.46.161 master
172.16.46.163 slave1
172.16.46.162 slave2
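A quick optional check that the mappings took effect, run from any node:
ping -c 1 slave1
ping -c 1 slave2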
3. Passwordless SSH login
See the separate post on passwordless SSH login.
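For reference, the usual flow is roughly the following sketch (not a substitute for that post); run on master and repeat ssh-copy-id for every node:
ssh-keygen -t rsa
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2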
4. Install the JDK
See the separate post on JDK installation.
5. Install Hadoop
Download Hadoop from the official download page, choosing the .tar.gz package.
Extract it:
[hadoop@master ~]$ tar -zxvf hadoop-2.9.2.tar.gz
Set the environment variables by adding the following at the bottom of /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-2.9.2
export PATH=.:$HADOOP_HOME/bin:$PATH
Load the environment variables:
source /etc/profile
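A quick check that the variable took effect:
echo $HADOOP_HOME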
Verify that Hadoop was installed successfully:
[hadoop@master ~]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries
                       availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
If you see the output above, the installation succeeded.
If there is no output, try rebooting the machine so the environment variables are reloaded.
6. Configure Hadoop
Change into the Hadoop installation directory.
6.1 Configure etc/hadoop/hadoop-env.sh
Set JAVA_HOME to the absolute path of the JDK installation directory.
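For example (the JDK path below is only a placeholder; substitute your actual installation path):
export JAVA_HOME=/home/hadoop/jdk1.8.0_201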
6.2 Configure etc/hadoop/core-site.xml
Set the HDFS NameNode address and the storage path for Hadoop's runtime temporary files:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.16.46.161:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop-2.9.2/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
If hadoop.tmp.dir is not configured, data is stored under /tmp/hadoop-<username> by default.
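Hadoop generally creates hadoop.tmp.dir on demand, but creating it up front avoids permission surprises:
mkdir -p /home/hadoop/hadoop-2.9.2/tmp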
6.3 Configure etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop-2.9.2/hdfs/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop-2.9.2/hdfs/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>172.16.46.161:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
Note that dfs.replication should not exceed the number of DataNodes, so 3 is the maximum useful value for this three-node cluster.
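Likewise, the name and data directories referenced above can be created ahead of time:
mkdir -p /home/hadoop/hadoop-2.9.2/hdfs/name /home/hadoop/hadoop-2.9.2/hdfs/data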
6.4 Configure etc/hadoop/mapred-site.xml
Rename mapred-site.xml.template to mapred-site.xml:
[hadoop@master hadoop-2.9.2]$ mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Edit mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
This directs MapReduce jobs to run on YARN.
6.5 Configure etc/hadoop/slaves
Delete the existing content and add the IP addresses of all nodes:
172.16.46.161
172.16.46.163
172.16.46.162
6.6 Configure etc/hadoop/yarn-env.sh and etc/hadoop/mapred-env.sh
In both files, set JAVA_HOME to the absolute path of the JDK installation directory.
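As with hadoop-env.sh, for example (path assumed):
export JAVA_HOME=/home/hadoop/jdk1.8.0_201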
6.7 Configure etc/hadoop/yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>172.16.46.161:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>172.16.46.161:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>172.16.46.161:18088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>172.16.46.161:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>172.16.46.161:18141</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
7. Copy the modified configuration to the other nodes
scp -r etc/ hadoop@slave1:~/hadoop-2.9.2/
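The same copy is needed for slave2:
scp -r etc/ hadoop@slave2:~/hadoop-2.9.2/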
8. Start the cluster
8.1 Format the NameNode
With the cluster set up, format the NameNode to initialize a clean HDFS before any data is stored; formatting also creates the metadata directories.
Formatting is only needed before the first startup.
Run the following command on whichever node the NameNode is deployed on:
bin/hdfs namenode -format
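After formatting, the newly generated clusterID can be inspected under the dfs.namenode.name.dir configured above; this identifier matters for the DataNode issue described in the troubleshooting notes at the end:
cat /home/hadoop/hadoop-2.9.2/hdfs/name/current/VERSION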
8.2 Start HDFS: the NameNode and DataNodes must be running before the cluster can be used
Start the NameNode on a single node:
[hadoop@master hadoop-2.9.2]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
[hadoop@master hadoop-2.9.2]# jps
3877 NameNode
3947 Jps
Start the DataNode on a single node:
[hadoop@master hadoop-2.9.2]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
[hadoop@master hadoop-2.9.2]# jps
3877 NameNode
4060 Jps
3982 DataNode
Start the DataNode on each of the other nodes in the same way.
Starting HDFS node by node like this is tedious, and notice that the SecondaryNameNode has not been started, so Hadoop provides other startup scripts.
Start the whole HDFS cluster in one step: NameNode, DataNodes, and SecondaryNameNode (the deprecated start-all.sh used below also starts the YARN daemons, as the output shows):
[hadoop@master hadoop-2.9.2]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-namenode-master.out
172.16.46.162: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave2.out
172.16.46.161: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-master.out
172.16.46.163: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-resourcemanager-master.out
172.16.46.163: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave1.out
172.16.46.162: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave2.out
172.16.46.161: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-master.out
[hadoop@master hadoop-2.9.2]$ jps
4192 Jps
3237 NameNode
3543 SecondaryNameNode
3374 DataNode
8.3 Start YARN
Run the following command on whichever node YARN (the ResourceManager) is deployed on.
[hadoop@master hadoop-2.9.2]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-root-resourcemanager-master.out
172.16.46.162: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave2.out
172.16.46.161: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-master.out
172.16.46.163: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave1.out
[hadoop@master hadoop-2.9.2]$ jps
4192 Jps
3237 NameNode
3814 NodeManager
3543 SecondaryNameNode
3374 DataNode
3695 ResourceManager
Both the ResourceManager and the NodeManagers are now running.
8.4 The Hadoop cluster is now up: HDFS, YARN, and MapReduce
Starting daemons one by one as above is tedious; Hadoop also provides one-command start and stop scripts:
sbin/start-all.sh
sbin/stop-all.sh
9. Access the Hadoop cluster remotely
HDFS web UI: http://172.16.46.161:50070/
YARN web UI (the port configured in yarn.resourcemanager.webapp.address above): http://172.16.46.161:18088/
10. Quick test
Create directories in the HDFS filesystem; either of these two equivalent commands works:
bin/hdfs dfs -mkdir -p /usr/input
bin/hadoop fs -mkdir -p /usr/output
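As a further sanity check, upload a file and list it (the file used here is just an example):
bin/hdfs dfs -put etc/hadoop/core-site.xml /usr/input
bin/hdfs dfs -ls /usr/input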
Cluster deployment plan
The steps above produce a working cluster, but the NameNode, SecondaryNameNode, and ResourceManager all run on one machine.
That concentrates load on a single server and squeezes each component's resources, so they can instead be spread across three machines:
| | hadoop11 | hadoop12 | hadoop13 |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
This spreads the three core daemons across the three machines.
Troubleshooting notes
- jps not found
jps lists Java processes; if the command cannot be found, the JDK is not installed properly and the Java environment variables need to be set.
- DataNodes fail to start after the NameNode is re-formatted
The first setup usually succeeds, but after the NameNode is formatted again the DataNodes fail to start, because the NameNode no longer recognizes them.
When the NameNode is formatted it generates two identifiers, a blockPoolID and a clusterID.
A DataNode that joins the cluster records these identifiers to mark itself as belonging to that NameNode; this is what binds the cluster together.
Once the NameNode is re-formatted, both identifiers change.
The DataNodes, however, still present the old identifiers and are turned away.
Fix: delete the data on every node, i.e. the tmp directory, including the NameNode's data; re-format; then start again.
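A minimal sketch of that cleanup, assuming the tmp, name, and data directories configured in core-site.xml and hdfs-site.xml above; the rm commands run on every node, the format only on the NameNode:
rm -rf /home/hadoop/hadoop-2.9.2/tmp
rm -rf /home/hadoop/hadoop-2.9.2/hdfs/name /home/hadoop/hadoop-2.9.2/hdfs/data
bin/hdfs namenode -format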
- Every operation prints the following warning
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This can be safely ignored; it is only a warning. If you do want to eliminate it, see the linked fix.