Setting Up a Hadoop Cluster

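1. Create three virtual machines (this walkthrough uses CentOS 7) and turn off the firewall on all of them.

  1. Stop the firewall, and disable it so that it stays off after the reboot below:

    [hadoop@localhost ~]$ systemctl stop firewalld.service
    [hadoop@localhost ~]$ systemctl disable firewalld.service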
    
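  2. Change the hostname so the virtual machines are easy to tell apart.

    Name the master node master and the other two nodes slave1 and slave2.

    Check the current hostname and change it: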

    [hadoop@localhost ~]$ hostname
    localhost.localdomain
    [hadoop@localhost ~]$ hostnamectl set-hostname master
    [hadoop@localhost ~]$ hostname
    master
    

    Reboot the machine after the change:

    [hadoop@localhost ~]$ reboot
    

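2. Edit the IP-to-hostname mapping table, /etc/hosts

Add the IP address and hostname of every machine to the hosts file. Every node needs all of these entries; the file plays the role of a local DNS.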

172.16.46.161	master
172.16.46.163	slave1
172.16.46.162	slave2
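
Since the same entries have to be added on every node, the step can be scripted so that running it twice does not create duplicates. The following is a minimal sketch, not part of the original setup: the function name add_host_entry and the HOSTS_FILE variable are made up for illustration, and writing to the real /etc/hosts requires root.

```shell
#!/usr/bin/env bash
# Sketch: append an "IP hostname" pair to a hosts file only if the hostname
# is not mapped yet. HOSTS_FILE and add_host_entry are hypothetical names.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
  local ip="$1" name="$2"
  # Look for the hostname as a whole field (any column after the IP),
  # ignoring comment lines. awk exits 0 when it is found.
  if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
      ' "$HOSTS_FILE"; then
    printf '%s\t%s\n' "$ip" "$name" >> "$HOSTS_FILE"
  fi
}

# Usage (as root, on every node):
#   add_host_entry 172.16.46.161 master
#   add_host_entry 172.16.46.163 slave1
#   add_host_entry 172.16.46.162 slave2
```

Matching on whole fields rather than substrings keeps an existing entry for slave1 from hiding a missing entry for, say, slave10.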

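3. Passwordless SSH login

See the separate guide on passwordless SSH login.

4. Install the JDK

See the separate guide on JDK installation.

5. Install Hadoop

Download Hadoop from its download page, choosing the package in .tar.gz format.

Extract it: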

[hadoop@master ~]$ tar -zxvf hadoop-2.9.2.tar.gz

Set the environment variables by appending the following to the end of /etc/profile:

export HADOOP_HOME=/home/hadoop/hadoop-2.9.2
export PATH=$HADOOP_HOME/bin:$PATH

Load the environment variables:

source /etc/profile

Verify that Hadoop is installed correctly:

[hadoop@master ~]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

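If you see the output above, the installation succeeded.

If there is no such output, try rebooting the machine so that the environment variables are loaded.

6. Configure Hadoop

Change into the Hadoop installation directory; the paths below are relative to it.

6.1 Configure etc/hadoop/hadoop-env.sh

Set JAVA_HOME to the absolute path of the JDK installation directory.

6.2 Configure etc/hadoop/core-site.xml

Set the address of the HDFS NameNode and the directory where Hadoop stores its temporary files at runtime.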

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.16.46.161:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-2.9.2/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>

If hadoop.tmp.dir is not set, the files go to /tmp/hadoop-<username> by default.

6.3 Configure etc/hadoop/hdfs-site.xml

<configuration>
  <!-- This cluster has three DataNodes, so at most three replicas of a block can be placed. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.9.2/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.9.2/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>172.16.46.161:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
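
A typo in a property name or a misplaced file is silently ignored, so it can help to ask Hadoop which value it actually resolves for a key. The sketch below wraps the `hdfs getconf -confKey` subcommand; the helper name check_conf is an assumption for illustration, and it expects hdfs to be on PATH as set up in step 5.

```shell
#!/usr/bin/env bash
# Sketch: compare the value Hadoop resolves for a configuration key with the
# value you expect. check_conf is a hypothetical helper name.
check_conf() {
  local key="$1" expected="$2" actual
  # `hdfs getconf -confKey KEY` prints the effective value of KEY.
  actual="$(hdfs getconf -confKey "$key")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK   $key=$actual"
  else
    echo "DIFF $key: expected '$expected', got '$actual'"
    return 1
  fi
}

# Usage, with the NameNode address configured in core-site.xml:
#   check_conf fs.defaultFS hdfs://172.16.46.161:9000
```

The function returns 0 on a match, 1 on a mismatch, and 2 if the hdfs command itself fails, which makes it easy to call from a larger setup script.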

6.4 Configure etc/hadoop/mapred-site.xml

Rename mapred-site.xml.template to mapred-site.xml:

[hadoop@master hadoop-2.9.2]$ mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

Edit mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

This makes MapReduce run on YARN.

6.5 Configure etc/hadoop/slaves

Delete the existing contents and list the IP addresses of all nodes, one per line:

172.16.46.161
172.16.46.163
172.16.46.162

6.6 Configure etc/hadoop/yarn-env.sh and etc/hadoop/mapred-env.sh

Set JAVA_HOME to the absolute path of the JDK installation directory.

6.7 Configure etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>172.16.46.161:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>172.16.46.161:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>172.16.46.161:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>172.16.46.161:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>172.16.46.161:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

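7. Copy the modified configuration to the other nodes

This assumes Hadoop is already unpacked to ~/hadoop-2.9.2 on each of them.

scp -r etc/ hadoop@slave1:~/hadoop-2.9.2/
scp -r etc/ hadoop@slave2:~/hadoop-2.9.2/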
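
With more than a couple of workers, the copy is easier to keep consistent as a loop. The sketch below is one possible way to do it: push_conf is a hypothetical helper, and it assumes passwordless SSH for the hadoop user and the same installation path on every host.

```shell
#!/usr/bin/env bash
# Sketch: copy the local etc/ directory into the same Hadoop directory on
# each listed host. push_conf is a hypothetical helper name.
push_conf() {
  local hadoop_dir="$1" host
  shift
  for host in "$@"; do
    # Stop at the first host that cannot be reached or written to.
    scp -r "$hadoop_dir/etc" "hadoop@$host:$hadoop_dir/" || return 1
  done
}

# Usage from the master node:
#   push_conf /home/hadoop/hadoop-2.9.2 slave1 slave2
```

Stopping on the first failure avoids a half-updated cluster going unnoticed.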

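8. Start the cluster

8.1 Format the NameNode

Now that the cluster is configured, format the NameNode before storing any data. Formatting initializes the NameNode's metadata directory (dfs.namenode.name.dir) with a fresh, empty filesystem image and new cluster identifiers, so no stale data is carried over.

Format only before the very first start.

Run the following command on the node that is configured as the NameNode: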

bin/hdfs namenode -format
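
Because formatting must happen only once, a small guard can protect against running it again by accident. This is a sketch under stated assumptions: format_namenode_once is a hypothetical helper, the directory passed in must match dfs.namenode.name.dir (without the file: prefix), and the check relies on the current/VERSION file that a formatted NameNode directory contains.

```shell
#!/usr/bin/env bash
# Sketch: format the NameNode only if its metadata directory has never been
# formatted. format_namenode_once is a hypothetical helper name.
format_namenode_once() {
  local name_dir="$1"
  # A formatted NameNode directory contains current/VERSION.
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "NameNode already formatted at $name_dir; skipping"
    return 0
  fi
  hdfs namenode -format
}

# Usage on the NameNode host, with the path from hdfs-site.xml:
#   format_namenode_once /home/hadoop/hadoop-2.9.2/hdfs/name
```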

8.2 Start the NameNode and DataNodes, which must be running before the rest of the cluster

Start the NameNode on a single node:

[hadoop@master hadoop-2.9.2]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
[hadoop@master hadoop-2.9.2]# jps
3877 NameNode
3947 Jps

Start a DataNode on a single node:

[hadoop@master hadoop-2.9.2]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
[hadoop@master hadoop-2.9.2]# jps
3877 NameNode
4060 Jps
3982 DataNode

Start a DataNode on each of the other nodes in the same way.

Starting HDFS like this is tedious, and the SecondaryNameNode is still not running, so Hadoop provides a script for it.

Start the whole HDFS cluster (NameNode, DataNodes, SecondaryNameNode) in one step:

[hadoop@master hadoop-2.9.2]$ sbin/start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-namenode-master.out
172.16.46.162: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave2.out
172.16.46.161: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-master.out
172.16.46.163: starting datanode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/hadoop-2.9.2/logs/hadoop-hadoop-secondarynamenode-master.out
[hadoop@master hadoop-2.9.2]$ jps
4192 Jps
3237 NameNode
3543 SecondaryNameNode
3374 DataNode

8.3 Start YARN

Run the following command on the node that is to host the YARN ResourceManager.

[hadoop@master hadoop-2.9.2]# sbin/start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-root-resourcemanager-master.out
172.16.46.163: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave1.out
172.16.46.162: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-slave2.out
172.16.46.161: starting nodemanager, logging to /home/hadoop/hadoop-2.9.2/logs/yarn-hadoop-nodemanager-master.out
[hadoop@master hadoop-2.9.2]$ jps
4192 Jps
3237 NameNode
3814 NodeManager
3543 SecondaryNameNode
3374 DataNode
3695 ResourceManager

Both the ResourceManager and the NodeManager are now running.
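
Reading jps output by eye works for one node, but a check that names the missing daemons is handier when repeating it across machines. The following sketch assumes nothing beyond the "<pid> <name>" line format that jps prints; missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report which expected Hadoop daemons are absent from jps output.
# Reads jps-style lines ("<pid> <name>") on stdin and takes the expected
# daemon names as arguments. missing_daemons is a hypothetical helper name.
missing_daemons() {
  local running name
  running="$(awk '{ print $2 }')"
  for name in "$@"; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$running" | grep -qx "$name"; then
      echo "$name"
    fi
  done
}

# Usage on the master node of this walkthrough:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
```

An empty result means every listed daemon is running; on slave1 and slave2 only DataNode and NodeManager would be expected.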

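8.4 The Hadoop cluster is now running, including HDFS, YARN, and MapReduce

Starting HDFS and YARN separately as above takes two steps; Hadoop also provides one-step start and stop scripts. In Hadoop 2.9.2 they print a deprecation notice recommending start-dfs.sh and start-yarn.sh instead.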

sbin/start-all.sh 
sbin/stop-all.sh

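9. Access the Hadoop cluster remotely

HDFS web UI: http://172.16.46.161:50070/

YARN web UI, at the yarn.resourcemanager.webapp.address set in yarn-site.xml: http://172.16.46.161:18088/

10. A simple test

Create directories in HDFS; the two commands below are equivalent ways to do it.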

bin/hdfs dfs -mkdir -p /usr/input
bin/hadoop fs -mkdir -p /usr/output
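
Creating a directory only exercises the NameNode. To confirm that DataNodes accept and serve blocks as well, a small write-and-read round trip is a useful follow-up. The sketch below uses the standard `hdfs dfs -put` and `-cat` subcommands; the helper name hdfs_roundtrip and the file name hello.txt are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: upload a small file to an HDFS directory and read it back.
# hdfs_roundtrip and hello.txt are hypothetical names.
hdfs_roundtrip() {
  local dir="$1" tmp rc
  tmp="$(mktemp)" || return 1
  echo "hello hdfs" > "$tmp"
  # -put -f overwrites a copy left behind by an earlier run.
  hdfs dfs -mkdir -p "$dir" &&
    hdfs dfs -put -f "$tmp" "$dir/hello.txt" &&
    hdfs dfs -cat "$dir/hello.txt"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Usage, with the directory created above:
#   hdfs_roundtrip /usr/input
```

If the cluster is healthy, the function prints the file's contents back and returns 0.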

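Cluster deployment planning

The steps above produce a working Hadoop cluster, but the NameNode, SecondaryNameNode, and ResourceManager all run on one machine.

That puts extra load on that server and leaves each component with fewer resources, so a better layout spreads them across three machines:

        hadoop11              hadoop12                        hadoop13
HDFS    NameNode, DataNode    DataNode                        SecondaryNameNode
YARN    NodeManager           ResourceManager, NodeManager    NodeManager

This puts each of the three core components on a different machine.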

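Troubleshooting notes

  • jps not found

    jps lists running Java processes and ships in the JDK's bin directory. If the command cannot be found, Java is not set up correctly; set the Java environment variables.

  • DataNode fails to start after a restart

    The first setup usually succeeds, but after a restart the DataNode fails to start because the NameNode no longer recognizes it.

    Formatting the NameNode generates two identifiers, blockPoolId and clusterId.

    When a DataNode joins, it records these identifiers to mark itself as belonging to that NameNode; this is what ties the nodes into one cluster.

    Formatting the NameNode again regenerates both identifiers.

    The DataNode, however, still presents the old ones and is turned away.

    Fix: delete the data on every node (the tmp directory as well as the NameNode and DataNode data directories), format the NameNode again, and restart. This erases everything stored in HDFS.

  • Every operation prints the following warning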

    WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    

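    It is only a warning and can be ignored. If you do want to get rid of it, see the linked solution; hadoop checknative -a shows which native libraries Hadoop can load.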
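
For the DataNode startup failure described above, the mismatch can be confirmed before deleting anything. Both the NameNode and the DataNode keep a current/VERSION file in their storage directories, and each contains a clusterID line. The sketch below compares the two; cluster_id and same_cluster are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: read the clusterID from a NameNode or DataNode VERSION file and
# compare two of them. Both function names are hypothetical helpers.
cluster_id() {
  # VERSION is a properties file with a line like: clusterID=CID-...
  sed -n 's/^clusterID=//p' "$1"
}

same_cluster() {
  local a b
  a="$(cluster_id "$1")"
  b="$(cluster_id "$2")"
  # Succeeds only when an ID was found and both files agree.
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage on a node that runs both daemons, with the directories from hdfs-site.xml:
#   same_cluster /home/hadoop/hadoop-2.9.2/hdfs/name/current/VERSION \
#                /home/hadoop/hadoop-2.9.2/hdfs/data/current/VERSION \
#     || echo "clusterID mismatch"
```

On slave nodes, which have no NameNode directory, print the DataNode's ID with cluster_id and compare it by hand with the value on the master.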