
A First Look at Hadoop: Building a Hadoop Cluster

For this Hadoop cluster setup I created three virtual machines, one as the master and the other two as slaves. The cluster runs CentOS 7, with Java 1.8 and Hadoop 2.7.3.

The three virtual machines have the IP addresses 192.168.25.128, 192.168.25.129 and 192.168.25.130.

Once the virtual machines are created, first stop and disable the firewall on each of them:

[root@master ~]# systemctl stop firewalld.service

[root@master ~]# systemctl disable firewalld.service

Then create the directories /export/software (for the archives) and /export/servers (for the extracted files) on all three virtual machines. I keep the extracted JDK under /export/servers/java and the extracted Hadoop under /export/servers/hadoop, so create those directly (the -p flag also creates the missing parent directories):

[root@master ~]# mkdir -p /export/servers/java

[root@master ~]# mkdir -p /export/servers/hadoop
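All of these directories, including /export/software, can also be created in one go using shell brace expansion:

[root@master ~]# mkdir -p /export/software /export/servers/{java,hadoop}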

Next, upload the Java archive to the virtual machine with the rz command. If rz reports an error, install it with yum install lrzsz; then run rz again, browse to the path of the JDK archive, and upload it. Since the uploaded archive does not land in the directory created earlier, move it there:

[root@master ~]# mv jdk-8u161-linux-x64.tar.gz /export/software

Then change into the software directory:

[root@master ~]# cd /export/software

Extract the Java archive into /export/servers/java:

[root@master software]# tar -zxvf jdk-8u161-linux-x64.tar.gz -C /export/servers/java
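After extraction, the JDK should sit in a version-named directory under /export/servers/java, which you can confirm with ls:

[root@master software]# ls /export/servers/java
jdk1.8.0_161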

Then configure the Java environment variables (note the escaped \$ so that the variable references are written literally into /etc/profile instead of being expanded by the current shell):

[root@master servers]# echo -e "\nexport JAVA_HOME=/export/servers/java/jdk1.8.0_161" >> /etc/profile

[root@master servers]# echo -e "\nexport PATH=\$PATH:\$JAVA_HOME/bin" >> /etc/profile

[root@master servers]# echo -e "\nexport CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar" >> /etc/profile

Once the Java environment variables are configured, the following command makes them take effect immediately, without restarting the server:

[root@master servers]# source /etc/profile

Check the Java version to see whether the installation succeeded:

[root@master servers]# java -version
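If everything is in place, the output should look roughly like the following (the exact build numbers depend on your JDK package):

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)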

Next, configure the hostnames to simplify later operations. On the master host, run:

[root@master servers]# hostname master

[root@master servers]# vi /etc/hostname

master

On the first slave host, run:

[root@slave01 servers]# hostname slave01

[root@slave01 servers]# vi /etc/hostname

slave01

On the other slave host, run:

[root@slave02 servers]# hostname slave02

[root@slave02 servers]# vi /etc/hostname

slave02

Then configure the hosts file so that the IP addresses map to the hostnames. Run the following on all three servers:

[root@master ~]# vi /etc/hosts

192.168.25.128 master

192.168.25.129 slave01

192.168.25.130 slave02
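To confirm the mapping works, ping each peer by hostname from any of the machines:

[root@master ~]# ping -c 1 slave01
[root@master ~]# ping -c 1 slave02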

Next, set up passwordless SSH login, which saves you from typing passwords over and over later.

Run the following commands on each of the three hosts:

[root@master ~]# ssh-keygen -t rsa        (press Enter four times)

[root@master ~]# ssh-copy-id slave01      (give the IP address or hostname of the host you want to reach)
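ssh-copy-id installs your public key on one target at a time, so repeat it for every other host, master included (the start scripts connect to master over SSH as well). A quick sequence to run on master, ending with a check that no password prompt appears:

[root@master ~]# ssh-copy-id master
[root@master ~]# ssh-copy-id slave01
[root@master ~]# ssh-copy-id slave02
[root@master ~]# ssh slave01 hostname
slave01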

Building the Hadoop cluster:

Upload the Hadoop archive, following the same procedure as for the Java archive above, then extract it:

[root@master software]# tar -zxvf hadoop-2.7.3.tar.gz -C /export/servers/hadoop

Configure the Hadoop environment variables:

[root@master hadoop-2.7.3]# echo -e "\nexport HADOOP_HOME=/export/servers/hadoop/hadoop-2.7.3" >> /etc/profile

[root@master hadoop-2.7.3]# echo -e "export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin" >> /etc/profile

Make the changes take effect immediately:

[root@master hadoop-2.7.3]# source /etc/profile

Check the Hadoop version:

[root@master hadoop-2.7.3]# hadoop version
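If the PATH is set correctly, the first line of the output is the version number:

[root@master hadoop-2.7.3]# hadoop version
Hadoop 2.7.3
...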

Modify the Hadoop configuration files hadoop-env.sh and yarn-env.sh to add the JAVA_HOME setting, using the JDK path from earlier:

[root@master software]# echo -e "export JAVA_HOME=/export/servers/java/jdk1.8.0_161" >> /export/servers/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh

[root@master software]# echo -e "export JAVA_HOME=/export/servers/java/jdk1.8.0_161" >> /export/servers/hadoop/hadoop-2.7.3/etc/hadoop/yarn-env.sh

Create the directories /hadoop, /hadoop/tmp, /hadoop/hdfs/data and /hadoop/hdfs/name:

[root@master ~]# mkdir -p /hadoop/tmp

[root@master ~]# mkdir -p /hadoop/hdfs/data

[root@master ~]# mkdir -p /hadoop/hdfs/name
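These paths back dfs.namenode.name.dir and dfs.datanode.data.dir in the configuration below, and the DataNodes on the slaves use them too, so it is worth creating them on all three machines. With the passwordless SSH from earlier, the slaves can be handled remotely in one loop:

[root@master ~]# for h in slave01 slave02; do ssh $h "mkdir -p /hadoop/tmp /hadoop/hdfs/data /hadoop/hdfs/name"; done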

Modify the core-site.xml file:

[root@master ~]# vi /export/servers/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>

Modify hdfs-site.xml:

[root@master ~]# vi /export/servers/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Copy mapred-site.xml.template to mapred-site.xml and modify it:

[root@master ~]# cd /export/servers/hadoop/hadoop-2.7.3/etc/hadoop

[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml

[root@master hadoop]# vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
  </property>
</configuration>

Modify yarn-site.xml:

[root@master hadoop]# vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

In /export/servers/hadoop/hadoop-2.7.3/etc/hadoop/slaves, remove the default localhost and add slave01 and slave02:

[root@master hadoop]# echo -e "slave01\nslave02" > /export/servers/hadoop/hadoop-2.7.3/etc/hadoop/slaves
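Note that everything so far (JDK, Hadoop, the /etc/profile edits, the configuration files) exists only on master; the slaves need an identical setup before the cluster can start. Assuming the /export directories were created on the slaves earlier, one way to push it all out is scp (skip the JDK lines if you already installed Java on the slaves):

[root@master hadoop]# scp -r /export/servers/java/jdk1.8.0_161 root@slave01:/export/servers/java/
[root@master hadoop]# scp -r /export/servers/java/jdk1.8.0_161 root@slave02:/export/servers/java/
[root@master hadoop]# scp -r /export/servers/hadoop/hadoop-2.7.3 root@slave01:/export/servers/hadoop/
[root@master hadoop]# scp -r /export/servers/hadoop/hadoop-2.7.3 root@slave02:/export/servers/hadoop/
[root@master hadoop]# scp /etc/profile root@slave01:/etc/profile
[root@master hadoop]# scp /etc/profile root@slave02:/etc/profile

Remember to run source /etc/profile on each slave afterwards.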

Start the cluster. All of this runs only on master. First format the NameNode:

[root@master hadoop]# cd /export/servers/hadoop/hadoop-2.7.3/bin/

[root@master bin]# ./hadoop namenode -format

[root@master bin]# cd /export/servers/hadoop/hadoop-2.7.3/sbin/

[root@master sbin]# ./start-all.sh
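start-all.sh still works in Hadoop 2.x but is marked deprecated; the equivalent is to start HDFS and YARN separately, which also makes problems easier to isolate:

[root@master sbin]# ./start-dfs.sh
[root@master sbin]# ./start-yarn.sh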

Verification:

On the master host, run:

[root@master sbin]# jps

which should print something like:

1856 SecondaryNameNode
1669 NameNode
2008 ResourceManager
2265 Jps

On slave01, run:

[root@slave01 ~]# jps

which should print:

1712 NodeManager
1605 DataNode
1833 Jps

On slave02, run:

[root@slave02 ~]# jps

which should print:

1812 NodeManager
1706 DataNode
1933 Jps

If the output matches the above, the cluster is up. If it differs, a common cause is having formatted the NameNode more than once, which leaves the nodes with inconsistent clusterIDs; in that case the clusterIDs must be made identical again.

On master, inspect the NameNode's copy:

[root@master sbin]# cat /hadoop/hdfs/name/current/VERSION

On slave01 or slave02, inspect the DataNode's copy (on the slaves the VERSION file lives under the data directory):

[root@slave01 ~]# cat /hadoop/hdfs/data/current/VERSION

If they differ, edit the slaves' files and paste in master's clusterID so that they all match:

vi /hadoop/hdfs/data/current/VERSION
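For reference, a DataNode's VERSION file looks roughly like this (the values below are placeholders, not real output); the clusterID line is the one that must match master's:

namespaceID=...
clusterID=CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
cTime=0
storageType=DATA_NODE
...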

Finally, check the web UIs for master in a browser: the HDFS interface on port 50070, e.g. http://192.168.25.128:50070, and the YARN interface on port 8088 (as configured in yarn-site.xml above), e.g. http://192.168.25.128:8088.

To shut the Hadoop cluster down, run:

[root@master sbin]# ./stop-all.sh