
1. Hadoop 2.x Fully Distributed Cluster Setup

Environment planning

192.168.1.101 cmaster0
192.168.1.102 cslave0
192.168.1.103 cslave1
All three servers run CentOS 6.8.

Configure /etc/hosts

[root@cmaster0 ~]# vi /etc/hosts
192.168.1.101   cmaster0
192.168.1.102   cslave0
192.168.1.103   cslave1
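To confirm that the names resolve, and to push the same hosts file to the other nodes, something like the following can be used (a sketch; scp will prompt for the root password at this stage, since passwordless SSH is set up later):

[root@cmaster0 ~]# ping -c 1 cslave0
[root@cmaster0 ~]# scp /etc/hosts cslave0:/etc/hosts
[root@cmaster0 ~]# scp /etc/hosts cslave1:/etc/hosts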

Set the system locale

[root@cmaster0 ~]# vi /etc/sysconfig/i18n
#LANG="zh_CN.UTF-8"	
LANG="en_US.UTF-8"

Disable SELinux

[root@cmaster0 ~]# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
#SELINUXTYPE=targeted 
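Editing this file only takes effect after a reboot. To drop SELinux to permissive mode for the current session without rebooting:

[root@cmaster0 ~]# setenforce 0
[root@cmaster0 ~]# getenforce
Permissive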

Disable iptables and ip6tables

[root@cmaster0 ~]# service iptables stop
[root@cmaster0 ~]# service ip6tables stop
[root@cmaster0 ~]# chkconfig iptables off
[root@cmaster0 ~]# chkconfig ip6tables off
[root@cmaster0 ~]# chkconfig iptables --list
[root@cmaster0 ~]# chkconfig ip6tables --list

Create the hadoop account

[root@cmaster0 ~]# useradd hadoop
[root@cmaster0 ~]# passwd hadoop

Configure sudo privileges

[root@cmaster0 ~]# vi /etc/sudoers
root    ALL=(ALL)       ALL
hadoop    ALL=(ALL)       ALL
## /etc/sudoers is read-only, so force the write with :wq! when exiting vi
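To verify that the entry works, switch to the hadoop user and list its sudo permissions, for example:

[root@cmaster0 ~]# su - hadoop
[hadoop@cmaster0 ~]$ sudo -l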

Install the JDK

[root@cmaster0 software]# tar -zxf jdk-7u79-linux-x64.gz -C /opt/module/
[root@cmaster0 software]# vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
[root@cmaster0 jdk1.7.0_79]# source /etc/profile
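To verify that the JDK is on the PATH, check the version; it should report 1.7.0_79:

[root@cmaster0 ~]# java -version
java version "1.7.0_79"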

Create the Hadoop installation directory

[root@cmaster0 ~]# mkdir -p /opt/module
[root@cmaster0 software]# tar -zxf hadoop-2.7.2.tar.gz -C /opt/module/
# Everything above must be done on every machine in the cluster. I am using VMs here, so I did it on one machine and then cloned it into several VMs.
[root@cmaster0 software]# vi /etc/profile
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@cmaster0 jdk1.7.0_79]# source /etc/profile
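Similarly, verify that the Hadoop binaries are on the PATH; the first line of output should be the version:

[root@cmaster0 ~]# hadoop version
Hadoop 2.7.2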

Configure passwordless SSH

[hadoop@cmaster0 ~]$ cd
[hadoop@cmaster0 ~]$ ssh-keygen -t rsa
[hadoop@cmaster0 ~]$ cd ~/.ssh
[hadoop@cmaster0 .ssh]$ cp id_rsa.pub authorized_keys
## Distribute the SSH public keys; this step must be done on every machine in the cluster
## Append each node's authorized_keys content to the same file on every other node; after that the nodes can ssh to each other without a password
Test ssh (on every node)
# ssh cmaster0 date
# ssh cslave0 date
# ssh cslave1 date
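As an alternative to copying authorized_keys around by hand, ssh-copy-id appends the local public key to a remote node's authorized_keys in one step. Run it on every node against every other node, for example:

[hadoop@cmaster0 ~]$ ssh-copy-id cslave0
[hadoop@cmaster0 ~]$ ssh-copy-id cslave1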

Cluster deployment plan

/      cmaster0      cslave0          cslave1
HDFS   DataNode      DataNode         DataNode
HDFS   NameNode      /                SecondaryNameNode
YARN   NodeManager   NodeManager      NodeManager
YARN   /             ResourceManager  /

Install Hadoop 2.x

Download Hadoop 2.x and extract it
[hadoop@cmaster0 module]$ cd hadoop-2.7.2/
[hadoop@cmaster0 hadoop-2.7.2]$ mkdir -p data/tmp
Modify the configuration files
There are 8 configuration files involved:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/mapred-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
Some of these files do not exist by default; create them by copying the corresponding .template file.
Configure hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure mapred-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure yarn-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure core-site.xml
[hadoop@cmaster0 hadoop]$ vi core-site.xml
<configuration>
	<!-- File system schema (URI) used by Hadoop, i.e. the address of the HDFS master (NameNode) -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://cmaster0:8020</value>
	</property>
	<!-- Directory where Hadoop stores the files it generates at runtime -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/module/hadoop-2.7.2/data/tmp</value>
  	</property>
</configuration>
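As a quick sanity check that the value was picked up (this reads the local configuration files, so it works before the cluster is started):

[hadoop@cmaster0 hadoop]$ hdfs getconf -confKey fs.defaultFS
hdfs://cmaster0:8020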
Configure hdfs-site.xml
[hadoop@cmaster0 hadoop]$ vi hdfs-site.xml
<configuration>
	<!-- Number of HDFS replicas -->
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<!-- Server on which the SecondaryNameNode process runs -->
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>cslave1:50090</value>
	</property>
</configuration>
Configure mapred-site.xml
[hadoop@cmaster0 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@cmaster0 hadoop]$ vi mapred-site.xml
<configuration>
	<!-- Run MapReduce on YARN -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
Configure yarn-site.xml
[hadoop@cmaster0 hadoop]$ vi yarn-site.xml
<configuration>
	<!-- Address of the YARN master (ResourceManager) -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>cslave0</value>
	</property>
	<!-- How reducers fetch data -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>
Configure the slaves file
[hadoop@cmaster0 hadoop]$ vi slaves
cmaster0
cslave0
cslave1

Copy Hadoop and the JDK to each node

[hadoop@cmaster0 module]$ scp -r ./hadoop-2.7.2/ cslave0:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./hadoop-2.7.2/ cslave1:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./jdk1.7.0_79/ cslave0:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./jdk1.7.0_79/ cslave1:/opt/module/
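If the slave machines were not cloned from the master, the /etc/profile additions (JAVA_HOME, HADOOP_HOME) also have to be replicated, for example as root (a sketch; this overwrites the remote file, so only do it if the slaves' /etc/profile has no other local changes):

[root@cmaster0 ~]# scp /etc/profile cslave0:/etc/profile
[root@cmaster0 ~]# scp /etc/profile cslave1:/etc/profile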

Format the NameNode

[hadoop@cmaster0 hadoop-2.7.2]$ hdfs namenode -format
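Format the NameNode only once, on cmaster0. If you ever need to reformat, stop the cluster and clear the data directory (hadoop.tmp.dir) on every node first; otherwise the DataNodes keep the old cluster ID and refuse to join:

[hadoop@cmaster0 hadoop-2.7.2]$ rm -rf data/tmp/*
[hadoop@cmaster0 hadoop-2.7.2]$ hdfs namenode -format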

Start the Hadoop cluster

Be sure to run start-dfs.sh on cmaster0, because the NameNode is configured on cmaster0.
[hadoop@cmaster0 sbin]$ start-dfs.sh
[hadoop@cmaster0 tmp]$ jps
2973 DataNode
3225 Jps
2876 NameNode

[hadoop@cslave0 tmp]$ jps
2734 Jps
2647 DataNode

[hadoop@cslave1 tmp]$ jps
2815 Jps
2728 SecondaryNameNode
2655 DataNode
Be sure to run start-yarn.sh on cslave0, because the ResourceManager is configured on cslave0.
  • Note: if the NameNode and ResourceManager are not on the same machine, do not start YARN on the NameNode; start it on the machine where the ResourceManager runs.
[hadoop@cslave0 sbin]$ start-yarn.sh
[hadoop@cmaster0 tmp]$ jps
3525 Jps
2973 DataNode
3375 NodeManager
2876 NameNode

[hadoop@cslave0 tmp]$ jps
3792 Jps
2647 DataNode
3122 ResourceManager
3230 NodeManager

[hadoop@cslave1 tmp]$ jps
2970 Jps
2728 SecondaryNameNode
2860 NodeManager
2655 DataNode

Web management pages

# On the NameNode server:
http://192.168.1.101:50070 (HDFS web UI)

# On the ResourceManager server:
http://192.168.1.102:8088   (MapReduce/YARN web UI)

# SecondaryNameNode status:
http://192.168.1.103:50090/status.html
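The same information is also available from the command line; for example, the following prints the live DataNodes and capacity as seen by the NameNode:

[hadoop@cmaster0 ~]$ hdfs dfsadmin -report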

Management scripts in $HADOOP_HOME/sbin

Command             Description
hadoop-daemon.sh    Start or stop a single HDFS daemon: hadoop-daemon.sh {start|stop} {namenode|datanode|secondarynamenode}
hadoop-daemons.sh   Runs hadoop-daemon.sh on every machine in the cluster
yarn-daemon.sh      Start or stop a single YARN daemon: yarn-daemon.sh {start|stop} {resourcemanager|nodemanager}
yarn-daemons.sh     Runs yarn-daemon.sh on every machine in the cluster
start-all.sh        Runs start-dfs.sh and then start-yarn.sh (deprecated)
stop-all.sh         Counterpart of start-all.sh (deprecated)
start-dfs.sh        Starts HDFS (NameNode, SecondaryNameNode, DataNode)
stop-dfs.sh         Counterpart of start-dfs.sh
start-yarn.sh       Starts YARN (ResourceManager, NodeManager)
stop-yarn.sh        Counterpart of start-yarn.sh
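For example, to restart only the DataNode on one node without touching the rest of the cluster (run on that node; the sbin directory is already on the PATH via /etc/profile):

[hadoop@cslave0 ~]$ hadoop-daemon.sh stop datanode
[hadoop@cslave0 ~]$ hadoop-daemon.sh start datanode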

The official wordcount example

Prepare an input file under the home directory
[hadoop@cmaster0 ~]$ cd
[hadoop@cmaster0 ~]$ mkdir wcinput
[hadoop@cmaster0 ~]$ cd wcinput/
[hadoop@cmaster0 wcinput]$ touch wc.input
[hadoop@cmaster0 wcinput]$ vi wc.input
[hadoop@cmaster0 wcinput]$ cat wc.input
hadoop yarn
hadoop mapreduce 
zhangsh
zhangyu
Upload the test file to the file system
[hadoop@cmaster0 wcinput]$ hadoop fs -mkdir -p /user/hadoop/wcinput
[hadoop@cmaster0 wcinput]$ hadoop fs -put /home/hadoop/wcinput/wc.input /user/hadoop/wcinput
[hadoop@cmaster0 wcinput]$ hadoop fs -cat /user/hadoop/wcinput/wc.input
hadoop yarn
hadoop mapreduce 
zhangsh
zhangyu
Run the MapReduce job on HDFS
[hadoop@cmaster0 wcinput]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/hadoop/wcinput /user/hadoop/wcoutput
[hadoop@cmaster0 wcinput]$ hadoop fs -ls -R /user
drwxr-xr-x   - hadoop supergroup          0 2018-12-21 08:03 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2018-12-21 08:00 /user/hadoop/wcinput
-rw-r--r--   3 hadoop supergroup         46 2018-12-21 08:00 /user/hadoop/wcinput/wc.input
drwxr-xr-x   - hadoop supergroup          0 2018-12-21 08:03 /user/hadoop/wcoutput
-rw-r--r--   3 hadoop supergroup          0 2018-12-21 08:03 /user/hadoop/wcoutput/_SUCCESS
-rw-r--r--   3 hadoop supergroup         48 2018-12-21 08:03 /user/hadoop/wcoutput/part-r-00000
[hadoop@cmaster0 wcinput]$ hadoop fs -cat /user/hadoop/wcoutput/part-r-00000
hadoop	2
mapreduce	1
yarn	1
zhangsh	1
zhangyu	1
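Note that MapReduce refuses to run if the output directory already exists, so remove it before re-running the job:

[hadoop@cmaster0 wcinput]$ hadoop fs -rm -r /user/hadoop/wcoutput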