1. Hadoop 2.x Fully Distributed Cluster Setup
Lab environment plan
192.168.1.101 cmaster0
192.168.1.102 cslave0
192.168.1.103 cslave1
All three servers run CentOS 6.8.
Configure /etc/hosts
[root@cmaster0 ~]# vi /etc/hosts
192.168.1.101 cmaster0
192.168.1.102 cslave0
192.168.1.103 cslave1
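With /etc/hosts correct on cmaster0, you can push it to the other two nodes instead of editing it three times (a sketch, assuming root can already ssh to the slaves by IP at this point):
[root@cmaster0 ~]# scp /etc/hosts 192.168.1.102:/etc/hosts
[root@cmaster0 ~]# scp /etc/hosts 192.168.1.103:/etc/hosts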
Change the system locale
[root@cmaster0 ~]# vi /etc/sysconfig/i18n
#LANG="zh_CN.UTF-8"
LANG="en_US.UTF-8"
Disable SELinux
[root@cmaster0 ~]# vi /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#   enforcing - SELinux security policy is enforced.
#   permissive - SELinux prints warnings instead of enforcing.
#   disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#   targeted - Targeted processes are protected,
#   mls - Multi Level Security protection.
#SELINUXTYPE=targeted
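SELINUX=disabled only takes effect after a reboot. To stop enforcement immediately in the current session (a small extra step beyond the original writeup), switch to permissive mode:
[root@cmaster0 ~]# setenforce 0
[root@cmaster0 ~]# getenforce
Permissive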
Disable iptables and ip6tables
[root@cmaster0 ~]# service iptables stop
[root@cmaster0 ~]# service ip6tables stop
[root@cmaster0 ~]# chkconfig iptables off
[root@cmaster0 ~]# chkconfig ip6tables off
[root@cmaster0 ~]# chkconfig iptables --list
[root@cmaster0 ~]# chkconfig ip6tables --list
Create the hadoop account
[root@cmaster0 ~]# useradd hadoop
[root@cmaster0 ~]# passwd hadoop
Configure sudo privileges
[root@cmaster0 ~]# vi /etc/sudoers
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
## /etc/sudoers is read-only; force the save on exit with :wq!
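A safer alternative to editing /etc/sudoers directly (a suggestion beyond the original steps) is visudo, which validates the syntax before saving:
[root@cmaster0 ~]# visudo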
Install the JDK
[root@cmaster0 ~]# mkdir -p /opt/module
[root@cmaster0 software]# tar -zxf jdk-7u79-linux-x64.gz -C /opt/module/
[root@cmaster0 software]# vi /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
[root@cmaster0 jdk1.7.0_79]# source /etc/profile
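Verify the JDK is now on the PATH (the exact build string in the output is illustrative):
[root@cmaster0 ~]# java -version
java version "1.7.0_79"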
Extract Hadoop
[root@cmaster0 software]# tar -zxf hadoop-2.7.2.tar.gz -C /opt/module/
# All of the steps above must be done on every machine in the cluster. I am using virtual machines here, so I did everything once and then cloned the VM.
[root@cmaster0 software]# vi /etc/profile
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@cmaster0 jdk1.7.0_79]# source /etc/profile
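Verify the Hadoop commands resolve (for this release the first line of output should read as below):
[root@cmaster0 ~]# hadoop version
Hadoop 2.7.2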
Configure passwordless SSH
[hadoop@cmaster0 ~]$ cd
[hadoop@cmaster0 ~]$ ssh-keygen -t rsa
[hadoop@cmaster0 ~]$ cd ~/.ssh
[hadoop@cmaster0 .ssh]$ cp id_rsa.pub authorized_keys
## Distribute the SSH public keys; this step must be done on every node in the cluster.
## Append the contents of each node's authorized_keys to the same file on every other node; after that the nodes can ssh into one another without a password.
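Instead of copying authorized_keys around by hand, ssh-copy-id (bundled with OpenSSH on CentOS 6) can push the key, and it is worth tightening the permissions sshd expects; a sketch from cmaster0, to be repeated on each node:
[hadoop@cmaster0 ~]$ ssh-copy-id hadoop@cslave0
[hadoop@cmaster0 ~]$ ssh-copy-id hadoop@cslave1
[hadoop@cmaster0 ~]$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys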
Test SSH (do this on every node)
# ssh cmaster0 date
# ssh cslave0 date
# ssh cslave1 date
Cluster deployment plan
/ | cmaster0 | cslave0 | cslave1 |
---|---|---|---|
HDFS | DataNode | DataNode | DataNode |
HDFS | NameNode | / | SecondaryNameNode |
YARN | NodeManager | NodeManager | NodeManager |
YARN | / | ResourceManager | / |
Install Hadoop 2.x
Create the Hadoop runtime data directory (the tarball was already extracted above)
[hadoop@cmaster0 module]$ cd hadoop-2.7.2/
[hadoop@cmaster0 hadoop-2.7.2]$ mkdir -p data/tmp
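Note: /opt/module was created by root, so the hadoop user may not own the extracted directories. If mkdir fails with a permission error, a likely fix (an assumption about your layout, adjust as needed) is:
[root@cmaster0 ~]# chown -R hadoop:hadoop /opt/module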
Edit the configuration files
Eight configuration files are involved:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/mapred-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
Some of these files do not exist by default; create them by copying the corresponding .template file.
Configure hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure mapred-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure yarn-env.sh
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/opt/module/jdk1.7.0_79
Configure core-site.xml
[hadoop@cmaster0 hadoop]$ vi core-site.xml
<configuration>
<!-- URI (scheme plus address) of the default file system, i.e. the NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://cmaster0:8020</value>
</property>
<!-- Directory where Hadoop stores files it generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
</configuration>
Configure hdfs-site.xml
[hadoop@cmaster0 hadoop]$ vi hdfs-site.xml
<configuration>
<!-- Number of HDFS block replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Server that runs the SecondaryNameNode process -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>cslave1:50090</value>
</property>
</configuration>
Configure mapred-site.xml
[hadoop@cmaster0 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@cmaster0 hadoop]$ vi mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure yarn-site.xml
[hadoop@cmaster0 hadoop]$ vi yarn-site.xml
<configuration>
<!-- Address of the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cslave0</value>
</property>
<!-- Auxiliary service that lets reducers fetch map output (shuffle) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Configure the slaves file
[hadoop@cmaster0 hadoop]$ vi slaves
cmaster0
cslave0
cslave1
Copy Hadoop and the JDK to the other nodes
[hadoop@cmaster0 module]$ scp -r ./hadoop-2.7.2/ cslave0:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./hadoop-2.7.2/ cslave1:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./jdk1.7.0_79/ cslave0:/opt/module/
[hadoop@cmaster0 module]$ scp -r ./jdk1.7.0_79/ cslave1:/opt/module/
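The /etc/profile entries for JAVA_HOME and HADOOP_HOME must exist on the slaves too. If you did not clone the VMs, one way to sync them (a sketch, run as root) is:
[root@cmaster0 ~]# scp /etc/profile cslave0:/etc/profile
[root@cmaster0 ~]# scp /etc/profile cslave1:/etc/profile
Then run source /etc/profile on each slave.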
Format the NameNode
[hadoop@cmaster0 hadoop-2.7.2]$ hdfs namenode -format
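Format only once. If you ever need to re-format, first stop the cluster and clear the data directory on every node, otherwise the DataNodes keep the old cluster ID and will not rejoin; a cautionary sketch:
[hadoop@cmaster0 hadoop-2.7.2]$ rm -rf data/tmp/*    # repeat on every node before re-formatting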
Start the Hadoop cluster
Be sure to run start-dfs.sh on cmaster0, because the NameNode is configured on cmaster0.
[hadoop@cmaster0 sbin]$ start-dfs.sh
[hadoop@cmaster0 tmp]$ jps
2973 DataNode
3225 Jps
2876 NameNode
[hadoop@cslave0 tmp]$ jps
2734 Jps
2647 DataNode
[hadoop@cslave1 tmp]$ jps
2815 Jps
2728 SecondaryNameNode
2655 DataNode
Be sure to run start-yarn.sh on cslave0, because the ResourceManager is configured on cslave0.
- Note: if the NameNode and the ResourceManager are not on the same machine, do not start YARN on the NameNode; start YARN on the machine where the ResourceManager runs.
[hadoop@cslave0 sbin]$ start-yarn.sh
[hadoop@cmaster0 tmp]$ jps
3525 Jps
2973 DataNode
3375 NodeManager
2876 NameNode
[hadoop@cslave0 tmp]$ jps
3792 Jps
2647 DataNode
3122 ResourceManager
3230 NodeManager
[hadoop@cslave1 tmp]$ jps
2970 Jps
2728 SecondaryNameNode
2860 NodeManager
2655 DataNode
Web management pages
# View on the NameNode server:
http://192.168.1.101:50070 (HDFS web UI)
# View on the ResourceManager server:
http://192.168.1.102:8088 (YARN ResourceManager UI)
# View SecondaryNameNode information:
http://192.168.1.103:50090/status.html
Overview of the management scripts in $HADOOP_HOME/sbin
Command | Description |
---|---|
hadoop-daemon.sh | Starts or stops a single NameNode, DataNode, or SecondaryNameNode (hadoop-daemon.sh start/stop namenode/datanode/secondarynamenode) |
hadoop-daemons.sh | Runs hadoop-daemon.sh on every machine in the cluster |
yarn-daemon.sh | Starts or stops a single ResourceManager or NodeManager (yarn-daemon.sh start/stop resourcemanager/nodemanager) |
yarn-daemons.sh | Runs yarn-daemon.sh on every machine in the cluster |
start-all.sh | Runs start-dfs.sh and then start-yarn.sh (deprecated) |
stop-all.sh | Counterpart of start-all.sh |
start-dfs.sh | Starts HDFS (NameNode, SecondaryNameNode, DataNode) |
stop-dfs.sh | Counterpart of start-dfs.sh |
start-yarn.sh | Starts YARN (ResourceManager, NodeManager) |
stop-yarn.sh | Counterpart of start-yarn.sh |
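For example, to restart just the DataNode on one slave without touching the rest of the cluster:
[hadoop@cslave0 ~]$ hadoop-daemon.sh stop datanode
[hadoop@cslave0 ~]$ hadoop-daemon.sh start datanode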
Official wordcount example
Prepare an input file under the home directory
[hadoop@cmaster0 ~]$ cd
[hadoop@cmaster0 ~]$ mkdir wcinput
[hadoop@cmaster0 ~]$ cd wcinput/
[hadoop@cmaster0 wcinput]$ touch wc.input
[hadoop@cmaster0 wcinput]$ vi wc.input
[hadoop@cmaster0 wcinput]$ cat wc.input
hadoop yarn
hadoop mapreduce
zhangsh
zhangyu
Upload the test file to HDFS
[hadoop@cmaster0 wcinput]$ hadoop fs -mkdir -p /user/hadoop/wcinput
[hadoop@cmaster0 wcinput]$ hadoop fs -put /home/hadoop/wcinput/wc.input /user/hadoop/wcinput
[hadoop@cmaster0 wcinput]$ hadoop fs -cat /user/hadoop/wcinput/wc.input
hadoop yarn
hadoop mapreduce
zhangsh
zhangyu
Run the MapReduce job on HDFS
[hadoop@cmaster0 wcinput]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/hadoop/wcinput /user/hadoop/wcoutput
[hadoop@cmaster0 wcinput]$ hadoop fs -ls -R /user
drwxr-xr-x - hadoop supergroup 0 2018-12-21 08:03 /user/hadoop
drwxr-xr-x - hadoop supergroup 0 2018-12-21 08:00 /user/hadoop/wcinput
-rw-r--r-- 3 hadoop supergroup 46 2018-12-21 08:00 /user/hadoop/wcinput/wc.input
drwxr-xr-x - hadoop supergroup 0 2018-12-21 08:03 /user/hadoop/wcoutput
-rw-r--r-- 3 hadoop supergroup 0 2018-12-21 08:03 /user/hadoop/wcoutput/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 48 2018-12-21 08:03 /user/hadoop/wcoutput/part-r-00000
[hadoop@cmaster0 wcinput]$ hadoop fs -cat /user/hadoop/wcoutput/part-r-00000
hadoop 2
mapreduce 1
yarn 1
zhangsh 1
zhangyu 1
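To re-run the job, delete the output directory first; MapReduce refuses to start if it already exists:
[hadoop@cmaster0 ~]$ hadoop fs -rm -r /user/hadoop/wcoutput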