Hadoop 2.8.2 Distributed Cluster in Practice
Environment
CentOS 6.5 + JDK 1.8 + Hadoop 2.8.2
Overview
This document walks through building a three-node Hadoop cluster: one Master and two Slaves.
Processes on the Master: NameNode, SecondaryNameNode, ResourceManager.
Processes on the Slaves: DataNode, NodeManager.
Preparing the environment
Setting the hostname
We give the three servers the hostnames hadoop1, hadoop2, and hadoop3, so that in the configuration below we can refer to them by hostname instead of IP address, which is much easier to follow.
On server 1, run:
[root@chu home]# sudo hostname hadoop1
[root@hadoop1 home]#
Then run hostname hadoop2 and hostname hadoop3 on the other two servers respectively.
This only changes the hostname temporarily; it is lost after a reboot. To make the change permanent, also update the HOSTNAME property in the /etc/sysconfig/network file.
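For example, on hadoop1 the file would end up looking roughly like this (a minimal sketch; keep whatever other lines your system already has):
# /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop1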
Editing the hosts file
Open /etc/hosts and append the following lines:
192.168.1.235 hadoop1
192.168.1.237 hadoop2
192.168.1.239 hadoop3
192.168.1.235, 192.168.1.237, and 192.168.1.239 are the IPs of the three servers; substitute your own server IPs.
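A quick way to confirm that the hostnames resolve correctly is to ping each one once from any of the servers (the -c 1 flag sends a single packet):
[root@hadoop1 home]# ping -c 1 hadoop2
[root@hadoop1 home]# ping -c 1 hadoop3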
Setting up passwordless SSH between the servers
Run the following command on each of the three servers:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
For example:
[root@hadoop2 hadoop]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
7e:52:41:43:9d:07:a8:b2:15:07:86:1c:0d:f5:8f:59 root@hadoop2
The key's randomart image is:
+--[ RSA 2048]----+
| .o*+o+o.o |
| o.ooo.o . |
| +o E. |
| . o * |
| +S + . |
| .. . |
| o . |
| o |
| |
+-----------------+
This command generates a private key and a public key under /root/.ssh, as shown below:
[root@hadoop2 home]# ll /root/.ssh
total 16
-rw-------. 1 root root 1182 Nov 2 02:17 authorized_keys
-rw-------. 1 root root 1675 Nov 2 01:47 id_rsa
-rw-r--r--. 1 root root 394 Nov 2 01:47 id_rsa.pub
-rw-r--r--. 1 root root 1197 Nov 1 01:57 known_hosts
[root@hadoop2 home]#
First, copy the public keys from hadoop2 and hadoop3 into /root/.ssh on hadoop1, renaming them to hadoop2.pub and hadoop3.pub. We use scp here, and you will be prompted for the target server's password along the way. You can of course copy them any other way you like, as long as the files end up in the right place.
On hadoop2, run:
[root@hadoop2 home]# scp /root/.ssh/id_rsa.pub hadoop1:/root/.ssh/hadoop2.pub
On hadoop3, run:
[root@hadoop3 home]# scp /root/.ssh/id_rsa.pub hadoop1:/root/.ssh/hadoop3.pub
Then check the .ssh directory on hadoop1; it should contain the following:
[root@hadoop1 home]# ll /root/.ssh
total 24
-rw-------. 1 root root 1182 Nov 1 17:59 authorized_keys
-rw-r--r--. 1 root root 394 Nov 1 17:57 hadoop2.pub
-rw-r--r--. 1 root root 394 Nov 1 17:57 hadoop3.pub
-rw-------. 1 root root 1675 Nov 1 17:46 id_rsa
-rw-r--r--. 1 root root 394 Nov 1 17:46 id_rsa.pub
-rw-r--r--. 1 root root 2370 Oct 31 18:41 known_hosts
On hadoop1, write id_rsa.pub, hadoop2.pub, and hadoop3.pub into the authorized_keys file:
[root@hadoop1 home]# cd /root/.ssh
[root@hadoop1 .ssh]# cat *.pub > authorized_keys
[root@hadoop1 .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAp/r5sJ5XXrUPncNC9n6lOkcJanNuu9KX9nDMVMw6i0Q0Lq64mib3n8HtSuh6ytc1zaCA3nP3YApofcf7rM406IlxFcrNAA77UfMTw7EjyAhwpaN/045/MRd/yklqEDvtzeQSaWfzns3WFrc3ELF41cWh2k6wR9MCdsJyUfUG7SukCw3BRzHvqFhBV2sMnCzGLUxnAvOklNqmLtQ9LbWKOjv47GQBrMHu16awwru5frSlcnbO0pJa+c/enri2Sm6LfeskqyOlDeTgcdXh/97hqAgAetMh893an0X9hlqoa6zq6ybhOmgCSDYCD7RpzQpoB+o4qWzkGEixopIQ8otjbw== root@hadoop2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5EbNzzX41YXY0XFU24gqypU8dQqYHfRRJdUBAkf1AGc6S0K+FMaVdlLhWvWDE5+4nVKNQmXe22cRDLel/9PqnNStcRBnHQazKEICNN11FnuixMZKkDcxx5Ikcv01ToGf3KBupFgxnGPvrpVOUyWZ8TH4JVJNKuPA9AbWRIvpdZ7Y04OYLphjduGQq+8zDuwlPn4epEHXtIaLHomdI9Rt4Qhufq8c6ZnwC3DsR8r1XTX0x+nngpgKMyspt3h7tGysJr4nfnG5gt68L3X8H5Yl0hLuxPJqDEORVRTFm3ag/HV1UR+BXpOBeYjDsMKKLYebBVivdAcmWJhsSlhvS5Q2Xw== root@hadoop3
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAygyRKgUIxj1wjkvwfYP3QIoZ1gpP5gayx5z4b1nxuu7cD3bu7f2hLAve3cwcbDpkjeLP8Lj2Sz6VdzIvhDvVF+ZN7qwx8bsmPElVoiiZJecxuYt6wizg8IPxLf6NQknfxkKEv0QIeSlN8IQlXVaCz04FiYmFvincPeyvszTXTXcVf6YWXHNbqtm6p6t4kxf4rpm9/lWR8VapzaPM3/669fqrfAkIjkUGEdzD3wUWpHtgGpmNdAW6My3lyWhYTm4INftpDzsL47lXo1UNGwvlhaLneMdGQP/1+t0k3wsNzggzLQSV8GN+jy0jIbSsc6HlIk663OLKz6vY+fccGlE30Q== root@hadoop1
You can see three keys in authorized_keys, ending in root@hadoop1, root@hadoop2, and root@hadoop3 respectively, which means they were written successfully. Next, set the permissions of authorized_keys to 600:
[root@hadoop1 .ssh]# chmod 600 authorized_keys
Copy this authorized_keys file to hadoop2 and hadoop3, overwriting the ones that are already there. You will be asked for the target servers' passwords:
[root@hadoop1 .ssh]# scp authorized_keys hadoop2:/root/.ssh
[root@hadoop1 .ssh]# scp authorized_keys hadoop3:/root/.ssh
Verify the passwordless login on each of the three servers. The check on hadoop1 looks like this:
[root@hadoop1 home]# ssh hadoop2
Last login: Thu Nov 2 02:18:57 2017 from hadoop3
[root@hadoop2 ~]# exit
logout
Connection to hadoop2 closed.
[root@hadoop1 home]# ssh hadoop3
Last login: Wed Nov 1 03:18:41 2017 from hadoop2
[root@hadoop3 ~]# exit
logout
Connection to hadoop3 closed.
[root@hadoop1 home]#
hadoop1 can now log in to hadoop2 and hadoop3 without a password. Of course the same verification should be done on hadoop2 and hadoop3 as well; it is not repeated here. That completes the passwordless SSH setup.
Installing the JDK
Download the JDK from Oracle and install it on all three servers; extracting it into a directory of your choice is all that is needed. Note that the JDK installation directory must be the same on all three servers, otherwise things get messy later. To configure the environment variables, open /etc/profile and append the following lines:
export JAVA_HOME=/usr/java/jdk1.8
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
Make the environment variables take effect:
source /etc/profile
Then use java -version to verify that the settings took effect:
[root@hadoop1 home]# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
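If you install the JDK on hadoop1 first, one simple way to keep the path identical on the other two servers is to copy the directory over with scp (a sketch; /usr/java/jdk1.8 is just the example path from the profile settings above):
[root@hadoop1 home]# ssh hadoop2 "mkdir -p /usr/java" && scp -r /usr/java/jdk1.8 hadoop2:/usr/java/
[root@hadoop1 home]# ssh hadoop3 "mkdir -p /usr/java" && scp -r /usr/java/jdk1.8 hadoop3:/usr/java/
Then append the same three export lines to /etc/profile on hadoop2 and hadoop3 and run source /etc/profile there as well.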
Installing Hadoop
Download Hadoop 2.8.2 from Apache and extract it into a directory named hadoop. We first install and configure Hadoop on hadoop1; once the configuration is complete we simply copy it to the other two servers.
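For example, downloading and unpacking on hadoop1 might look like this (a sketch; the archive.apache.org mirror and the /home/software target directory are assumptions that match the paths used in the rest of this document):
[root@hadoop1 software]# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.2/hadoop-2.8.2.tar.gz
[root@hadoop1 software]# tar -zxvf hadoop-2.8.2.tar.gz
[root@hadoop1 software]# mv hadoop-2.8.2 /home/software/hadoop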
Configure JAVA_HOME and HADOOP_PREFIX: open etc/hadoop/hadoop-env.sh under the hadoop directory and change these two values to the correct paths:
[root@hadoop1 hadoop]# more hadoop-env.sh
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=/home/software/jdk1.8 # change this to your own directory
export HADOOP_PREFIX=/home/software/hadoop # change this to your own directory
Modifying the configuration
The files to edit are slaves, core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml. Their contents are as follows:
The slaves file
hadoop2
hadoop3
The core-site.xml file
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Hadoop temp directory; create it yourself (see the mkdir sketch after hdfs-site.xml below) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/software/hadoop/tmp</value>
</property>
</configuration>
The hdfs-site.xml file
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/software/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/software/hadoop/hdfs/data</value>
</property>
</configuration>
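The comment in core-site.xml says the temp directory has to be created by hand; since the whole hadoop directory will be copied to the other servers later, it is convenient to create the HDFS name and data directories from hdfs-site.xml at the same time so every node ends up with the same layout (a small sketch using the paths configured above):
[root@hadoop1 hadoop]# mkdir -p /home/software/hadoop/tmp
[root@hadoop1 hadoop]# mkdir -p /home/software/hadoop/hdfs/name /home/software/hadoop/hdfs/data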
The yarn-site.xml file
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
</configuration>
The mapred-site.xml file
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
</configuration>
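A note on this file: the Hadoop 2.x distribution ships only a template for it, so if etc/hadoop/mapred-site.xml does not exist yet, create it from the template first and then put the content above into it:
[root@hadoop1 hadoop]# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml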
Once the configuration changes are complete, copy the entire hadoop directory to both hadoop2 and hadoop3. How you copy it is up to you, but the directory must end up at the same path as on hadoop1.
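For example, with scp (a sketch assuming the /home/software/hadoop path used above):
[root@hadoop1 software]# scp -r /home/software/hadoop hadoop2:/home/software/
[root@hadoop1 software]# scp -r /home/software/hadoop hadoop3:/home/software/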
Opening the ports
The steps above use a number of ports. Since this is a test environment, to keep things simple we open every port used by the configuration, and we do so on all three servers. On each server, open /etc/sysconfig/iptables and change its contents to the following:
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8088 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8030 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8031 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8032 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8033 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50010 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50090 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 10020 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 19888 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
Then run the following command for the rules to take effect:
[root@hadoop3 hadoop]# /etc/init.d/iptables restart
Once again: the ports must be opened on all three servers.
Starting the cluster
Before starting the cluster we need to format the NameNode. The following command only needs to be run on hadoop1:
[root@hadoop1 hadoop]# bin/hdfs namenode -format
Finally, start the cluster with start-all.sh. This command, too, only needs to be run on hadoop1:
[root@hadoop1 hadoop]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/10/31 18:50:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop1]
hadoop1: starting namenode, logging to /home/software/hadoop/logs/hadoop-root-namenode-hadoop1.out
hadoop2: starting datanode, logging to /home/software/hadoop/logs/hadoop-root-datanode-hadoop2.out
hadoop3: starting datanode, logging to /home/software/hadoop/logs/hadoop-root-datanode-hadoop3.out
Starting secondary namenodes [hadoop1]
hadoop1: starting secondarynamenode, logging to /home/software/hadoop/logs/hadoop-root-secondarynamenode-hadoop1.out
17/10/31 18:50:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/software/hadoop/logs/yarn-root-resourcemanager-hadoop1.out
hadoop2: starting nodemanager, logging to /home/software/hadoop/logs/yarn-root-nodemanager-hadoop2.out
hadoop3: starting nodemanager, logging to /home/software/hadoop/logs/yarn-root-nodemanager-hadoop3.out
The output shows every daemon being started along with the location of its log file. Use jps to check the processes on the three servers.
Running jps on hadoop1:
[root@hadoop1 hadoop]# jps
13393 SecondaryNameNode
13547 ResourceManager
14460 Jps
13199 NameNode
Running jps on hadoop2:
[root@hadoop2 home]# jps
4497 NodeManager
4386 DataNode
5187 Jps
Running jps on hadoop3:
[root@hadoop3 hadoop]# jps
23474 Jps
4582 NodeManager
4476 DataNode
Check the cluster in a browser: the HDFS NameNode web UI is at http://hadoop1:50070 and the YARN ResourceManager UI is at http://hadoop1:8088 (the two web ports opened in the firewall rules above).
Done.