
Detailed Hadoop Setup Guide

I. Installing the Hadoop environment

1】Create the hadoop user and switch to it

[root@server1 ~]# useradd  hadoop

[root@server1 ~]# id hadoop

uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)

[root@server1 ~]# su - hadoop

2】Download Hadoop and the JDK and unpack them

Note: put both packages in the hadoop user's home directory.

[hadoop@server1 ~]$ tar zxf hadoop-2.7.3.tar.gz

[hadoop@server1 ~]$ tar zxf jdk-7u79-linux-x64.tar.gz

Note: symbolic links are created here to keep the configuration simple.

[hadoop@server1 ~]$ ln -s jdk1.7.0_79/ jdk

[hadoop@server1 ~]$ ln -s hadoop-2.7.3 hadoop

3】Configure the Hadoop environment variables

[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/home/hadoop/jdk
[hadoop@server1 ~]$ cat /etc/hosts
172.25.37.1 server1
Note: this entry is required (the host name server1 must resolve to the server's address), otherwise Hadoop reports errors when it runs. Use the IP of your own Hadoop server.
[hadoop@server1 ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/home/hadoop/jdk/bin

export PATH
Run source ~/.bash_profile (or log in again) so the new PATH takes effect.

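4】Start Hadoop for the first time

[hadoop@server1 hadoop]$ cd /home/hadoop/hadoop/bin/

[hadoop@server1 bin]$ ./hadoop

Note: run without arguments, the script only prints its usage message. It serves as a sanity check: if it runs without errors, Java and the environment are set up correctly.

[hadoop@server1 ~]$ cd hadoop

[hadoop@server1 hadoop]$ mkdir input

[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/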

Run the MapReduce demo bundled with Hadoop

Note: MapReduce is a programming model for parallel processing of large datasets (commonly described as larger than 1 TB).

[hadoop@server1 hadoop]$ bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

View the output files

[hadoop@server1 hadoop]$ cat output/*

1 dfsadmin
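The bundled grep example counts how often each match of the regular expression occurs in the input files and writes the counts, highest first, to the output directory. As a rough local cross-check, here is a sketch that assumes GNU grep and ignores how the real job spreads the work over map and reduce tasks; `count_matches` is a hypothetical helper, not part of Hadoop:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regex matches across files, similar in spirit to
# the bundled "grep" MapReduce example (count<TAB>match, highest count first).
count_matches() {
  local regex="$1"; shift
  # -h: no file names, -o: one match per line, -E: extended regular expressions
  grep -hoE "$regex" "$@" \
    | sort \
    | uniq -c \
    | awk '{ print $1 "\t" $2 }' \
    | sort -k1,1nr -k2,2
}

# Usage (from /home/hadoop/hadoop, with the local input/ directory):
#   count_matches 'dfs[a-z.]+' input/*.xml
```

Its result can be compared with the output of `cat output/*` above.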

[hadoop@server1 hadoop]$ pwd

/home/hadoop/hadoop

[hadoop@server1 hadoop]$ ls

bin  etc  include  input  lib  libexec  LICENSE.txt  NOTICE.txt  output  README.txt  sbin  share

II. Pseudo-distributed setup

1】Configure core-site.xml

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://172.25.37.1:9000</value>
        </property>
</configuration>
Note: fs.defaultFS sets the address of HDFS, i.e. the default file system.

2】Configure hdfs-site.xml

[hadoop@server1 hadoop]$ vim hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>
Note: dfs.replication sets how many copies of each block HDFS keeps. This pseudo-distributed environment has only one node, so it is set to 1.
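Instead of editing the two files in vim, they can be written non-interactively. The sketch below assumes the configuration directory is passed in (normally /home/hadoop/hadoop/etc/hadoop) and that overwriting both files, including their license header and xml-stylesheet line, is acceptable; `write_pseudo_config` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write core-site.xml and hdfs-site.xml for a
# single-node (pseudo-distributed) HDFS. Existing files are overwritten.
write_pseudo_config() {
  local conf_dir="$1" namenode_host="$2" replication="${3:-1}"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${namenode_host}:9000</value>
  </property>
</configuration>
EOF
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>${replication}</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_pseudo_config /home/hadoop/hadoop/etc/hadoop 172.25.37.1
```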

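3】Set up passwordless SSH

start-dfs.sh starts every daemon over SSH, even on the local machine, so the hadoop user must be able to log in without a password:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
In the slaves file (in the same etc/hadoop directory), change localhost to this machine's IP:
[hadoop@server1 hadoop]$ vim slaves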
172.25.37.1
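Running the `cat ... >> authorized_keys` line twice appends the same key twice. A minimal idempotent variant, assuming the standard OpenSSH file layout; `add_authorized_key` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file only if
# it is not already there, then apply the permissions sshd expects.
add_authorized_key() {
  local pubkey_file="$1" auth_file="$2"
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -x: whole-line match, -F: fixed string (keys contain '+' and '/')
  if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
  chmod 0700 "$(dirname "$auth_file")"
  chmod 0600 "$auth_file"
}

# Usage:
#   add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```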

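4】Format the file system

[hadoop@server1 hadoop]$ bin/hdfs namenode -format

Note: formatting initializes the NameNode's metadata storage: it creates an empty file-system image and a new cluster ID (by default under /tmp/hadoop-hadoop/dfs/name). It does not touch the DataNodes. Because every format generates a new cluster ID, DataNodes that still hold data from an earlier format refuse to join, which is why the old data under /tmp is removed before reformatting later in this guide.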

5】Start HDFS

[hadoop@server1 hadoop]$ ./sbin/start-dfs.sh
Starting namenodes on [server1]
server1: namenode running as process 2496. Stop it first.
172.25.37.1: datanode running as process 2164. Stop it first.
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 9b:19:24:43:5d:09:3a:12:97:94:99:f4:61:dc:3d:e2.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-server1.out

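6】Check the processes. If all four are listed (NameNode, DataNode, SecondaryNameNode and the Jps tool itself), the startup succeeded.

[hadoop@server1 hadoop]$ jps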
3151 Jps
2164 DataNode
3042 SecondaryNameNode
2496 NameNode
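Checking the jps list by eye gets error-prone once more daemons are involved. This sketch reads jps output on standard input and reports missing daemons; it assumes jps prints `<pid> <main class>` per line as shown above, and `check_daemons` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and report which expected
# daemon names are missing. Returns 0 only if all of them are present.
check_daemons() {
  local jps_output name
  local missing=()
  jps_output=$(cat)
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the class name (2nd field)
    if ! awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' <<<"$jps_output"; then
      missing+=("$name")
    fi
  done
  if ((${#missing[@]})); then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all running"
}

# Usage in the pseudo-distributed setup:
#   jps | check_daemons NameNode DataNode SecondaryNameNode
```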
Check the port:
[hadoop@server1 hadoop]$ netstat -antlp | grep 50070
tcp        0      0 0.0.0.0:50070 

7】Test in a browser

Open 172.25.37.1:50070 in a browser.

View the DataNodes in the web UI.

Or check them from the command line:

[hadoop@server1 hadoop]$ ./bin/hdfs dfsadmin -report
Configured Capacity: 14309232640 (13.33 GB)
Present Capacity: 11614142464 (10.82 GB)
DFS Remaining: 11614113792 (10.82 GB)
DFS Used: 28672 (28 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 172.25.37.1:50010 (server1)
Hostname: server1
Decommission Status : Normal
Configured Capacity: 14309232640 (13.33 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 2695090176 (2.51 GB)
DFS Remaining: 11614113792 (10.82 GB)
DFS Used%: 0.00%
DFS Remaining%: 81.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 31 22:22:11 CST 2018
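When scripting the setup, the DataNode count can be pulled out of this report instead of read by eye. A sketch that assumes the report keeps the `Live datanodes (N):` line shown above; `live_datanodes` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the live DataNode count from the text printed
# by `hdfs dfsadmin -report` (the "Live datanodes (N):" line).
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Usage (from /home/hadoop/hadoop):
#   n=$(./bin/hdfs dfsadmin -report | live_datanodes)
#   echo "live DataNodes: $n"
```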

Create directories for uploading files:

[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put  ./input  /user/hadoop/
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-05-31 22:24 input
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls input
Found 29 items
-rw-r--r--   1 hadoop supergroup       4436 2018-05-31 22:24 input/capacity-scheduler.xml
-rw-r--r--   1 hadoop supergroup       1335 2018-05-31 22:24 input/configuration.xsl
Only two entries are shown here to save space.
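Delete the local input and output directories and run MapReduce again. Because fs.defaultFS now points to HDFS, the relative paths input and output refer to /user/hadoop/input and /user/hadoop/output in HDFS, not to local directories:
[hadoop@server1 hadoop]$ rm -fr input/ output
[hadoop@server1 hadoop]$ ./bin/hadoop    jar  \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input  output
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2018-05-31 22:24 input
drwxr-xr-x   - hadoop supergroup          0 2018-05-31 22:30 output

Open http://172.25.37.1:50070/explorer.html in a browser to see the directories just created.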

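8】Read the contents of a file in HDFS

Run from the hadoop directory:

./bin/hdfs   dfs  -cat  <absolute path of the file to view>

For example, ./bin/hdfs dfs -cat /user/hadoop/output/part-r-00000 prints the wordcount result.

9】Download a file from HDFS to the local machine

Run from the hadoop directory:

./bin/hdfs   dfs   -get   <HDFS path to download>  <local destination>

(To be continued.)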

III. Fully distributed setup

Environment:

Physical host: rhel7.3  172.25.37.250/24, used as the time server

Server1: rhel6.5  172.25.37.1/24

Server2: rhel6.5  172.25.37.2/24

Server3: rhel6.5  172.25.37.3/24

1】Time synchronization

On the physical host (as root), set the upstream time source to a Baidu IP:

vim /etc/chrony.conf
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# Note: put your upstream time source here
server 119.75.213.61 iburst
# Allow NTP client access from local network.
# Note: allow hosts in the 172.25/16 network to synchronize time from this host
allow 172.25/16
Restart chronyd so the change takes effect:
systemctl restart chronyd
On all three VMs (as root), apply the following configuration:
yum install -y ntp
vim /etc/ntp.conf
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# Note: use the physical host as the time source
server 172.25.37.250 iburst
Start ntpd:
/etc/init.d/ntpd start
Enable it at boot:
chkconfig  ntpd on
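Editing ntp.conf by hand on several VMs invites typos. A sketch, assuming GNU sed and that every existing `server` line should be disabled in favour of the physical host; `set_ntp_server` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make an ntp.conf use a single upstream server.
# Existing "server ..." lines are commented out, one new line pointing at the
# given source is appended, and a .bak copy of the original is kept.
set_ntp_server() {
  local conf="$1" source_ip="$2"
  cp "$conf" "$conf.bak"
  sed -i 's/^[[:space:]]*server[[:space:]]/#&/' "$conf"
  echo "server ${source_ip} iburst" >> "$conf"
}

# Usage (as root on each VM), then restart ntpd:
#   set_ntp_server /etc/ntp.conf 172.25.37.250
#   /etc/init.d/ntpd restart
```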

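2】Set up passwordless SSH

[root@server1 ~]# su - hadoop

[hadoop@server1 ~]$ cd .ssh/

[hadoop@server1 .ssh]$ ls

authorized_keys  id_rsa  id_rsa.pub  known_hosts

[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.2

[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.3

To test, run the following commands on server1:

[hadoop@server1 ~]$ ssh  172.25.37.2

[hadoop@server1 ~]$ ssh  172.25.37.3

Note: in each test, type yes and press Enter at the host-key prompt; if no password is requested, passwordless login works. ssh-copy-id needs the hadoop user to exist on the target with a password set (see step 3). Once /home/hadoop is shared over NFS (step 4), all nodes use the same ~/.ssh/authorized_keys anyway.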

3】Create the hadoop user on all three VMs. Its user and group IDs must be identical on every node, because NFS maps file ownership by numeric ID.

[hadoop@server1 ~]$ id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[root@server2 ~]# useradd hadoop
[root@server2 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[root@server3 ~]# useradd hadoop
[root@server3 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
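Comparing the `id` output of each node can be scripted. A sketch that assumes the `uid=N(name) gid=N(name) ...` format printed above; `same_ids` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the output of `id hadoop` from several nodes (one
# argument per node), succeed only if every node reports the same uid and gid.
same_ids() {
  local first="" line ids
  for line in "$@"; do
    # keep only the "uid=... gid=..." part and drop the groups list
    ids=$(sed -n 's/^\(uid=[0-9]*([^)]*) gid=[0-9]*([^)]*)\).*/\1/p' <<<"$line")
    [ -n "$ids" ] || return 2
    if [ -z "$first" ]; then
      first="$ids"
    elif [ "$ids" != "$first" ]; then
      echo "mismatch: '$ids' vs '$first'"
      return 1
    fi
  done
  echo "ok: $first"
}

# Usage from server1 once passwordless SSH works:
#   same_ids "$(id hadoop)" "$(ssh 172.25.37.2 id hadoop)" "$(ssh 172.25.37.3 id hadoop)"
```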

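4】Share the files over NFS

1) Install the software (the exportfs command is provided by the nfs-utils package)

[root@server1 ~]# yum  install -y nfs-utils

2) Configure the shared directory

[root@server1 ~]# vim /etc/exports

/home/hadoop    *(rw,anonuid=500,anongid=500)

3) Re-export the shares and list them

[root@server1 ~]# exportfs -rv

exporting *:/home/hadoop

4) Start the services

[root@server1 ~]# /etc/init.d/rpcbind start            ## start this one first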
Starting rpcbind:                                          [  OK  ]
[[email protected] ~]# /etc/init.d/nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

5) On server2 and server3, install the software and mount the shared directory

[root@server2 ~]# yum install -y nfs-utils
[root@server2 ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]

[root@server3 ~]# yum install -y nfs-utils
[root@server3 ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]

[root@server2 ~]# mount 172.25.37.1:/home/hadoop/  /home/hadoop/
[root@server2 ~]# df
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root  13973860  922768  12341256   7% /
tmpfs                           510200       0    510200   0% /dev/shm
/dev/vda1                       495844   33457    436787   8% /boot
172.25.20.1:/home/hadoop/     13974016 1932544  11331584  15% /home/hadoop

[root@server3 ~]# mount 172.25.37.1:/home/hadoop/  /home/hadoop/
[root@server3 ~]# df
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root  13973860  922768  12341256   7% /
tmpfs                           510200       0    510200   0% /dev/shm
/dev/vda1                       495844   33457    436787   8% /boot
172.25.20.1:/home/hadoop/     13974016 1932544  11331584  15% /home/hadoop
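`df` shows the mount, but scripts are easier to write against /proc/mounts, which lists source, mount point and file-system type on each line. A sketch under that assumption; `mount_source` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the source and type of the file system mounted at
# a given mount point, reading /proc/mounts (or a file in the same format).
# If several mounts are stacked on the same point, the last one wins.
mount_source() {
  local mount_point="$1" mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mount_point" '$2 == mp { src = $1; type = $3 }
    END { if (src == "") exit 1; print src, type }' "$mounts_file"
}

# Usage on server2/server3 after mounting; expect the 172.25.37.1 export
# with an nfs or nfs4 type:
#   mount_source /home/hadoop
```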

5】The /etc/hosts entries must be identical on all three VMs. Mine are:

[hadoop@server1 hadoop]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.37.1 server1
172.25.37.2 server2
172.25.37.3 server3
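A missing or mistyped entry on one node is easy to overlook, so the file can be checked on every node. A sketch that assumes the usual `IP name [aliases...]` layout of /etc/hosts; `check_hosts` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a hosts file maps every "ip=name" pair
# given as an argument. Comment lines are ignored; aliases are accepted.
check_hosts() {
  local hosts_file="$1" pair ip name
  local rc=0
  shift
  for pair in "$@"; do
    ip="${pair%%=*}"; name="${pair#*=}"
    if ! awk -v ip="$ip" -v name="$name" '
        /^[[:space:]]*#/ { next }
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) ok = 1 }
        END { exit !ok }' "$hosts_file"; then
      echo "missing: $ip $name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on every node:
#   check_hosts /etc/hosts 172.25.37.1=server1 172.25.37.2=server2 172.25.37.3=server3
```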
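Remove the files left over from the earlier tests (this deletes everything under /tmp; by default Hadoop keeps its data in /tmp/hadoop-hadoop):
[root@server1 ~]# rm -fr /tmp/*
[root@server2 ~]# rm -fr /tmp/*
[root@server3 ~]# rm -fr /tmp/*

[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
Note: if HDFS from the earlier test is still running, stop it now. It is better to do this before clearing /tmp, because the daemons keep their data and PID files there; without the PID files the stop scripts cannot find the running daemons.
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh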

6】Configure the distributed setup; again, two files need to be changed

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.37.1:9000</value>
</property>
</configuration>
Here the replication factor is changed to 2, one copy per DataNode:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
<property>
        <name>dfs.replication</name>
        <value>2</value>
</property>
</configuration>
Configure the slaves file as follows; the DataNodes are now server2 and server3:
[hadoop@server1 hadoop]$ vim slaves
172.25.37.2
172.25.37.3
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
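dfs.replication should not exceed the number of DataNodes: HDFS stores at most one replica of a block per DataNode, so a higher value leaves blocks under-replicated. A small sanity check, assuming one host per line in slaves; `check_replication` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail when dfs.replication exceeds the number of
# DataNodes listed in the slaves file (blank and comment lines are ignored).
check_replication() {
  local replication="$1" slaves_file="$2" nodes
  nodes=$(grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves_file")
  if [ "$replication" -gt "$nodes" ]; then
    echo "replication $replication > $nodes DataNodes"
    return 1
  fi
  echo "ok: replication $replication, $nodes DataNodes"
}

# Usage (from /home/hadoop/hadoop):
#   check_replication 2 etc/hadoop/slaves
```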

7】Format and start HDFS

[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
server1: namenode running as process 3463. Stop it first.
172.25.37.3: datanode running as process 1202. Stop it first.
172.25.37.2: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 3710. Stop it first.

8】Test in a browser

You should see two DataNodes.

Check the processes:
[hadoop@server1 ~]$ jps
3463 NameNode
3710 SecondaryNameNode
4515 Jps

[root@server2 hadoop]# su - hadoop
[hadoop@server2 ~]$ jps
1255 DataNode
1415 Jps

[root@server3 ~]# su - hadoop
[hadoop@server3 ~]$ jps
1202 DataNode
1464 Jps

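9】Scaling out and in

Change the replication value in hdfs-site.xml and the IPs in the slaves file, then run:

[hadoop@server1 hadoop]$ ./bin/hdfs  dfsadmin   -refreshNodes

Note: the slaves file is only read by the start/stop scripts such as start-dfs.sh. -refreshNodes makes the NameNode re-read the include and exclude files configured with dfs.hosts and dfs.hosts.exclude. A new DataNode can be started directly on its own host with sbin/hadoop-daemon.sh start datanode.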

10】Configure YARN

[hadoop@server1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
</configuration>
mapreduce.framework.name=yarn submits MapReduce jobs to YARN instead of running them locally; mapreduce_shuffle enables the shuffle service on each NodeManager, which MapReduce reducers need.
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
Start YARN:
[hadoop@server1 hadoop]$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-server1.out
172.25.37.2: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-server2.out
172.25.37.3: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-server3.out
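Check the processes:
[hadoop@server1 hadoop]$ jps
3129 SecondaryNameNode
3375 ResourceManager
2939 NameNode
3632 Jps
[hadoop@server2 hadoop]$ jps
1687 DataNode
1832 NodeManager
1930 Jps
[hadoop@server3 hadoop]$ jps
1806 NodeManager
1904 Jps
1648 DataNode
Note: if NodeManager does not appear on server2 or server3, start it on that VM, for example on server2:
[hadoop@server2 hadoop]$ ./sbin/yarn-daemon.sh start nodemanager
Note also that yarn-site.xml above does not set yarn.resourcemanager.hostname, so it keeps its default of 0.0.0.0 (visible in the "Connecting to ResourceManager at /0.0.0.0:8032" log lines below). NodeManagers on server2 and server3 then try to register with 0.0.0.0, i.e. with themselves, so a NodeManager listed by jps is not necessarily registered. Adding a yarn.resourcemanager.hostname property with the value 172.25.37.1 to yarn-site.xml avoids this.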

Create a Hadoop archive (HAR):
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -put  input
[hadoop@server1 hadoop]$ ./bin/hadoop archive -archiveName test.har -p /user/hadoop/ input/*  input/
18/06/01 21:23:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/01 21:23:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/01 21:23:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/01 21:23:09 INFO mapreduce.JobSubmitter: number of splits:1

Open 172.25.37.1:8088/cluster (the ResourceManager web UI) in a browser.

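IV. Hadoop high availability with ZooKeeper

Environment:

rhel6.5, iptables stopped and SELinux disabled

Server1: 172.25.37.1/24

Server2: 172.25.37.2/24

Server3: 172.25.37.3/24

Server4: 172.25.37.4/24

Server5: 172.25.37.5/24

server1 and server5 --> HA (the NameNode pair)

server2, server3, server4: storage nodes (DataNodes; they also run ZooKeeper and the JournalNodes)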

1】On server4 and server5, add the hadoop user and set up passwordless login

[root@server4 ~]# useradd hadoop
[root@server4 ~]# passwd hadoop

[root@server4 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)

[root@server5 ~]# useradd hadoop
[root@server5 ~]# passwd hadoop

[root@server5 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)

[hadoop@server1 ~]$ cd .ssh/
[hadoop@server1 .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.4
[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.5

2】Synchronize the time on server4 and server5

[root@server4 ~]# yum install -y ntp
[root@server4 ~]# vim /etc/ntp.conf
server 172.25.37.250 iburst
[root@server4 ~]# /etc/init.d/ntpd start
Starting ntpd:                                             [  OK  ]

[root@server5 ~]# yum install -y ntp
[root@server5 ~]# vim /etc/ntp.conf
server 172.25.37.250 iburst
[root@server5 ~]# /etc/init.d/ntpd start
Starting ntpd:                                             [  OK  ]

3】On server4 and server5, mount the shared hadoop home directory over NFS

[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# /etc/init.d/rpcbind status
rpcbind (pid  1092) is running...
[root@server4 ~]# mount 172.25.37.1:/home/hadoop/  /home/hadoop/
[root@server4 ~]# df
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root  13973860  922772  12341252   7% /
tmpfs                           510200       0    510200   0% /dev/shm
/dev/vda1                       495844   33457    436787   8% /boot
172.25.37.1:/home/hadoop/     13974016 1954560  11309568  15% /home/hadoop

[root@server5 ~]# yum install -y nfs-utils
[root@server5 ~]# /etc/init.d/rpcbind status
rpcbind (pid  1092) is running...
[root@server5 ~]# mount 172.25.37.1:/home/hadoop/  /home/hadoop/
[root@server5 ~]# df
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root  13973860  922760  12341264   7% /
tmpfs                           510200       0    510200   0% /dev/shm
/dev/vda1                       495844   33457    436787   8% /boot
172.25.20.1:/home/hadoop/     13974016 1954560  11309568  15% /home/hadoop

4】Add the following entries to /etc/hosts on all VMs:

[hadoop@server1 hadoop]$ cat /etc/hosts
172.25.37.1 server1
172.25.37.2 server2
172.25.37.3 server3
172.25.37.4 server4
172.25.37.5 server5

5】Clear the data left by the earlier experiments

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ sbin/stop-yarn.sh
stopping yarn daemons
no resourcemanager to stop
172.25.37.3: no nodemanager to stop
172.25.37.2: no nodemanager to stop
no proxyserver to stop

[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [server1]
server1: stopping namenode
172.25.37.3: stopping datanode
172.25.37.2: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
[hadoop@server1 hadoop]$

[hadoop@server1 ~]$ rm -fr /tmp/*
[hadoop@server2 ~]$ rm -fr /tmp/*
[hadoop@server3 ~]$ rm -fr /tmp/*
[hadoop@server4 ~]$ rm -fr /tmp/*
[hadoop@server5 ~]$ rm -fr /tmp/*

6】Raise the number of DataNodes, and the replication factor, to 3

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
<property>
        <name>dfs.replication</name>
        <value>3</value>
</property>
</configuration>

[hadoop@server1 hadoop]$ vim slaves
172.25.37.2
172.25.37.3
172.25.37.4

7】Configure the ZooKeeper cluster

1) Unpack the archive (the home directory is shared over NFS, so this only has to be done once)

$ tar zxf zookeeper-3.4.9.tar.gz

$ cd zookeeper-3.4.9

$ cd conf/

$ ls

configuration.xsl  log4j.properties  zoo_sample.cfg

2) Copy the sample configuration file:

$ cp zoo_sample.cfg zoo.cfg

3) The main settings are:

$ vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
server.1=172.25.37.2:2888:3888
server.2=172.25.37.3:2888:3888
server.3=172.25.37.4:2888:3888
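The configuration file is the same on every node. In addition, each node needs a myid file in the /tmp/zookeeper directory containing a unique number between 1 and 255. For example, the myid file on 172.25.37.2 contains "1", matching the definition server.1=172.25.37.2:2888:3888 in the configuration file; the other nodes follow the same pattern. In each server.N entry, port 2888 is used by followers to connect to the leader and port 3888 for leader election.
Parameters:
clientPort
The port clients use to connect to the server, i.e. the port the service listens on; it is normally set to 2181.
dataDir
The directory where snapshots are stored. By default the transaction log is stored here as well. It is recommended to also set dataLogDir, because the write performance of the transaction log directly affects ZooKeeper's performance.
tickTime
The basic time unit of ZooKeeper, in milliseconds. All other ZooKeeper times are based on it; it is used to tune heartbeats and timeouts. For example, the minimum session timeout is 2*tickTime.
initLimit / syncLimit
The number of ticks a follower may take to connect to and sync with the leader at startup (initLimit), and may lag behind the leader afterwards (syncLimit).
dataLogDir
The output directory for the transaction log. Putting the transaction log on its own disk or mount point greatly improves ZooKeeper's performance.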
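Because initLimit and syncLimit are counted in ticks, the real time limits follow from tickTime. For the zoo.cfg above: initLimit = 10 × 2000 ms = 20 s, syncLimit = 5 × 2000 ms = 10 s, and the default session-timeout bounds are 2 × tickTime = 4 s and 20 × tickTime = 40 s. A sketch that does the same arithmetic from a zoo.cfg, assuming plain `key=value` lines without spaces; `zk_timeouts` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the millisecond limits implied by a zoo.cfg.
# initLimit and syncLimit are counted in ticks; the default session timeout
# bounds are 2 * tickTime and 20 * tickTime.
zk_timeouts() {
  awk -F= '
    $1 == "tickTime"  { tick = $2 }
    $1 == "initLimit" { initl = $2 }
    $1 == "syncLimit" { syncl = $2 }
    END {
      printf "minSessionTimeout=%d\n", 2 * tick
      printf "maxSessionTimeout=%d\n", 20 * tick
      printf "initLimit_ms=%d\n", initl * tick
      printf "syncLimit_ms=%d\n", syncl * tick
    }' "$1"
}

# Usage:
#   zk_timeouts ~/zookeeper-3.4.9/conf/zoo.cfg
```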
[hadoop@server2 ~]$ mkdir /tmp/zookeeper
[hadoop@server2 ~]$ echo 1 > /tmp/zookeeper/myid

[hadoop@server3 ~]$ mkdir /tmp/zookeeper
[hadoop@server3 ~]$ echo 2 > /tmp/zookeeper/myid

[hadoop@server4 ~]$ mkdir /tmp/zookeeper
[hadoop@server4 ~]$ echo 3 > /tmp/zookeeper/myid
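Writing myid by hand risks a number that does not match zoo.cfg. This sketch derives the number from the server.N line whose host equals the node's IP; it assumes the `server.N=host:port:port` format above and that zoo.cfg is reachable on every node through the NFS share. `write_myid` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the server.N entry in zoo.cfg whose host matches
# the given IP and write N into <dataDir>/myid (also printed on stdout).
write_myid() {
  local cfg="$1" ip="$2" data_dir id
  data_dir=$(awk -F= '$1 == "dataDir" { print $2 }' "$cfg")
  # server.N=host:peerPort:electionPort -> print N when host equals $ip
  id=$(awk -F= -v ip="$ip" '
    $1 ~ /^server\.[0-9]+$/ {
      split($2, parts, ":")
      if (parts[1] == ip) print substr($1, 8)
    }' "$cfg")
  if [ -z "$data_dir" ] || [ -z "$id" ]; then
    echo "no dataDir or no server.N entry for $ip in $cfg" >&2
    return 1
  fi
  mkdir -p "$data_dir"
  echo "$id" > "$data_dir/myid"
  echo "$id"
}

# Usage on server3 (writes 2 to /tmp/zookeeper/myid):
#   write_myid ~/zookeeper-3.4.9/conf/zoo.cfg 172.25.37.3
```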

4) Start the service

[hadoop@server2 ~]$ cd zookeeper-3.4.9
[hadoop@server2 zookeeper-3.4.9]$ cd bin/
[hadoop@server2 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server2 bin]$

[hadoop@server3 ~]$ cd zookeeper-3.4.9
[hadoop@server3 zookeeper-3.4.9]$ cd bin/
[hadoop@server3 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server3 bin]$

[hadoop@server4 ~]$ cd zookeeper-3.4.9
[hadoop@server4 zookeeper-3.4.9]$ cd bin/
[hadoop@server4 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server4 bin]$

5) Check the status of each node

[hadoop@server2 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[hadoop@server2 bin]$

[hadoop@server3 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader
[hadoop@server3 bin]$

[hadoop@server4 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[hadoop@server4 bin]$
You can see that server3 was elected leader.

6) Check the processes

[hadoop@server2 bin]$ jps
1658 QuorumPeerMain
1803 Jps

[hadoop@server3 bin]$ jps
1766 Jps
1704 QuorumPeerMain

[hadoop@server4 bin]$ jps
1245 QuorumPeerMain
1319 Jps

7) The ZooKeeper interactive shell

$ pwd

/home/hadoop/zookeeper-3.4.9/bin

$ ls

README.txt    zkCli.cmd  zkEnv.cmd  zkServer.cmd  zookeeper.out

zkCleanup.sh  zkCli.sh   zkEnv.sh   zkServer.sh

Run the script to enter the interactive shell:

$ ./zkCli.sh

WATCHER::

WatchedEvent state:SyncConnected type:None path:null 

[zk: localhost:2181(CONNECTED) 0]
[zk: localhost:2181(CONNECTED) 1] ls /zookeeper
[quota]
[zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota
[]
[zk: localhost:2181(CONNECTED) 3] get /zookeeper/quota

cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
[zk: localhost:2181(CONNECTED) 4] quit
Quitting...
2018-03-10 16:18:19,399 [myid:] - INFO  [main:…] - Session: 0x1620ef562a30002 closed
2018-03-10 16:18:19,400 [myid:] - INFO  [main-EventThread:…] - EventThread shut down for session: 0x1620ef562a30002

8】Deploy high availability

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/

[hadoop@server1 hadoop]$ pwd

/home/hadoop/hadoop/etc/hadoop

[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
<!-- Point HDFS at the nameservice masters (the name is arbitrary) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<!-- Addresses of the ZooKeeper ensemble -->
<property>
<name>ha.zookeeper.quorum</name>
<value>172.25.37.2:2181,172.25.37.3:2181,172.25.37.4:2181</value>
</property>
</configuration>
Configuration file hdfs-site.xml:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
<!-- Set the HDFS nameservice to masters; it must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- The masters nameservice has two NameNodes, h1 and h2 (the names are arbitrary) -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h1,h2</value>
</property>
<!-- RPC address of h1 -->
<property>
<name>dfs.namenode.rpc-address.masters.h1</name>
<value>172.25.37.1:9000</value>
</property>
<!-- HTTP address of h1 -->
<property>
<name>dfs.namenode.http-address.masters.h1</name>
<value>172.25.37.1:50070</value>
</property>
<!-- RPC address of h2 -->
<property>
<name>dfs.namenode.rpc-address.masters.h2</name>
<value>172.25.37.5:9000</value>
</property>
<!-- HTTP address of h2 -->
<property>
<name>dfs.namenode.http-address.masters.h2</name>
<value>172.25.37.5:50070</value>
</property>
<!-- Where the NameNode edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.25.37.2:8485;172.25.37.3:8485;172.25.37.4:8485/masters</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/journaldata</value></property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Class that clients use to find the active NameNode after a failover -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods, one per line: sshfence first, then shell(/bin/true) as a fallback so failover can still proceed when the old active host cannot be reached over SSH -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Connect timeout for sshfence, in milliseconds -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
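The nameservice name has to appear consistently in core-site.xml and hdfs-site.xml. A rough consistency check, assuming each `<value>` fits on a single line (true for the properties it reads here); `get_prop` and `check_ha_names` are hypothetical helpers, and a real XML parser would be more robust:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the single-line <value> of a property in a
# Hadoop XML file (the value may sit on the <name> line or a later one).
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }' "$1"
}

# Check that fs.defaultFS is hdfs://<dfs.nameservices> and that the
# nameservice lists its NameNode IDs.
check_ha_names() {
  local conf_dir="$1" fs ns ids
  fs=$(get_prop "$conf_dir/core-site.xml" fs.defaultFS)
  ns=$(get_prop "$conf_dir/hdfs-site.xml" dfs.nameservices)
  if [ "$fs" != "hdfs://$ns" ]; then
    echo "fs.defaultFS=$fs does not match dfs.nameservices=$ns"
    return 1
  fi
  ids=$(get_prop "$conf_dir/hdfs-site.xml" "dfs.ha.namenodes.$ns")
  [ -n "$ids" ] || { echo "dfs.ha.namenodes.$ns is not set"; return 1; }
  echo "nameservice $ns with NameNodes $ids"
}

# Usage:
#   check_ha_names /home/hadoop/hadoop/etc/hadoop
```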

9】Start the JournalNode on each of the three DataNodes in turn (the first time, the JournalNodes must be running before HDFS is formatted and started)

[hadoop@server2 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server2 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server2.out
[hadoop@server2 hadoop]$ jps
1658 QuorumPeerMain
1877 Jps
1827 JournalNode

[hadoop@server3 ~]$ cd hadoop
[hadoop@server3 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server3.out
[hadoop@server3 hadoop]$ jps
1790 JournalNode
1840 Jps
1704 QuorumPeerMain
[hadoop@server3 hadoop]$

[hadoop@server4 ~]$ cd hadoop
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server4.out
[hadoop@server4 hadoop]$ jps
1245 QuorumPeerMain
1344 JournalNode
1394 Jps
[hadoop@server4 hadoop]$

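10】Format the HDFS cluster

[hadoop@server1 hadoop]$ bin/hdfs namenode -format

11】Copy the /tmp/hadoop-hadoop directory to server5

The standby NameNode on server5 must start from the same metadata as the freshly formatted NameNode on server1. (The built-in alternative is to run bin/hdfs namenode -bootstrapStandby on server5 while the NameNode on server1 is running.)

[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop 172.25.37.5:/tmp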
seen_txid                                                                 100%    2     0.0KB/s   00:00    
VERSION                                                                   100%  202     0.2KB/s   00:00    
fsimage_0000000000000000000.md5                                           100%   62     0.1KB/s   00:00    
fsimage_0000000000000000000                                               100%  352     0.3KB/s   00:00  

12】Format the HA state in ZooKeeper (this creates the znode used by the ZK failover controllers)

[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK

13】Create directories for testing (hdfs dfs commands need a running NameNode, so run these commands after HDFS has been started in the next step)

[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -put etc/hadoop/ /user/hadoop/input
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-06-02 14:13 input

14】Start the HDFS cluster

[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1 server5]
server1: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server1.out
server5: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server5.out
172.25.37.3: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server3.out
172.25.37.2: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server2.out
172.25.37.4: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server4.out
Starting journal nodes [172.25.37.2 172.25.37.3 172.25.37.4]
172.25.37.4: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server4.out
172.25.37.3: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server3.out
172.25.37.2: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server2.out
Starting ZK Failover Controllers on NN hosts [server1 server5]
server1: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-server1.out
server5: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-server5.out
Check the processes:
[hadoop@server1 hadoop]$ jps
6209 Jps
5847 NameNode
6141 DFSZKFailoverController
[hadoop@server5 ~]$ jps
1548 Jps
1416 DFSZKFailoverController
1319 NameNode
[hadoop@server2 hadoop]$ jps
1661 JournalNode
1726 Jps
1224 QuorumPeerMain
1568 DataNode
[hadoop@server3 hadoop]$ jps
1776 Jps
1616 DataNode
1709 JournalNode
1213 QuorumPeerMain
[hadoop@server4 hadoop]$ jps
1204 QuorumPeerMain
1562 DataNode
1655 JournalNode
1723 Jps

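15】Test in a browser

You should see that server1 is active and server5 is standby. The same information is available from the command line with bin/hdfs haadmin -getServiceState h1 (or h2).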

 



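16】High-availability test:

Kill the NameNode that is currently active; in my case it runs on server1:

[hadoop@server1 hadoop]$ jps
2611 DFSZKFailoverController
2314 NameNode
3671 Jps
[hadoop@server1 hadoop]$ kill -9 2314
In the browser, server5 now shows as active. server1's NameNode is down at this point; once it is started again with the command below, it rejoins the cluster as standby.


[hadoop@server1 hadoop]$ ./sbin/hadoop-daemon.sh start namenode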
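The failover can also be followed from the shell instead of the browser. This sketch polls a state-query command until it prints the wanted state or a timeout expires; it assumes `bin/hdfs haadmin -getServiceState <NameNode ID>`, which prints active or standby, and `wait_for_state` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a state-query command once per second until it
# prints the wanted state, or give up after the timeout (in seconds).
wait_for_state() {
  local want="$1" timeout="$2"
  shift 2
  local waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    state=$("$@" 2>/dev/null)
    if [ "$state" = "$want" ]; then
      echo "reached $want after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "still '${state}' after ${timeout}s"
  return 1
}

# Usage on server1 after killing the active NameNode (h2 is server5's
# NameNode ID in hdfs-site.xml):
#   wait_for_state active 60 ./bin/hdfs haadmin -getServiceState h2
```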

V. Configuring YARN high availability

Again, two configuration files are involved:

1】Edit mapred-site.xml

<configuration>
<!-- Use YARN as the MapReduce framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

2】Edit yarn-site.xml

<configuration>
<!-- Allow MapReduce programs to run on the NodeManagers -->
<property>
<name>yarn.nodemanager.aux-services<