Detailed Hadoop Setup Walkthrough
1. Hadoop environment installation
[1] Create the hadoop user and switch to it
[root@server1 ~]# useradd hadoop
[root@server1 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[root@server1 ~]# su - hadoop
[2] Download and unpack Hadoop and the JDK
Note: place both tarballs in the hadoop user's home directory.
[hadoop@server1 ~]$ tar zxf hadoop-2.7.3.tar.gz
Note: create symlinks here so that later configuration is simpler and version-independent.
[hadoop@server1 ~]$ ln -s jdk1.7.0_79/ jdk
[hadoop@server1 ~]$ ln -s hadoop-2.7.3 hadoop
[3] Configure the Hadoop environment variables
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim hadoop-env.sh
24 # The java implementation to use.
25 export JAVA_HOME=/home/hadoop/jdk
[hadoop@server1 ~]$ cat /etc/hosts
172.25.37.1 server1
Note: the entry above must be present, otherwise Hadoop will report errors at runtime; the IP is your own Hadoop server's IP.
[hadoop@server1 ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
	. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/home/hadoop/jdk/bin

export PATH
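As a quick sanity check (assuming the profile above has been re-read), confirm that the bundled JDK is the one resolved on the PATH:
[hadoop@server1 ~]$ source ~/.bash_profile
[hadoop@server1 ~]$ which java
/home/hadoop/jdk/bin/java
[hadoop@server1 ~]$ java -version
java version "1.7.0_79"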
[4] Start Hadoop for the first time
[hadoop@server1 hadoop]$ cd /home/hadoop/hadoop/bin/
[hadoop@server1 bin]$ ./hadoop
Note: running the script with no arguments just prints its usage; treat this as a smoke test that the installation and JAVA_HOME are correct.
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
Run the MapReduce demo that ships with Hadoop.
Note: MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB).
[hadoop@server1 hadoop]$ bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
grep input output 'dfs[a-z.]+'
View the output files:
[hadoop@server1 hadoop]$ cat output/*
1 dfsadmin
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ ls
bin etc include input lib libexec LICENSE.txt NOTICE.txt output README.txt sbin share
2. Pseudo-distributed setup
[1] Configure core-site.xml
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
17 <!-- Put site-specific property overrides in this file. -->
18
19 <configuration>
20 <property>
21 <name>fs.defaultFS</name>
22 <value>hdfs://172.25.37.1:9000</value>
23 </property>
24 </configuration>
Note: fs.defaultFS sets the HDFS endpoint, i.e. the NameNode address that clients connect to.
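To confirm the value is being picked up, you can query it with the stock getconf tool (run from the Hadoop home directory; it should echo the address configured above):
[hadoop@server1 hadoop]$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://172.25.37.1:9000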
[2] Configure hdfs-site.xml
[hadoop@server1 hadoop]$ vim hdfs-site.xml
17 <!-- Put site-specific property overrides in this file. -->
18
19 <configuration>
20 <property>
21 <name>dfs.replication</name>
22 <value>1</value>
23 </property>
24 </configuration>
Note: dfs.replication sets how many replicas HDFS keeps of each block; since this pseudo-distributed setup has only one node, it is set to 1 here.
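For later reference, once HDFS is running and files have been uploaded (as done below), a file's replication factor can be read back with the -stat formatter; a minimal check, assuming the input directory uploaded later in this section:
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -stat %r /user/hadoop/input/core-site.xml
1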
[3] Set up passwordless SSH
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Change localhost in the following file to this machine's IP:
[hadoop@server1 hadoop]$ vim slaves
172.25.37.1
[4] Format the filesystem
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
Note: formatting initializes a brand-new HDFS namespace: it creates the NameNode's metadata store (fsimage, cluster ID, and so on). The DataNodes' block storage is populated later, as data is written.
[5] Start HDFS
[hadoop@server1 hadoop]$ ./sbin/start-dfs.sh
Starting namenodes on [server1]
server1: namenode running as process 2496. Stop it first.
172.25.37.1: datanode running as process 2164. Stop it first.
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 9b:19:24:43:5d:09:3a:12:97:94:99:f4:61:dc:3d:e2.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-server1.out
[6] Check the processes; if NameNode, DataNode, and SecondaryNameNode all show up (four processes counting jps itself), the start-up succeeded
[hadoop@server1 hadoop]$ jps
3151 Jps
2164 DataNode
3042 SecondaryNameNode
2496 NameNode
Check the listening port:
[hadoop@server1 hadoop]$ netstat -antlp | grep 50070
tcp 0 0 0.0.0.0:50070
[7] Test in a browser
Open: 172.25.37.1:50070
Check the DataNode:
Or check from the command line:
[hadoop@server1 hadoop]$ ./bin/hdfs dfsadmin -report
Configured Capacity: 14309232640 (13.33 GB)
Present Capacity: 11614142464 (10.82 GB)
DFS Remaining: 11614113792 (10.82 GB)
DFS Used: 28672 (28 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 172.25.37.1:50010 (server1)
Hostname: server1
Decommission Status : Normal
Configured Capacity: 14309232640 (13.33 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 2695090176 (2.51 GB)
DFS Remaining: 11614113792 (10.82 GB)
DFS Used%: 0.00%
DFS Remaining%: 81.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 31 22:22:11 CST 2018
Create directories for uploading files:
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put ./input /user/hadoop/
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2018-05-31 22:24 input
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls input
Found 29 items
-rw-r--r-- 1 hadoop supergroup 4436 2018-05-31 22:24 input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 1335 2018-05-31 22:24 input/configuration.xsl
(Only two entries are listed here for brevity.)
Delete the local input and output directories, then re-run MapReduce, this time reading its input from HDFS:
[hadoop@server1 hadoop]$ rm -fr input/ output
[hadoop@server1 hadoop]$ ./bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2018-05-31 22:24 input
drwxr-xr-x - hadoop supergroup 0 2018-05-31 22:30 output
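To read the word counts straight out of HDFS, the same cat pattern as the local run applies (piped through head here only to trim the listing):
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -cat output/* | head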
Visit http://172.25.37.1:50070/explorer.html in a browser; you can see the directories just created.
[8] Read a file's contents from HDFS
From the Hadoop home directory, run:
./bin/hdfs dfs -cat <absolute path of the file to view>
[9] Download a file from HDFS to the local machine
From the Hadoop home directory, run:
./bin/hdfs dfs -get <path of the file or directory to download>
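For example, using the input directory uploaded earlier (input-copy is an arbitrary local target name chosen for illustration):
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -cat /user/hadoop/input/core-site.xml
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -get /user/hadoop/input ./input-copy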
(To be completed.)
3. Fully distributed setup
Environment:
Physical host: rhel7.3, 172.25.37.250/24, used as the time source
Server1: rhel6.5, 172.25.37.1/24
Server2: rhel6.5, 172.25.37.2/24
Server3: rhel6.5, 172.25.37.3/24
[1] Time synchronization
On the physical host, set the sync source to a public NTP server (here a Baidu IP). Edit /etc/chrony.conf:
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
Note: put your upstream time source here
server 119.75.213.61 iburst
# Allow NTP client access from local network.
Note: allow the 172.25/16 network to sync from this host
allow 172.25/16
Apply the following configuration on all three virtual machines (server1 shown as the example):
[root@server1 ~]# yum install -y ntp
[root@server1 ~]# vim /etc/ntp.conf
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
Note: set the sync source to the physical host
server 172.25.37.250 iburst
Start ntpd:
[root@server1 ~]# /etc/init.d/ntpd start
Enable it at boot:
[root@server1 ~]# chkconfig ntpd on
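To verify that a client is actually tracking the physical host, query the peer list; an asterisk in front of 172.25.37.250 marks it as the selected time source:
[root@server1 ~]# ntpq -p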
[2] Set up passwordless SSH
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd .ssh/
[hadoop@server1 .ssh]$ ls
authorized_keys id_rsa id_rsa.pub known_hosts
[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.2
[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.3
To test, run the following on server1:
[hadoop@server1 ~]$ ssh 172.25.37.2
[hadoop@server1 ~]$ ssh 172.25.37.3
Note: type yes at the host-key prompt on each first connection; if no password is asked for afterwards, the setup succeeded.
[3] Create the hadoop user on all three virtual machines, with identical uid/gid on each.
[hadoop@server1 ~]$ id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[root@server2 ~]# useradd hadoop
[root@server2 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[root@server3 ~]# useradd hadoop
[root@server3 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[4] Share the configuration files over NFS
(1) Install the software
[root@server1 ~]# yum install -y nfs-utils
(2) Configure the exported directory
[root@server1 ~]# vim /etc/exports
/home/hadoop *(rw,anonuid=500,anongid=500)
(3) Re-export and verify the shared directory
[root@server1 ~]# exportfs -rv
exporting *:/home/hadoop
(4) Start the services
[root@server1 ~]# /etc/init.d/rpcbind start   ## start this one first
Starting rpcbind: [ OK ]
[[email protected] ~]# /etc/init.d/nfs start
Starting NFS services: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
(5) On server2 and server3, install the software and mount the shared directory
[root@server2 ~]# yum install -y nfs-utils
[root@server2 ~]# /etc/init.d/rpcbind start
Starting rpcbind: [ OK ]
[root@server3 ~]# yum install -y nfs-utils
[root@server3 ~]# /etc/init.d/rpcbind start
Starting rpcbind: [ OK ]
[root@server2 ~]# mount 172.25.37.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root 13973860 922768 12341256 7% /
tmpfs 510200 0 510200 0% /dev/shm
/dev/vda1 495844 33457 436787 8% /boot
172.25.37.1:/home/hadoop/ 13974016 1932544 11331584 15% /home/hadoop
[root@server3 ~]# mount 172.25.37.1:/home/hadoop/ /home/hadoop/
[root@server3 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root 13973860 922768 12341256 7% /
tmpfs 510200 0 510200 0% /dev/shm
/dev/vda1 495844 33457 436787 8% /boot
172.25.37.1:/home/hadoop/ 13974016 1932544 11331584 15% /home/hadoop
[5] The /etc/hosts entries must be identical on all three virtual machines; mine look like this:
[hadoop@server1 hadoop]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.37.1 server1
172.25.37.2 server2
172.25.37.3 server3
Clear out the files left over from the earlier tests:
[root@server1 ~]# rm -fr /tmp/*
[root@server2 ~]# rm -fr /tmp/*
[root@server3 ~]# rm -fr /tmp/*
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin etc include input lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share
Note: if the HDFS service from the earlier test was not shut down, stop it now.
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
[6] Configure the distributed mode; the same two files need editing
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.25.37.1:9000</value>
</property>
</configuration>
Here the replica count is changed to 2:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Configure the slaves file as follows; the DataNodes are now server2 and server3:
[hadoop@server1 hadoop]$ vim slaves
172.25.37.2
172.25.37.3
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[7] Format and start HDFS
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
server1: namenode running as process 3463. Stop it first.
172.25.37.3: datanode running as process 1202. Stop it first.
172.25.37.2: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 3710. Stop it first.
[8] Test in a browser
You should see the two DataNodes.
Check the processes:
[hadoop@server1 ~]$ jps
3463 NameNode
3710 SecondaryNameNode
4515 Jps
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ jps
1255 DataNode
1415 Jps
[root@server3 ~]# su - hadoop
[hadoop@server3 ~]$ jps
1202 DataNode
1464 Jps
[9] Scaling the cluster out and in
To add or remove DataNodes, adjust the number in the dfs.replication value of hdfs-site.xml if needed and add or remove the node's IP in the slaves file, then refresh the node list:
[hadoop@server1 hadoop]$ ./bin/hdfs dfsadmin -refreshNodes
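As a minimal sketch, adding one more DataNode (here hypothetically server4, assuming it already has the hadoop user, the NFS mount, and the /etc/hosts entries set up as above) looks like this:
[hadoop@server1 hadoop]$ echo 172.25.37.4 >> etc/hadoop/slaves    ## register the new worker
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start datanode     ## start the daemon on the new node
[hadoop@server1 hadoop]$ ./bin/hdfs dfsadmin -report              ## it should now appear as a live datanode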
[10] Configure YARN
[hadoop@server1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
Start YARN:
[hadoop@server1 hadoop]$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-server1.out
172.25.37.2: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-server2.out
172.25.37.3: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-server3.out
Check the processes:
[hadoop@server1 hadoop]$ jps
3129 SecondaryNameNode
3375 ResourceManager
2939 NameNode
3632 Jps
[hadoop@server2 hadoop]$ jps
1687 DataNode
1832 NodeManager
1930 Jps
[hadoop@server3 hadoop]$ jps
1806 NodeManager
1904 Jps
1648 DataNode
Note: if NodeManager does not show up on server2 or server3, you can start it manually on the affected machine:
[hadoop@server2 hadoop]$ ./sbin/yarn-daemon.sh start nodemanager
Create a Hadoop archive:
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -put input
[hadoop@server1 hadoop]$ ./bin/hadoop archive -archiveName test.har -p /user/hadoop/ input/* input/
18/06/01 21:23:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/01 21:23:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/01 21:23:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/01 21:23:09 INFO mapreduce.JobSubmitter: number of splits:1
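The archive lands at /user/hadoop/input/test.har; its contents can be browsed through the har:// scheme (a quick check, assuming the path produced by the command above):
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls har:///user/hadoop/input/test.har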
Visit 172.25.37.1:8088/cluster in a browser to see the job on the YARN web UI.
4. Hadoop high availability with ZooKeeper
Environment:
rhel6.5, iptables stopped && selinux disabled
Server1: 172.25.37.1/24
Server2: 172.25.37.2/24
Server3: 172.25.37.3/24
Server4: 172.25.37.4/24
Server5: 172.25.37.5/24
server1 && server5 --> HA NameNode pair
server2, server3, server4 --> storage nodes
[1] On server4 and server5, create the hadoop user and set up passwordless login
[root@server4 ~]# useradd hadoop
[root@server4 ~]# passwd hadoop
[root@server4 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[root@server5 ~]# useradd hadoop
[root@server5 ~]# passwd hadoop
[root@server5 ~]# id hadoop
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop)
[hadoop@server1 ~]$ cd .ssh/
[hadoop@server1 .ssh]$ ls
authorized_keys id_rsa id_rsa.pub known_hosts
[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.4
[hadoop@server1 .ssh]$ ssh-copy-id 172.25.37.5
[2] Synchronize time on server4 and server5
[root@server4 ~]# yum install -y ntp
[root@server4 ~]# vim /etc/ntp.conf
server 172.25.37.250 iburst
[root@server4 ~]# /etc/init.d/ntpd start
Starting ntpd: [ OK ]
[root@server5 ~]# yum install -y ntp
[root@server5 ~]# vim /etc/ntp.conf
server 172.25.37.250 iburst
[root@server5 ~]# /etc/init.d/ntpd start
Starting ntpd: [ OK ]
[3] Mount the shared Hadoop configuration over NFS on server4 and server5
[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# /etc/init.d/rpcbind status
rpcbind (pid 1092) is running...
[root@server4 ~]# mount 172.25.37.1:/home/hadoop/ /home/hadoop/
[root@server4 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root 13973860 922772 12341252 7% /
tmpfs 510200 0 510200 0% /dev/shm
/dev/vda1 495844 33457 436787 8% /boot
172.25.37.1:/home/hadoop/ 13974016 1954560 11309568 15% /home/hadoop
[root@server5 ~]# yum install -y nfs-utils
[root@server5 ~]# /etc/init.d/rpcbind status
rpcbind (pid 1092) is running...
[root@server5 ~]# mount 172.25.37.1:/home/hadoop/ /home/hadoop/
[root@server5 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root 13973860 922760 12341264 7% /
tmpfs 510200 0 510200 0% /dev/shm
/dev/vda1 495844 33457 436787 8% /boot
172.25.37.1:/home/hadoop/ 13974016 1954560 11309568 15% /home/hadoop
[4] Add the following entries to /etc/hosts on every virtual machine:
[hadoop@server1 hadoop]$ cat /etc/hosts
172.25.37.1 server1
172.25.37.2 server2
172.25.37.3 server3
172.25.37.4 server4
172.25.37.5 server5
[5] Clean up the data produced by the previous experiments
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ sbin/stop-yarn.sh
stopping yarn daemons
no resourcemanager to stop
172.25.37.3: no nodemanager to stop
172.25.37.2: no nodemanager to stop
no proxyserver to stop
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [server1]
server1: stopping namenode
172.25.37.3: stopping datanode
172.25.37.2: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
[hadoop@server1 hadoop]$
[hadoop@server1 ~]$ rm -fr /tmp/*
[hadoop@server2 ~]$ rm -fr /tmp/*
[hadoop@server3 ~]$ rm -fr /tmp/*
[hadoop@server4 ~]$ rm -fr /tmp/*
[hadoop@server5 ~]$ rm -fr /tmp/*
[6] Set the replica count to 3 (there are now three DataNodes)
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
[hadoop@server1 hadoop]$ vim slaves
172.25.37.2
172.25.37.3
172.25.37.4
[7] Configure the ZooKeeper cluster
(1) Unpack the tarball
[hadoop@server1 ~]$ tar zxf zookeeper-3.4.9.tar.gz
[hadoop@server1 ~]$ cd zookeeper-3.4.9
[hadoop@server1 zookeeper-3.4.9]$ cd conf/
[hadoop@server1 conf]$ ls
configuration.xsl log4j.properties zoo_sample.cfg
(2) Copy the sample configuration:
[hadoop@server1 conf]$ cp zoo_sample.cfg zoo.cfg
(3) The key settings are:
[hadoop@server1 conf]$ vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
server.1=172.25.37.2:2888:3888
server.2=172.25.37.3:2888:3888
server.3=172.25.37.4:2888:3888
Every node uses the same configuration file, and each needs a myid file created in /tmp/zookeeper containing a unique number in the range 1-255. For example, node 172.25.37.2 writes "1" into its myid file, matching its entry in the configuration (server.1=172.25.37.2:2888:3888); the other nodes follow the same pattern.
Parameter reference:
clientPort
The port on which the server listens for client connections, i.e. the public service port, conventionally 2181.
dataDir
The directory where snapshot files are stored. By default the transaction log is also written here; it is recommended to set dataLogDir as well, since transaction-log write performance directly affects ZooKeeper performance.
tickTime
The basic time unit in ZooKeeper, in milliseconds; all other ZooKeeper times are multiples of it, and it regulates heartbeats and timeouts. For example, the minimum session timeout is 2*tickTime.
dataLogDir
The transaction-log output directory. Giving the transaction log its own disk or mount point greatly improves ZooKeeper performance.
[hadoop@server2 ~]$ mkdir /tmp/zookeeper
[hadoop@server2 ~]$ echo 1 > /tmp/zookeeper/myid
[hadoop@server3 ~]$ mkdir /tmp/zookeeper
[hadoop@server3 ~]$ echo 2 > /tmp/zookeeper/myid
[hadoop@server4 ~]$ mkdir /tmp/zookeeper
[hadoop@server4 ~]$ echo 3 > /tmp/zookeeper/myid
(4) Start the service on each node
[hadoop@server2 ~]$ cd zookeeper-3.4.9
[hadoop@server2 zookeeper-3.4.9]$ cd bin/
[hadoop@server2 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server2 bin]$
[hadoop@server3 ~]$ cd zookeeper-3.4.9
[hadoop@server3 zookeeper-3.4.9]$ cd bin/
[hadoop@server3 bin]$ ./
README.txt zkCli.cmd zkEnv.cmd zkServer.cmd
zkCleanup.sh zkCli.sh zkEnv.sh zkServer.sh
[hadoop@server3 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server3 bin]$
[hadoop@server4 ~]$ cd zookeeper-3.4.9
[hadoop@server4 zookeeper-3.4.9]$ cd bin/
[hadoop@server4 bin]$ ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@server4 bin]$
(5) Check each node's status
[hadoop@server2 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[hadoop@server2 bin]$
[hadoop@server3 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader
[hadoop@server3 bin]$
[hadoop@server4 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[hadoop@server4 bin]$
You can see that server3 was elected leader.
(6) Check the processes
[hadoop@server2 bin]$ jps
1658 QuorumPeerMain
1803 Jps
[hadoop@server3 bin]$ jps
1766 Jps
1704 QuorumPeerMain
[hadoop@server4 bin]$ jps
1245 QuorumPeerMain
1319 Jps
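Besides zkServer.sh status, each node can also be probed over its client port with ZooKeeper's four-letter commands; a quick liveness check (assuming nc is installed) that should answer imok:
[hadoop@server2 bin]$ echo ruok | nc 172.25.37.2 2181
imok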
(7) The ZooKeeper interactive shell
[hadoop@server2 bin]$ pwd
/home/hadoop/zookeeper-3.4.9/bin
[hadoop@server2 bin]$ ls
README.txt zkCli.cmd zkEnv.cmd zkServer.cmd zookeeper.out
zkCleanup.sh zkCli.sh zkEnv.sh zkServer.sh
Run the script to enter the interactive shell:
[hadoop@server2 bin]$ ./zkCli.sh
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
[zk: localhost:2181(CONNECTED) 1] ls /zookeeper
[quota]
[zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota
[]
[zk: localhost:2181(CONNECTED) 3] get /zookeeper/quota
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x0
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
[zk: localhost:2181(CONNECTED) 4] quit
Quitting...
2018-03-10 16:18:19,399 [myid:] - INFO [main:ZooKeeper] - Session: 0x1620ef562a30002 closed
2018-03-10 16:18:19,400 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread] - EventThread shut down for session: 0x1620ef562a30002
[8] Deploy high availability
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
<!-- Specify "masters" as the hdfs nameservice (the name is arbitrary) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<!-- Specify the zookeeper cluster addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>172.25.37.2:2181,172.25.37.3:2181,172.25.37.4:2181</value>
</property>
</configuration>
Edit hdfs-site.xml:
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
<!-- Set the hdfs nameservices to masters, matching the setting in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- masters contains two namenodes, h1 and h2 (the names are arbitrary) -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h1,h2</value>
</property>
<!-- RPC address of node h1 -->
<property>
<name>dfs.namenode.rpc-address.masters.h1</name>
<value>172.25.37.1:9000</value>
</property>
<!-- HTTP address of node h1 -->
<property>
<name>dfs.namenode.http-address.masters.h1</name>
<value>172.25.37.1:50070</value>
</property>
<!-- RPC address of node h2 -->
<property>
<name>dfs.namenode.rpc-address.masters.h2</name>
<value>172.25.37.5:9000</value>
</property>
<!-- HTTP address of node h2 -->
<property>
<name>dfs.namenode.http-address.masters.h2</name>
<value>172.25.37.5:50070</value>
</property>
<!-- Where the NameNode metadata (shared edit log) is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.25.37.2:8485;172.25.37.3:8485;172.25.37.4:8485/masters</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- How clients find the active NameNode after a failover -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence method requires passwordless ssh -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Timeout for the sshfence method -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
[9] Start journalnode on each of the three DNs in turn (on the very first hdfs start-up, the journalnodes must be started first)
[hadoop@server2 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server2 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server2.out
[hadoop@server2 hadoop]$ jps
1658 QuorumPeerMain
1877 Jps
1827 JournalNode
[hadoop@server3 ~]$ cd hadoop
[hadoop@server3 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server3.out
[hadoop@server3 hadoop]$ jps
1790 JournalNode
1840 Jps
1704 QuorumPeerMain
[hadoop@server3 hadoop]$
[hadoop@server4 ~]$ cd hadoop
[hadoop@server4 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server4.out
[hadoop@server4 hadoop]$ jps
1245 QuorumPeerMain
1344 JournalNode
1394 Jps
[hadoop@server4 hadoop]$
[10] Format the HDFS cluster
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[11] Copy the /tmp/hadoop-hadoop metadata directory to server5, so the standby NameNode starts from the same namespace
[hadoop@server1 hadoop]$ scp -r /tmp/hadoop-hadoop 172.25.37.5:/tmp
seen_txid 100% 2 0.0KB/s 00:00
VERSION 100% 202 0.2KB/s 00:00
fsimage_0000000000000000000.md5 100% 62 0.1KB/s 00:00
fsimage_0000000000000000000 100% 352 0.3KB/s 00:00
[12] Format zookeeper for the failover controller
[hadoop@server1 hadoop]$ bin/hdfs zkfc -formatZK
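Formatting should create a /hadoop-ha znode holding the nameservice; a quick way to verify it, from any of the ZooKeeper nodes:
[hadoop@server2 bin]$ ./zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha
[masters]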
[13] Create directories for testing
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -put etc/hadoop/ /user/hadoop/input
[hadoop@server1 hadoop]$ ./bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2018-06-02 14:13 input
[14] Start the hdfs cluster
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1 server5]
server1: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server1.out
server5: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server5.out
172.25.37.3: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server3.out
172.25.37.2: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server2.out
172.25.37.4: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server4.out
Starting journal nodes [172.25.37.2 172.25.37.3 172.25.37.4]
172.25.37.4: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server4.out
172.25.37.3: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server3.out
172.25.37.2: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server2.out
Starting ZK Failover Controllers on NN hosts [server1 server5]
server1: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-server1.out
server5: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-server5.out
Check the processes:
[hadoop@server1 hadoop]$ jps
6209 Jps
5847 NameNode
6141 DFSZKFailoverController
[hadoop@server5 ~]$ jps
1548 Jps
1416 DFSZKFailoverController
1319 NameNode
[hadoop@server2 hadoop]$ jps
1661 JournalNode
1726 Jps
1224 QuorumPeerMain
1568 DataNode
[hadoop@server3 hadoop]$ jps
1776 Jps
1616 DataNode
1709 JournalNode
1213 QuorumPeerMain
[hadoop@server4 hadoop]$ jps
1204 QuorumPeerMain
1562 DataNode
1655 JournalNode
1723 Jps
[15] Test in a browser
You should see server1 in the active state and server5 in standby.
[16] High-availability failover test
Kill the NameNode that is currently active; in my case that is server1:
[hadoop@server1 hadoop]$ jps
2611 DFSZKFailoverController
2314 NameNode
3671 Jps
[hadoop@server1 hadoop]$ kill -9 2314
In the browser you can see server5 switch to active; after restarting the namenode on server1, server1 comes back in the standby state:
[hadoop@server1 hadoop]$ ./sbin/hadoop-daemon.sh start namenode
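The same check can be made from the command line with haadmin, using the h1/h2 names defined in hdfs-site.xml (output shown for the state after the failover and restart):
[hadoop@server1 hadoop]$ ./bin/hdfs haadmin -getServiceState h1
standby
[hadoop@server1 hadoop]$ ./bin/hdfs haadmin -getServiceState h2
active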
5. Configuring YARN high availability
The same two configuration files are involved:
[1] Edit mapred-site.xml
<configuration>
<!-- Use yarn as the MapReduce framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[2] Edit yarn-site.xml
<configuration>
<!-- Allow MapReduce programs to run on the nodemanagers -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>