
Hadoop HA (High Availability) Experiment


1. Passwordless SSH Login

Set up passwordless SSH login between all nodes in the cluster.
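
A minimal sketch of one way to do this, assuming the four hosts from the planning section below (MDNode01–MDNode04) and the root account used throughout this walkthrough:

# On each node: generate a key pair (a DSA key, matching the
# dfs.ha.fencing.ssh.private-key-files path configured later)
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

# Push the public key to every node, including the local one
for host in MDNode01 MDNode02 MDNode03 MDNode04; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub root@$host
done

# Verify that passwordless login works
ssh MDNode01 hostname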

2. Introduction

Purpose

This guide provides an overview of the HDFS High Availability (HA) feature and how to configure and manage an HA HDFS cluster using the Quorum Journal Manager (QJM).

This document assumes that the reader has a general understanding of the common components and node types in an HDFS cluster. Please refer to the HDFS Architecture guide for details.

Note: Using the Quorum Journal Manager or Conventional Shared Storage

This guide discusses how to configure and use HDFS HA with the Quorum Journal Manager (QJM) to share edit logs between the Active and Standby NameNodes. For information on how to configure HDFS HA using NFS for shared storage instead of the QJM, please see the alternative guide.

Background

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This affected the total availability of the HDFS cluster in two major ways:

  • In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.
  • Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.

Architecture

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in the Active state, and the other is in the Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby simply acts as a slave, maintaining enough state to provide a fast failover if necessary.

In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called "JournalNodes" (JNs). When the Active node performs any namespace modification, it durably logs a record of the modification to a majority of these JNs. The Standby node reads the edits from the JNs and constantly watches them for changes to the edit log. As the Standby node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby ensures that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This guarantees that the namespace state is fully synchronized before a failover occurs.

In order to provide a fast failover, it is also necessary for the Standby node to have up-to-date information regarding the location of blocks in the cluster. To achieve this, the DataNodes are configured with the location of both NameNodes, and they send block location information and heartbeats to both.

It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. In order to ensure this property and prevent the so-called "split-brain scenario", the JournalNodes will only ever allow a single NameNode to be a writer at a time. During a failover, the NameNode which is to become active simply takes over the role of writing to the JournalNodes, which effectively prevents the other NameNode from continuing in the Active state, allowing the new Active to safely proceed with failover.

Hardware Resources

In order to deploy an HA cluster, you should prepare the following:

  • NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.
  • JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example the NameNodes, the JobTracker, or the YARN ResourceManager. Note: there must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This allows the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N-1)/2 failures and continue to function normally.

Note that in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, so it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows those who are reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware they had previously dedicated to the Secondary NameNode.

Deployment

Similar to an HDFS Federation configuration, the HA configuration is backward compatible and allows an existing single-NameNode configuration to work without change. The new configuration is designed such that all the nodes in the cluster may share the same configuration, without the need for deploying different configuration files to different machines based on the type of the node.

Like HDFS Federation, HA clusters reuse the nameservice ID to identify a single HDFS instance that may in fact consist of multiple HA NameNodes. In addition, HA adds a new abstraction called the NameNode ID. Each distinct NameNode in the cluster has a different NameNode ID to distinguish it. To support a single configuration file for all of the NameNodes, the relevant configuration parameters are suffixed with the nameservice ID as well as the NameNode ID.

Configuration Details

To configure HA NameNodes, you must add several configuration options to your hdfs-site.xml configuration file.

The order in which you set these configurations is unimportant, but the values you choose for dfs.nameservices and dfs.ha.namenodes.[nameservice ID] will determine the keys of those that follow. Thus, you should decide on these values before setting the rest of the configuration options.

3. Planning


        NN-1   NN-2   DN   ZK   ZKFC   JNN
Node1    *                        *     *
Node2           *     *    *      *     *
Node3                 *    *            *
Node4                 *    *

ZK: zookeeper

ZKFC: failover controller (the failover controller process)

hdfs-site.xml configuration

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Configure a nameservice ID for the cluster -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <!-- Configure a list of comma-separated NameNode IDs. The DataNodes will use this to determine all the NameNodes in the cluster. For example, if you used "mycluster" as the nameservice ID previously, and you want to use "nn1" and "nn2" as the individual IDs of the NameNodes -->
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <!-- For each of the two previously configured NameNode IDs, set the full address and IPC port of the NameNode process. Note that this results in two separate configuration options. -->
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>MDNode01:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>MDNode02:8020</value>
    </property>
    <!-- Set the addresses for both NameNodes' HTTP servers to listen on -->
    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>MDNode01:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>MDNode02:50070</value>
    </property>
    <!-- The URI which identifies the group of JNs where the NameNodes will write/read edits.
	This is where one configures the addresses of the JournalNodes which provide the shared edits storage, written to by the Active NameNode and read by the Standby NameNode to stay up to date with all the file system changes the Active NameNode makes. Although you must specify several JournalNode addresses, you should only configure one of these URIs. The URI should be of the form: "qjournal://host1:port1;host2:port2;host3:port3/journalId". The Journal ID is a unique identifier for this nameservice, which allows a single set of JournalNodes to provide storage for multiple federated namesystems. Though it is not required, it is a good idea to reuse the nameservice ID as the journal identifier.
	-->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://MDNode01:8485;MDNode02:8485;MDNode03:8485/mycluster</value>
    </property>
    <!-- Configure the name of the Java class which the DFS clients will use to determine which NameNode is the current Active, and therefore which NameNode is currently serving client requests. The only implementation which currently ships with Hadoop is the ConfiguredFailoverProxyProvider, so use this unless you are using a custom one. -->
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!--
	For the correctness of the system, only one NameNode may be in the Active state at any given time. Importantly, when using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata from a split-brain scenario. However, when a failover occurs, it is still possible that the previous Active NameNode could keep serving read requests to clients, which may be out of date, until that NameNode shuts down when trying to write to the JournalNodes. For this reason, it is still desirable to configure some fencing method even when using the Quorum Journal Manager. To improve the availability of the system in the event the fencing mechanisms fail, it is advisable to configure a fencing method which is guaranteed to return success as the last fencing method in the list.
	-->
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <!-- The SSH private key used by the sshfence method to log in to the remote node -->
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_dsa</value>
    </property>
    <!-- This is the absolute path on the JournalNode machines where the edits and other local state used by the JNs will be stored. You may only use a single path for this configuration. Redundancy for this data is provided by running multiple separate JournalNodes, or by configuring this directory on a locally-attached RAID array -->
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/var/mgs/hadoop/ha/journode</value>
    </property>

</configuration>

core-site.xml

<!--
You may now configure the default path for Hadoop clients to use the new HA-enabled logical URI. If you used "mycluster" as the nameservice ID earlier, this will be the value of the authority portion of all of your HDFS paths.
-->
<configuration>
  <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
  </property>
  <!-- The directory where Hadoop keeps its data -->
  <property>
      <name>hadoop.tmp.dir</name>
      <value>/var/mgs/hadoop/ha</value>
  </property>
</configuration>

ZooKeeper Configuration

Automatic Failover

Introduction

The above sections describe how to configure manual failover. In that mode, the system will not automatically trigger a failover from the active to the standby NameNode, even if the active node has failed. This section describes how to configure and deploy automatic failover.

Components

Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  • Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  • Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active NameNode.

The ZKFailoverController (ZKFC) is a new component: a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  • Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. As long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  • ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  • ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to the active state.

For more details on the design of automatic failover, refer to the design document attached to HDFS-2185 on the Apache HDFS JIRA.

Deploying ZooKeeper

In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. Since ZooKeeper itself has light resource requirements, it is acceptable to collocate the ZooKeeper nodes on the same hardware as the HDFS NameNode and Standby Node. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. It is advisable to configure the ZooKeeper nodes to store their data on separate disk drives from the HDFS metadata, for best performance and isolation.

The setup of ZooKeeper is out of scope for this document. We will assume that you have set up a ZooKeeper cluster running on three or more nodes, and have verified its correct operation by connecting using the ZK CLI.

Before You Begin

Before you begin configuring automatic failover, you should shut down your cluster. It is not currently possible to transition from a manual failover setup to an automatic failover setup while the cluster is running.
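
If HDFS is still running from an earlier (non-HA or manual-failover) setup, a minimal way to stop it is from the node you normally start the cluster on:

stop-dfs.sh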

Configuration:

Add to hdfs-site.xml:

<property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>

And in core-site.xml:

<property>
   <name>ha.zookeeper.quorum</name>
   <value>MDNode02:2181,MDNode03:2181,MDNode04:2181</value>
 </property>

Distribute these two configuration files to the other nodes:

[root@MDNode01 hadoop]# scp core-site.xml hdfs-site.xml MDNode02:`pwd`
core-site.xml                                                                                                                                                                                                  100% 1111     1.1KB/s   00:00    
hdfs-site.xml                                                                                                                                                                                                  100% 2239     2.2KB/s   00:00    
[root@MDNode01 hadoop]# scp core-site.xml hdfs-site.xml MDNode03:`pwd`
core-site.xml                                                                                                                                                                                                  100% 1111     1.1KB/s   00:00    
hdfs-site.xml                                                                                                                                                                                                  100% 2239     2.2KB/s   00:00    
[root@MDNode01 hadoop]# scp core-site.xml hdfs-site.xml MDNode04:`pwd`
core-site.xml                                                                                                                                                                                                  100% 1111     1.1KB/s   00:00    
hdfs-site.xml                                                                                                                                                                                                  100% 2239     2.2KB/s   00:00    
[root@MDNode01 hadoop]# 

ZooKeeper configuration:

Download and extract ZooKeeper into the /opt/mgs/ directory (I used version 3.4.14).
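
A rough sketch of that step (the mirror URL is an assumption; use whichever Apache archive mirror you prefer):

cd /opt/mgs
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
tar -zxvf zookeeper-3.4.14.tar.gz
ls /opt/mgs/zookeeper-3.4.14   # matches the ZOOKEEPER_HOME set below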

Configure zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# the directory must exist
dataDir=/var/mgs/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.1=MDNode02:2888:3888
server.2=MDNode03:2888:3888
server.3=MDNode04:2888:3888
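
This zoo.cfg is used on each of the three ZooKeeper hosts (MDNode02–MDNode04, per the planning table), and each host additionally needs a myid file in the dataDir whose number matches its server.N line; a minimal sketch:

# On every ZooKeeper node: the dataDir must exist
mkdir -p /var/mgs/zookeeper

# On MDNode02 (server.1)
echo 1 > /var/mgs/zookeeper/myid
# On MDNode03 (server.2)
echo 2 > /var/mgs/zookeeper/myid
# On MDNode04 (server.3)
echo 3 > /var/mgs/zookeeper/myid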

Configure the ZOOKEEPER_HOME variable in /etc/profile:

export JAVA_HOME=/usr/java/jdk1.8.0_221
export JRE_HOME=/usr/java/jdk1.8.0_221/jre
export HADOOP_HOME=/opt/mgs/hadoop-2.7.5
export ZOOKEEPER_HOME=/opt/mgs/zookeeper-3.4.14
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
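
After editing /etc/profile on each node that needs it, reload it in the current shell and sanity-check the variables:

source /etc/profile
echo $ZOOKEEPER_HOME
which zkServer.sh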

Commonly used commands:

zkServer.sh

start|start-foreground|stop|restart|status|upgrade|print-cmd
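
For example, to bring up the quorum, run the following on each of MDNode02, MDNode03 and MDNode04 and then check which role each server has taken:

zkServer.sh start
zkServer.sh status   # reports "Mode: follower" or "Mode: leader" once the quorum has formed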


After all of the necessary configuration options have been set, you must start the JournalNode daemons on the set of machines where they will run. This can be done by running the command "hadoop-daemon.sh start journalnode" and waiting for the daemon to start on each of the relevant machines.

Once the JournalNodes have been started, one must initially synchronize the two HA NameNodes' on-disk metadata.

  • If you are setting up a fresh HDFS cluster, you should first run the format command (hdfs namenode -format) on one of NameNodes.
  • If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command "hdfs namenode -bootstrapStandby" on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes.
  • If you are converting a non-HA NameNode to be HA, you should run the command "hdfs namenode -initializeSharedEdits", which will initialize the JournalNodes with the edits data from the local NameNode edits directories.

At this point you may start both of your HA NameNodes as you normally would start a NameNode.

You can visit each of the NameNodes' web pages separately by browsing to their configured HTTP addresses. You should notice that next to the configured address will be the HA state of the NameNode (either "standby" or "active".) Whenever an HA NameNode starts, it is initially in the Standby state.
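
Besides the web UI, the HA state can also be queried from the command line; a quick sketch using the NameNode IDs configured above:

hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2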


Start the JournalNodes

Start the journalnode daemon on nodes 1, 2, and 3:

[root@MDNode01 ~]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-journalnode-MDNode01.out
[root@MDNode01 ~]# jps
1382 JournalNode
1417 Jps

Format the NameNode on node 1:

[root@MDNode01 ~]# hdfs namenode -format
19/11/19 22:28:17 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = MDNode01/192.168.25.50
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.5
STARTUP_MSG:   classpath = /opt/mgs/hadoop-2.7.5/etc/hadoop:/opt/mgs/hadoop-2.7.5/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/mgs/hadoop-
......
2.7.5.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar:/opt/mgs/hadoop-2.7.5/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://[email protected]/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075; compiled by 'kshvachk' on 2017-12-16T01:06Z
STARTUP_MSG:   java = 1.8.0_221
************************************************************/
19/11/19 22:28:17 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/11/19 22:28:17 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-574c1fd8-3458-4039-bc23-29255a0c7333
19/11/19 22:28:19 INFO namenode.FSNamesystem: No KeyProvider found.
19/11/19 22:28:19 INFO namenode.FSNamesystem: fsLock is fair: true
19/11/19 22:28:19 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
19/11/19 22:28:19 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
19/11/19 22:28:19 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
19/11/19 22:28:19 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
19/11/19 22:28:19 INFO blockmanagement.BlockManager: The block deletion will start around 2019 Nov 19 22:28:19
19/11/19 22:28:19 INFO util.GSet: Computing capacity for map BlocksMap
19/11/19 22:28:19 INFO util.GSet: VM type       = 64-bit
19/11/19 22:28:19 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
19/11/19 22:28:19 INFO util.GSet: capacity      = 2^21 = 2097152 entries
19/11/19 22:28:20 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
19/11/19 22:28:20 INFO blockmanagement.BlockManager: defaultReplication         = 2
19/11/19 22:28:20 INFO blockmanagement.BlockManager: maxReplication             = 512
19/11/19 22:28:20 INFO blockmanagement.BlockManager: minReplication             = 1
19/11/19 22:28:20 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
19/11/19 22:28:20 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
19/11/19 22:28:20 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
19/11/19 22:28:20 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
19/11/19 22:28:20 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
19/11/19 22:28:20 INFO namenode.FSNamesystem: supergroup          = supergroup
19/11/19 22:28:20 INFO namenode.FSNamesystem: isPermissionEnabled = true
19/11/19 22:28:20 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
19/11/19 22:28:20 INFO namenode.FSNamesystem: HA Enabled: true
19/11/19 22:28:20 INFO namenode.FSNamesystem: Append Enabled: true
19/11/19 22:28:21 INFO util.GSet: Computing capacity for map INodeMap
19/11/19 22:28:21 INFO util.GSet: VM type       = 64-bit
19/11/19 22:28:21 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
19/11/19 22:28:21 INFO util.GSet: capacity      = 2^20 = 1048576 entries
19/11/19 22:28:21 INFO namenode.FSDirectory: ACLs enabled? false
19/11/19 22:28:21 INFO namenode.FSDirectory: XAttrs enabled? true
19/11/19 22:28:21 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
19/11/19 22:28:21 INFO namenode.NameNode: Caching file names occuring more than 10 times
19/11/19 22:28:21 INFO util.GSet: Computing capacity for map cachedBlocks
19/11/19 22:28:21 INFO util.GSet: VM type       = 64-bit
19/11/19 22:28:21 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
19/11/19 22:28:21 INFO util.GSet: capacity      = 2^18 = 262144 entries
19/11/19 22:28:21 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
19/11/19 22:28:21 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
19/11/19 22:28:21 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
19/11/19 22:28:21 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
19/11/19 22:28:21 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
19/11/19 22:28:21 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
19/11/19 22:28:21 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
19/11/19 22:28:21 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
19/11/19 22:28:21 INFO util.GSet: Computing capacity for map NameNodeRetryCache
19/11/19 22:28:21 INFO util.GSet: VM type       = 64-bit
19/11/19 22:28:21 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
19/11/19 22:28:21 INFO util.GSet: capacity      = 2^15 = 32768 entries
19/11/19 22:28:24 INFO namenode.FSImage: Allocated new BlockPoolId: BP-109135744-192.168.25.50-1574173704383
19/11/19 22:28:24 INFO common.Storage: Storage directory /var/mgs/hadoop/ha/dfs/name has been successfully formatted.
19/11/19 22:28:25 INFO namenode.FSImageFormatProtobuf: Saving image file /var/mgs/hadoop/ha/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
19/11/19 22:28:25 INFO namenode.FSImageFormatProtobuf: Image file /var/mgs/hadoop/ha/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
19/11/19 22:28:25 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/11/19 22:28:25 INFO util.ExitUtil: Exiting with status 0
19/11/19 22:28:25 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at MDNode01/192.168.25.50
************************************************************/

Start the NameNode process:

[root@MDNode01 ~]# jps
1489 Jps
1382 JournalNode
[root@MDNode01 ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-namenode-MDNode01.out
[root@MDNode01 ~]# jps
1382 JournalNode
1594 Jps
1519 NameNode

On node 2 (the standby NameNode), bootstrap it with the formatted metadata from the primary NameNode:

[root@MDNode02 ~]# hdfs namenode -bootstrapStandby
19/11/19 22:47:21 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = MDNode02/192.168.25.51
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 2.7.5
STARTUP_MSG:   classpath = /opt/mgs/hadoop-2.7.5/etc/hadoop:/opt/mgs/hadoop-2.7.5/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/common/lib/junit-4.11.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/mgs/hadoop-
......
2.7.5.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar:/opt/mgs/hadoop-2.7.5/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://[email protected]/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075; compiled by 'kshvachk' on 2017-12-16T01:06Z
STARTUP_MSG:   java = 1.8.0_221
************************************************************/
19/11/19 22:47:21 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/11/19 22:47:21 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: mycluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://MDNode01:50070
  Other NN's IPC  address: MDNode01/192.168.25.50:8020
             Namespace ID: 749034685
            Block pool ID: BP-109135744-192.168.25.50-1574173704383
               Cluster ID: CID-574c1fd8-3458-4039-bc23-29255a0c7333
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
19/11/19 22:47:24 INFO common.Storage: Storage directory /var/mgs/hadoop/ha/dfs/name has been successfully formatted.
19/11/19 22:47:26 INFO namenode.TransferFsImage: Opening connection to http://MDNode01:50070/imagetransfer?getimage=1&txid=0&storageInfo=-63:749034685:0:CID-574c1fd8-3458-4039-bc23-29255a0c7333
19/11/19 22:47:26 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
19/11/19 22:47:27 INFO namenode.TransferFsImage: Transfer took 0.01s at 0.00 KB/s
19/11/19 22:47:27 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 321 bytes.
19/11/19 22:47:27 INFO util.ExitUtil: Exiting with status 0
19/11/19 22:47:27 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at MDNode02/192.168.25.51
************************************************************/
[root@MDNode02 ~]# jps
1827 JournalNode
1758 QuorumPeerMain
1950 Jps
[root@MDNode02 ~]# 

Registering with the ZooKeeper service

Check the ZooKeeper service:

[root@MDNode04 ~]# zkCli.sh 
Connecting to localhost:2181
2019-11-19 22:56:44,562 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2019-11-19 22:56:44,566 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=MDNode04
2019-11-19 22:56:44,566 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_221
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_221/jre
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/mgs/zookeeper-3.4.14/bin/../zookeeper-server/target/classes:/opt/mgs/zookeeper-3.4.14/bin/../build/classes:/opt/mgs/zookeeper-3.4.14/bin/../zookeeper-server/target/lib/*.jar:/opt/mgs/zookeeper-3.4.14/bin/../build/lib/*.jar:/opt/mgs/zookeeper-3.4.14/bin/../lib/slf4j-log4j12-1.7.25.jar:/opt/mgs/zookeeper-3.4.14/bin/../lib/slf4j-api-1.7.25.jar:/opt/mgs/zookeeper-3.4.14/bin/../lib/netty-3.10.6.Final.jar:/opt/mgs/zookeeper-3.4.14/bin/../lib/log4j-1.2.17.jar:/opt/mgs/zookeeper-3.4.14/bin/../lib/jline-0.9.94.jar:/opt/mgs/zookeeper-3.4.14/bin/../lib/audience-annotations-0.5.0.jar:/opt/mgs/zookeeper-3.4.14/bin/../zookeeper-3.4.14.jar:/opt/mgs/zookeeper-3.4.14/bin/../zookeeper-server/src/main/resources/lib/*.jar:/opt/mgs/zookeeper-3.4.14/bin/../conf:.:/usr/java/jdk1.8.0_221/lib:/usr/java/jdk1.8.0_221/jre/lib:
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2019-11-19 22:56:44,569 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2019-11-19 22:56:44,570 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2019-11-19 22:56:44,570 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-431.el6.x86_64
2019-11-19 22:56:44,570 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2019-11-19 22:56:44,570 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2019-11-19 22:56:44,570 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/root
2019-11-19 22:56:44,571 [myid:] - INFO  [main:ZooKeeper@442] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@5ce65a89
Welcome to ZooKeeper!
2019-11-19 22:56:44,671 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1025] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2019-11-19 22:56:44,809 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@879] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2019-11-19 22:56:44,956 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x30000d4fb1c0000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
## here you can view the znodes that ZooKeeper is serving
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] 

Initializing HA state in ZooKeeper

After the configuration keys have been added, the next step is to initialize the required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.

$ hdfs zkfc -formatZK

This will create a znode in ZooKeeper inside of which the automatic failover system stores its data.

Starting the cluster with start-dfs.sh

Since automatic failover has been enabled in the configuration, the start-dfs.sh script will now automatically start a ZKFC daemon on any machine that runs a NameNode. When the ZKFCs start, they will automatically select one of the NameNodes to become active.

Starting the cluster manually

If you manually manage the services on your cluster, you will need to manually start the zkfc daemon on each of the machines that runs a NameNode. You can start the daemon by running:

$ hadoop-daemon.sh start zkfc

Register the NameNodes with the ZooKeeper service:

[root@MDNode01 ~]# hdfs zkfc -formatZK
19/11/19 23:00:15 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at MDNode01/192.168.25.50:8020
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:host.name=MDNode01
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_221
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_221/jre
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/mgs/hadoop-2.7.5/etc/hadoop:/opt/mgs/hadoop-
......
2.7.5.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.5.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar:/opt/mgs/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar:/opt/mgs/hadoop-2.7.5/contrib/capacity-scheduler/*.jar
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/mgs/hadoop-2.7.5/lib/native
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:user.name=root
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=MDNode02:2181,MDNode03:2181,MDNode04:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@52bf72b5
19/11/19 23:00:16 INFO zookeeper.ClientCnxn: Opening socket connection to server MDNode03/192.168.25.52:2181. Will not attempt to authenticate using SASL (unknown error)
19/11/19 23:00:16 INFO zookeeper.ClientCnxn: Socket connection established to MDNode03/192.168.25.52:2181, initiating session
19/11/19 23:00:16 INFO zookeeper.ClientCnxn: Session establishment complete on server MDNode03/192.168.25.52:2181, sessionid = 0x20000d50ad80000, negotiated timeout = 5000
19/11/19 23:00:16 INFO ha.ActiveStandbyElector: Session connected.
19/11/19 23:00:16 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
19/11/19 23:00:16 INFO zookeeper.ClientCnxn: EventThread shut down
19/11/19 23:00:16 INFO zookeeper.ZooKeeper: Session: 0x20000d50ad80000 closed

Check the znodes in ZooKeeper:

[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[mycluster]
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha/mycluster
[]
[zk: localhost:2181(CONNECTED) 4] 

You can see that the hadoop-ha znode has been created.

Start the cluster

[root@MDNode01 ~]# start-dfs.sh 
Starting namenodes on [MDNode01 MDNode02]
MDNode02: starting namenode, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-namenode-MDNode02.out
MDNode01: namenode running as process 1519. Stop it first.
MDNode03: starting datanode, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-datanode-MDNode03.out
Mdnode04: starting datanode, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-datanode-MDNode04.out
MDNode02: starting datanode, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-datanode-MDNode02.out
Starting journal nodes [MDNode01 MDNode02 MDNode03]
MDNode01: journalnode running as process 1382. Stop it first.
MDNode03: journalnode running as process 1745. Stop it first.
MDNode02: journalnode running as process 1827. Stop it first.
Starting ZK Failover Controllers on NN hosts [MDNode01 MDNode02]
MDNode01: starting zkfc, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-zkfc-MDNode01.out
MDNode02: starting zkfc, logging to /opt/mgs/hadoop-2.7.5/logs/hadoop-root-zkfc-MDNode02.out
[root@MDNode01 ~]# jps
2227 DFSZKFailoverController
2324 Jps
1382 JournalNode
1519 NameNode
[root@MDNode02 current]# jps
2050 DataNode
1827 JournalNode
1989 NameNode
2119 DFSZKFailoverController
2266 Jps
1758 QuorumPeerMain
[root@MDNode03 ~]# jps
1745 JournalNode
1944 Jps
1676 QuorumPeerMain
1852 DataNode
[root@MDNode04 ~]# jps
1745 ZooKeeperMain
1665 QuorumPeerMain
1796 DataNode
1886 Jps
[zk: localhost:2181(CONNECTED) 7] ls /hadoop-ha/mycluster 
[ActiveBreadCrumb, ActiveStandbyElectorLock]

Viewing the HA state in ZooKeeper

ZooKeeper -server host:port cmd args
	stat path [watch]
	set path data [version]
	ls path [watch]
	delquota [-n|-b] path
	ls2 path [watch]
	setAcl path acl
	setquota -n|-b val path
	history 
	redo cmdno
	printwatches on|off
	delete path [version]
	sync path
	listquota path
	rmr path
	get path [watch]
	create [-s] [-e] path data acl
	addauth scheme auth
	quit 
	getAcl path
	close 
	connect host:port
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 7] ls /hadoop-ha/mycluster 
[ActiveBreadCrumb, ActiveStandbyElectorLock]
[zk: localhost:2181(CONNECTED) 8] get /hadoop-ha/mycluster/ActiveBreadCrumb

	myclusternnMDNode01 �>(�>
cZxid = 0x300000008
ctime = Tue Nov 19 23:06:56 CST 2019
mZxid = 0x300000008
mtime = Tue Nov 19 23:06:56 CST 2019
pZxid = 0x300000008
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0
[zk: localhost:2181(CONNECTED) 9] get /hadoop-ha/mycluster/ActiveStandbyElectorLock

	myclusternnMDNode01 �>(�>
cZxid = 0x300000007
ctime = Tue Nov 19 23:06:56 CST 2019
mZxid = 0x300000007
mtime = Tue Nov 19 23:06:56 CST 2019
pZxid = 0x300000007
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x30000d4fb1c0001
dataLength = 32
numChildren = 0
[zk: localhost:2181(CONNECTED) 10] 

Visit the web UI of each NameNode in a browser (http://MDNode01:50070 and http://MDNode02:50070); one should show as active and the other as standby.


Failover (service outage) test

On the active NameNode (node 1):

Run hadoop-daemon.sh stop namenode to stop the active NameNode.

Node 2 will automatically switch to the active state.

You can confirm through ZooKeeper that the active NameNode is now node 2.

When the NameNode on node 1 is started again, it does not take the active role back from node 2.

Node 1 simply rejoins the cluster in the standby state.

// There are many reasons why ZooKeeper may consider a node to be down; this is simply how the failover mechanism works.
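
The whole test can be sketched as follows (the state checks reuse the haadmin command mentioned earlier):

# On MDNode01: stop the active NameNode
hadoop-daemon.sh stop namenode

# From either NameNode host: nn2 should now report "active"
hdfs haadmin -getServiceState nn2

# Bring node 1's NameNode back; it comes up as standby and nn2 stays active
hadoop-daemon.sh start namenode
hdfs haadmin -getServiceState nn1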


Shutdown order:

Shut down HDFS from the primary node:

[root@hadoopNode01 hadoop]# stop-dfs.sh 
Stopping namenodes on [hadoopNode01 hadoopNode02]
hadoopNode02: stopping namenode
hadoopNode01: stopping namenode
hadoopNode02: stopping datanode
hadoopNode03: stopping datanode
hadoopNode04: stopping datanode
Stopping journal nodes [hadoopNode01 hadoopNode02 hadoopNode03]
hadoopNode02: stopping journalnode
hadoopNode03: stopping journalnode
hadoopNode01: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [hadoopNode01 hadoopNode02]
hadoopNode01: stopping zkfc
hadoopNode02: stopping zkfc
[root@hadoopNode01 hadoop]# 

Then stop the ZooKeeper service on each ZooKeeper node:

zkServer.sh stop


Startup order for the next restart:

1. Start the ZooKeeper cluster

zkServer.sh start

2. Then start HDFS directly

start-dfs.sh