
Hot Migration Plan for Hadoop NameNode and DataNode


We recently needed to reassign several servers in our production Hadoop cluster. The role changes are as follows:

datanode92.bi -> namenode02.bi
namenode01.bi(old) -> datanode19.bi
namenode02.bi -> datanode20.bi

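The end goals are:

  • Decommission the DataNode service on the datanode92.bi and namenode01.bi servers
  • Because the namenode02.bi server has a relatively low hardware spec, take the NameNode service on namenode02.bi offline and migrate it to the better-equipped datanode92.bi server
  • To make use of idle resources, repurpose the old, idle namenode01.bi(old) server as a DataNode server
  • After the changes, add two new server names: datanode19.bi and datanode20.bi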

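I. Decommissioning the DataNodes

1. On namenode01, add the nodes being retired to the exclude list (blacklist) by adding the hostnames of the servers to be decommissioned to the /usr/local/hadoop-2.6.3/etc/hadoop/dfs.exclude file: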

datanode92.bi
namenode01.bi

2. Configure hdfs-site.xml:

<property>
    <name>dfs.hosts.exclude</name>
    <value>/usr/local/hadoop-2.6.3/etc/hadoop/dfs.exclude</value>
</property>
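Steps 1 and 2 lend themselves to a small script, so that re-running the procedure does not add duplicate entries to the exclude file. The following is only a minimal sketch: the function name add_to_exclude is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): append a hostname to an
# exclude file only if it is not already listed as a whole line.
add_to_exclude() {
    local file="$1" host="$2"
    touch "$file"
    # -x: match the whole line, -F: treat the hostname as a fixed string
    if ! grep -qxF -- "$host" "$file"; then
        printf '%s\n' "$host" >> "$file"
    fi
}

# Example usage with the path and hostnames from this article:
# add_to_exclude /usr/local/hadoop-2.6.3/etc/hadoop/dfs.exclude datanode92.bi
# add_to_exclude /usr/local/hadoop-2.6.3/etc/hadoop/dfs.exclude namenode01.bi
```

The whole-line match matters here: a plain substring match would treat a hostname such as datanode9.bi as already present when only datanode92.bi is listed.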

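3. Change to the /usr/local/hadoop-2.6.3/bin directory and run the following command to refresh the NameNode (NN) node list. The NN service does not need to be restarted: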

./hdfs dfsadmin -refreshNodes

After running the command, promptly check the NameNode log, hadoop-hadoop-namenode-namenode01.bi.log, to verify that the command succeeded.

The log output is as follows:

Refresh nodes successful for namenode01.bi.10101111.com/10.216.2.25:8020
Refresh nodes successful for namenode02.bi.10101111.com/10.216.5.2:8020
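The check for these success lines can be automated. The sketch below assumes only the message text shown above; the function name count_refresh_success is hypothetical. It reads from stdin, so it can be fed whichever text contains the messages in your environment.

```shell
# Hypothetical helper: count "Refresh nodes successful" lines in the text
# read from stdin. In the HA setup described here, one line per NameNode
# (two in total) is expected.
count_refresh_success() {
    # grep -c prints 0 and returns non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c 'Refresh nodes successful' || true
}

# Example usage (log file name taken from this article):
# count_refresh_success < hadoop-hadoop-namenode-namenode01.bi.log
```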

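4. In spaceX, watch the nodes whose status is Decommission In Progress.

Also check the number of block replicas that still need to be copied, that is, the value of the Number of Under-Replicated Blocks metric.

5. When every node being retired shows the status Decommissioned and Number of Under-Replicated Blocks is 0, the data migration is complete.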
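If no monitoring UI is at hand, the same two conditions can be read from the output of hdfs dfsadmin -report. This is a sketch under assumptions: it expects the report to contain lines of the form "Decommission Status : Decommission in progress" and "Under replicated blocks: N", and the function names are hypothetical.

```shell
# Hypothetical helpers that parse `hdfs dfsadmin -report` output from stdin.

# Count DataNodes that are still being decommissioned.
count_decommissioning() {
    grep -c 'Decommission Status : Decommission in progress' || true
}

# Print the number from the first "Under replicated blocks" line.
under_replicated_blocks() {
    awk -F': *' '/^Under replicated blocks/ { print $2; exit }'
}

# Example usage (needs a running cluster):
# report=$(hdfs dfsadmin -report)
# printf '%s\n' "$report" | count_decommissioning
# printf '%s\n' "$report" | under_replicated_blocks
```

When both values reach 0 and the retiring nodes show Decommissioned, the condition in step 5 is met.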

6. Remove the decommissioned nodes from the slaves file.
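Editing the slaves file by hand works, but a filter is less error-prone when several hosts are removed at once. A minimal sketch follows; the function name remove_hosts is hypothetical, and it only prints the filtered list so the result can be reviewed before the file is replaced.

```shell
# Hypothetical helper: print the contents of a slaves file with the given
# hostnames removed (exact whole-line matches only). The original file is
# left untouched.
remove_hosts() {
    local file="$1"; shift
    local pattern_file
    pattern_file=$(mktemp)
    printf '%s\n' "$@" > "$pattern_file"
    # -v: invert match, -x: whole line, -F: fixed strings, -f: read patterns
    grep -vxFf "$pattern_file" "$file" || true
    rm -f "$pattern_file"
}

# Example usage (slaves path assumed to sit next to the other config files):
# remove_hosts /usr/local/hadoop-2.6.3/etc/hadoop/slaves datanode92.bi namenode01.bi > slaves.new
```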

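7. After the DataNodes were decommissioned, spaceX raised a disk-full warning. After checking the cluster's overall space usage and remaining capacity, we confirmed that this warning can be ignored:

(1) Warning message:

(2) Remaining disk space: 17.46%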

II. NameNode Migration

1. Stop services:

  • Stop the namenode, zkfc, journalnode, and resourcemanager services on namenode02.bi so that the edit log and metadata on namenode02.bi are no longer updated:
/usr/local/hadoop-2.6.3/sbin/hadoop-daemon.sh stop namenode
/usr/local/hadoop-2.6.3/sbin/hadoop-daemon.sh stop zkfc
/usr/local/hadoop-2.6.3/sbin/hadoop-daemon.sh stop journalnode
  • Stop the Hive services: find the process IDs of hiveserver2 and metastore and kill them directly

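2. Ask the operations team to update the network information:

  • Update the DNS records
  • Change hostnames: rename datanode92.bi to the hostname of namenode02.bi, and update the hosts file and slaves file on all Hadoop cluster servers accordingly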

3. Copy data:

  • Copy the NameNode's entire Hadoop directory to /usr/local on the target machine
  • Copy the metadata under /data/dfs to /data/dfs on the target machine
scp -r dfs [email protected]:/data/dfs/
  • Copy /usr/local/apache-hive-1.2.1-bin to /usr/local on the target machine
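Because the copied metadata is what the new NameNode will start from, it can be worth confirming that the copy is byte-identical before starting any service. The sketch below is one way to do that, assuming sha256sum (GNU coreutils) is available on both machines; the function name dir_manifest and the host name target-host are hypothetical.

```shell
# Hypothetical helper: print a "checksum  ./relative/path" manifest, sorted
# by path, for every regular file under the given directory.
dir_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Example usage: build a manifest on each side and compare them.
# dir_manifest /data/dfs > /tmp/dfs.source.manifest
# ssh target-host "$(declare -f dir_manifest); dir_manifest /data/dfs" > /tmp/dfs.target.manifest
# diff /tmp/dfs.source.manifest /tmp/dfs.target.manifest && echo "copy verified"
```

Paths are recorded relative to the directory root, so the two manifests stay comparable even if the directories are mounted at different locations.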

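4. Start services:

(1) Check the configuration files, then start the namenode, zkfc, and journalnode services on the target machine. Below is the block-report log after the NameNode service starts. It contains a large number of entries with the BlockStateChange keyword. Until these entries stop being printed, the NameNode is not yet fully operational and cannot serve requests. The log looks like this: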

2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.26:50010 is added to blk_2880883080_1896501829{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2297011b-a18a-449b-b795-248b44d082eb:NORMAL:10.216.5.26:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.21:50010 is added to blk_2880883080_1896501829{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2297011b-a18a-449b-b795-248b44d082eb:NORMAL:10.216.5.26:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-e5c6c2d5-81ae-45ee-a204-f3e3313756fe:NORMAL:10.216.4.21:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.40:50010 is added to blk_2880883080_1896501829{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2297011b-a18a-449b-b795-248b44d082eb:NORMAL:10.216.5.26:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-e5c6c2d5-81ae-45ee-a204-f3e3313756fe:NORMAL:10.216.4.21:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-0e5e6e27-8744-4e6a-8006-690f6e099676:NORMAL:10.216.5.40:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.21:50010 is added to blk_2880883081_1896501830{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-b4850b87-338c-45cc-b917-b6250aa70370:NORMAL:10.216.2.21:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.4:50010 is added to blk_2880883081_1896501830{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-b4850b87-338c-45cc-b917-b6250aa70370:NORMAL:10.216.2.21:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-fa7fa32c-91bb-47a3-a559-2fc31f74cdec:NORMAL:10.216.4.4:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.48:50010 is added to blk_2880883081_1896501830{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-b4850b87-338c-45cc-b917-b6250aa70370:NORMAL:10.216.2.21:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-fa7fa32c-91bb-47a3-a559-2fc31f74cdec:NORMAL:10.216.4.4:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-df7f2d4a-0b76-46ee-9e30-d42921ba825a:NORMAL:10.216.5.48:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.21:50010 is added to blk_2880883082_1896501831{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e5c6c2d5-81ae-45ee-a204-f3e3313756fe:NORMAL:10.216.4.21:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.19:50010 is added to blk_2880883082_1896501831{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e5c6c2d5-81ae-45ee-a204-f3e3313756fe:NORMAL:10.216.4.21:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-2c37fdd9-9d89-493c-bb0d-b915b9fd0476:NORMAL:10.216.4.19:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,795 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.41:50010 is added to blk_2880883082_1896501831{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e5c6c2d5-81ae-45ee-a204-f3e3313756fe:NORMAL:10.216.4.21:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-2c37fdd9-9d89-493c-bb0d-b915b9fd0476:NORMAL:10.216.4.19:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-91a83f26-10a5-4456-94ec-460d5a428990:NORMAL:10.216.4.41:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,796 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.34:50010 is added to blk_2880883083_1896501832{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5540b823-7606-432a-98c9-a8d84c2c3e72:NORMAL:10.216.2.34:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,796 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.8:50010 is added to blk_2880883083_1896501832{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5540b823-7606-432a-98c9-a8d84c2c3e72:NORMAL:10.216.2.34:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-110ef02a-b044-41bf-9d65-f5fdaa76b959:NORMAL:10.216.2.8:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,796 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.8:50010 is added to blk_2880883083_1896501832{blockUCState=UNDER_CONSTRUCTION, prima849 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.14:50010 is added to blk_2880883130_1896501879{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e80ba307-0231-4811-92af-33fac386ea09:NORMAL:10.216.5.14:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,863 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 132651/150565 transactions completed. (88%)
2019-03-12 16:20:43,879 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.38:50010 is added to blk_2880883136_1896501885{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9dec2025-66f8-4b4a-a197-e56751b284d5:NORMAL:10.216.4.38:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,893 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.5:50010 is added to blk_2880883140_1896501889{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-45cbbbda-e17f-4835-bf47-7487c3b3a456:NORMAL:10.216.5.5:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,893 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.16:50010 is added to blk_2880883140_1896501889{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-45cbbbda-e17f-4835-bf47-7487c3b3a456:NORMAL:10.216.5.5:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-32c95765-9f02-4863-8ba2-f1fecf2e0872:NORMAL:10.216.5.16:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,893 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.20:50010 is added to blk_2880883140_1896501889{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-45cbbbda-e17f-4835-bf47-7487c3b3a456:NORMAL:10.216.5.5:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-32c95765-9f02-4863-8ba2-f1fecf2e0872:NORMAL:10.216.5.16:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-32001683-8bf1-4ae1-baf4-0efd7d117c94:NORMAL:10.216.2.20:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,904 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.38:50010 is added to blk_2880883143_1896501892{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9dec2025-66f8-4b4a-a197-e56751b284d5:NORMAL:10.216.4.38:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,905 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.23:50010 is added to blk_2880883144_1896501893{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-68b7e245-3c92-4f81-95fe-dcb87e2453a8:NORMAL:10.216.2.23:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,905 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.22:50010 is added to blk_2880883144_1896501893{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-68b7e245-3c92-4f81-95fe-dcb87e2453a8:NORMAL:10.216.2.23:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-6b7b44fb-539f-4a1c-a867-1a0d7fe77972:NORMAL:10.216.5.22:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,912 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.14:50010 is added to blk_2880883145_1896501894{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e80ba307-0231-4811-92af-33fac386ea09:NORMAL:10.216.5.14:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,912 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.22:50010 is added to blk_2880883145_1896501894{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e80ba307-0231-4811-92af-33fac386ea09:NORMAL:10.216.5.14:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-7f939532-7a64-4700-815d-65afeab75c63:NORMAL:10.216.2.22:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,917 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.22:50010 is added to blk_2880883147_1896501896{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-7f939532-7a64-4700-815d-65afeab75c63:NORMAL:10.216.2.22:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,919 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.14:50010 is added to blk_2880883151_1896501900{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e80ba307-0231-4811-92af-33fac386ea09:NORMAL:10.216.5.14:50010|RBW]]} size 0
2019-03-12 16:20:43,919 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.3:50010 is added to blk_2880883152_1896501901{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-bffbb1b4-09d5-486c-8493-13953032e792:NORMAL:10.216.5.3:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,924 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.5:50010 is added to blk_2880883158_1896501907{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-45cbbbda-e17f-4835-bf47-7487c3b3a456:NORMAL:10.216.5.5:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,936 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.14:50010 is added to blk_2880883160_1896501909{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-e80ba307-0231-4811-92af-33fac386ea09:NORMAL:10.216.5.14:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,947 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.5.3:50010 is added to blk_2880883164_1896501913{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-bffbb1b4-09d5-486c-8493-13953032e792:NORMAL:10.216.5.3:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,957 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.20:50010 is added to blk_2880883165_1896501914{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-32001683-8bf1-4ae1-baf4-0efd7d117c94:NORMAL:10.216.2.20:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,957 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.21:50010 is added to blk_2880883165_1896501914{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-32001683-8bf1-4ae1-baf4-0efd7d117c94:NORMAL:10.216.2.20:50010|FINALIZED], ReplicaUnderConstruction[[DISK]DS-b4850b87-338c-45cc-b917-b6250aa70370:NORMAL:10.216.2.21:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,957 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.48:50010 is added to blk_2880883166_1896501915{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-cc0c9e65-7223-4a96-9066-544d069b4f7c:NORMAL:10.216.4.48:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,957 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.23:50010 is added to blk_2880883167_1896501916{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-68b7e245-3c92-4f81-95fe-dcb87e2453a8:NORMAL:10.216.2.23:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,960 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.4.38:50010 is added to blk_2880883169_1896501918{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9dec2025-66f8-4b4a-a197-e56751b284d5:NORMAL:10.216.4.38:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,960 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.39:50010 is added to blk_2880883171_1896501920{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-645de8ff-8245-4bc8-b21b-0a1f05190d90:NORMAL:10.216.2.39:50010|FINALIZED]]} size 0
2019-03-12 16:20:43,960 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.216.2.9:50010 is added to blk_2880883179_1896501928{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9056dc2d-4567-4539-991b-f0c24bcf190d:NORMAL:10.216.2.9:50010|FINALIZED]]} size 0

You can also follow the NameNode's metadata loading progress on the startup page of the HDFS Web UI. When that page shows 100%, the NameNode has finished loading the metadata.
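The FSEditLogLoader line in the log above also reports replay progress as a percentage, which can be extracted from the log without opening the Web UI. A minimal sketch, assuming the message format shown in that log; the function name last_replay_percent is hypothetical.

```shell
# Hypothetical helper: print the most recent edit-log replay percentage
# found in NameNode log text read from stdin (prints nothing if the log
# contains no such line).
last_replay_percent() {
    sed -n 's/.*replaying edit log: .* transactions completed\. (\([0-9]*\)%).*/\1/p' | tail -n 1
}

# Example usage (log file name is an assumption based on the naming
# pattern used earlier in this article):
# last_replay_percent < hadoop-hadoop-namenode-namenode02.bi.log
# Once loading has finished, safe mode can be checked with:
# hdfs dfsadmin -safemode get
```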

(2) Start the Hive services:

# Start hiveserver2
nohup hive --service hiveserver2 &
# Start metastore
nohup hive --service metastore &

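5. Start resourcemanager

6. Remove the exclude configuration on namenode01

III. Bringing the DataNodes Online

1. Ask the operations team to change the server hostnames and update the hosts file: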

namenode01.bi(old) -> datanode19.bi
namenode02.bi -> datanode20.bi

2. Add the new node names to the /usr/local/hadoop-2.6.3/etc/hadoop/dfs.include file:

datanode19.bi
datanode20.bi

3. Add the following property to hdfs-site.xml:

<property>
    <name>dfs.hosts</name>
    <value>/usr/local/hadoop-2.6.3/etc/hadoop/dfs.include</value>
</property>

4. Refresh the nodes on the NN. The NN service does not need to be restarted:

hdfs dfsadmin -refreshNodes

5. Add the new nodes' hostnames to the slaves file and sync it to all cluster servers:

datanode19.bi
datanode20.bi

6. On each new node's machine, start its DataNode individually:

/usr/local/hadoop-2.6.3/sbin/hadoop-daemon.sh start datanode
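After the DataNode process starts, it is useful to confirm that the NameNode has registered it. The sketch below assumes that hdfs dfsadmin -report prints one "Hostname: <name>" line per DataNode; the function name datanode_listed is hypothetical.

```shell
# Hypothetical helper: succeed if the given hostname appears on a
# "Hostname:" line of `hdfs dfsadmin -report` output read from stdin.
# Note: this only shows that the NameNode knows the node; it does not
# tell live nodes from dead ones.
datanode_listed() {
    # -x: whole line, -F: fixed string, -q: no output, exit status only
    grep -qxF "Hostname: $1"
}

# Example usage (needs a running cluster):
# hdfs dfsadmin -report | datanode_listed datanode19.bi && echo "datanode19.bi registered"
# hdfs dfsadmin -report | datanode_listed datanode20.bi && echo "datanode20.bi registered"
```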
