
datanode Unhealthy Nodes

Summary: Today I started Hadoop on a single machine and found that the DataNode process was running, but the web UI listed the node under Unhealthy Nodes.

1. Symptom

(Screenshots: the ResourceManager web UI at http://localhost:8088 showing the node under Unhealthy Nodes.)

2. Cause

When a node's disks fill up past a threshold (or too few of its disks remain healthy), YARN marks the machine as unhealthy and stops assigning tasks to it. Two NodeManager settings control this check:

- yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (default: 90): The maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager checks for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
- yarn.nodemanager.disk-health-checker.min-healthy-disks (default: 0.25): The minimum fraction of disks that must be healthy for the NodeManager to launch new containers. This applies to both yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs; if fewer healthy local-dirs (or log-dirs) are available, new containers will not be launched on this node.

In short: once fewer than 25% of the disks are under 90% utilization, the NodeManager stops allocating containers (so that intermediate results and logs do not run out of space), and the node becomes Unhealthy. On a single-disk machine like this one, pushing the one disk past 90% usage is enough, and 6 GB free out of 120 GB is about 95% used.
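If cleaning up space is not practical, the threshold can instead be relaxed in yarn-site.xml. A minimal sketch, assuming a single-disk machine; the 98.0 value is illustrative, not a recommendation:

    <!-- yarn-site.xml: tolerate higher disk utilization before marking the disk bad -->
    <property>
      <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
      <value>98.0</value>
    </property>

The NodeManager must be restarted for the change to take effect (see step 2.1 below).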

Solution:
1. Clean up the disk. For example, my disk is 120 GB in total and originally had only 6 GB free; after cleanup, free space rose to 15 GB. You can check utilization as shown below.
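A quick way to check utilization. The mount point to watch is whichever one backs yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs; / below is an assumption for a single-disk setup:

    # Use% must stay below the 90% threshold for the disk to count as healthy
    df -h /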
2. Restart the relevant services.
2.1 Restart the NodeManager:

 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh stop nodemanager
 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh start nodemanager

2.2 Restart the ResourceManager (otherwise the node's updated state can be reported inconsistently):

 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh stop resourcemanager
 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh start resourcemanager
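After the restarts, jps (the JDK's Java process lister) is a quick sanity check; NodeManager and ResourceManager should both appear in its output:

    # both daemons should be listed among the running JVMs
    jps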

2.3 Refresh the http://localhost:8088/cluster/nodes page:
The unhealthy NodeManager should now be gone from the list.
2.4 Show the state of each YARN node from the command line:

$ yarn node -list -all
18/10/17 22:08:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/17 22:08:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:1
         Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
 localhost:53758	        RUNNING	   localhost:8042	                           0
$
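If a node is still shown as unhealthy, its health report says which check failed. A sketch using the Node-Id from the listing above (yours will differ):

    yarn node -status localhost:53758

The output includes a Health-Report field, which for disk problems names the local-dirs or log-dirs that exceeded the utilization threshold.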