
datanode Unhealthy Nodes

Summary: Today I started Hadoop on a single machine and found that the DataNode process was running, but the web UI listed the node under Unhealthy Nodes.

1. Symptom

(Screenshots: the ResourceManager web UI at http://localhost:8088 showing the node under Unhealthy Nodes.)

2. Cause

When a node's disks fill up past a threshold (or too few of its disks remain healthy), YARN marks the machine as unhealthy and stops assigning tasks to it. Two NodeManager settings control this check:

- yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (default: 90): The maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager checks for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
- yarn.nodemanager.disk-health-checker.min-healthy-disks (default: 0.25): The minimum fraction of disks that must be healthy for the NodeManager to launch new containers. This applies to both yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs; if fewer healthy local-dirs (or log-dirs) are available, new containers will not be launched on this node.

In short: once fewer than 25% of the disks are under 90% utilization, the NodeManager stops allocating containers (so that intermediate results and logs do not run out of space), and the node becomes Unhealthy. On a single-disk machine like this one, pushing the one disk past 90% usage is enough, and 6 GB free out of 120 GB is about 95% used.
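If cleaning up space is not practical, the threshold can instead be relaxed in yarn-site.xml. A minimal sketch, assuming a single-disk machine; the 98.0 value is illustrative, not a recommendation:

    <!-- yarn-site.xml: tolerate higher disk utilization before marking the disk bad -->
    <property>
      <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
      <value>98.0</value>
    </property>

The NodeManager must be restarted for the change to take effect (see step 2.1 below).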

Solution:
1. Clean up the disk. For example, my disk is 120 GB in total and originally had only 6 GB free; after cleanup, free space rose to 15 GB. You can check utilization as shown below.
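A quick way to check utilization. The mount point to watch is whichever one backs yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs; / below is an assumption for a single-disk setup:

    # Use% must stay below the 90% threshold for the disk to count as healthy
    df -h /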
2. Restart the relevant services.
2.1 Restart the NodeManager:

 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh stop nodemanager
 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh start nodemanager

2.2 Restart the ResourceManager (otherwise the node's updated state can be reported inconsistently):

 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh stop resourcemanager
 /usr/local/goldmine/hadoop/default/sbin/yarn-daemon.sh start resourcemanager
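After the restarts, jps (the JDK's Java process lister) is a quick sanity check; NodeManager and ResourceManager should both appear in its output:

    # both daemons should be listed among the running JVMs
    jps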

2.3 Refresh the http://localhost:8088/cluster/nodes page:
The unhealthy NodeManager should now be gone from the list.
2.4 Show the state of each YARN node from the command line:

$ yarn node -list -all
18/10/17 22:08:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/17 22:08:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Total Nodes:1
         Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
 localhost:53758	        RUNNING	   localhost:8042	                           0
$
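If a node is still shown as unhealthy, its health report says which check failed. A sketch using the Node-Id from the listing above (yours will differ):

    yarn node -status localhost:53758

The output includes a Health-Report field, which for disk problems names the local-dirs or log-dirs that exceeded the utilization threshold.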