
A Hadoop Failure Case

2014-07-21 10:12:31,098 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node node-xxx-40:50010 is attempting to report storage ID DS-1137532894-192.168.2.40-50010-1400206530880. Node 192.168.2.40:50010 is expected to serve this storage.

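At first glance this looked like a problem caused by an IP address change. After asking around, it turned out that a colleague had manually copied files on the server, and that copy had probably gone wrong.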
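To make the mismatch in that log line easier to see, here is a minimal Python sketch that pulls out the reporting node, the storage ID, and the node the namenode expects. The function name parse_unregistered_datanode is hypothetical, and the storage ID layout (DS-<random>-<ip>-<port>-<timestamp>) is an assumption inferred from the single ID shown above, not a documented format.

```python
import re

# Matches the UnregisteredDatanodeException message text as it appears in the log above.
MSG_RE = re.compile(
    r"Data node (?P<reporter>\S+) is attempting to report storage ID "
    r"(?P<storage_id>\S+)\. Node (?P<expected>\S+) is expected to serve this storage\."
)

# Assumed layout, inferred from the one example: DS-<random>-<ip>-<port>-<timestamp>
STORAGE_ID_RE = re.compile(
    r"DS-(?P<rand>\d+)-(?P<ip>\d+\.\d+\.\d+\.\d+)-(?P<port>\d+)-(?P<created>\d+)$"
)


def parse_unregistered_datanode(line):
    """Return a dict describing the mismatch, or None if the line does not match."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    sid = STORAGE_ID_RE.match(info["storage_id"])
    if sid:
        # Address embedded in the storage ID itself.
        info["storage_ip"] = sid.group("ip")
        info["storage_port"] = sid.group("port")
    # True when the datanode reports under a different name than the namenode has on record.
    info["name_mismatch"] = info["reporter"] != info["expected"]
    return info


if __name__ == "__main__":
    sample = (
        "Data node node-xxx-40:50010 is attempting to report storage ID "
        "DS-1137532894-192.168.2.40-50010-1400206530880. "
        "Node 192.168.2.40:50010 is expected to serve this storage."
    )
    print(parse_unregistered_datanode(sample))
```

In the line above, the datanode reports as node-xxx-40:50010 while the namenode has the same storage ID recorded under 192.168.2.40:50010, which fits the suspicion that storage files were copied or the node's identity changed.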

The only option was the classic fix: delete the corresponding directory and restart the datanode.

With that done, I restarted the datanode, and it failed again:

2014-07-21 13:46:43,108 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: java.io.IOException: verifyNodeRegistration: unknown datanode node-114-40:50010
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyNodeRegistration(FSNamesystem.java:4743)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:2538)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.register(NameNode.java:1013)
	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

	at org.apache.hadoop.ipc.Client.call(Client.java:1107)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
	at com.sun.proxy.$Proxy5.register(Unknown Source)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:740)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1549)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1609)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1734)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1751)


On the namenode, remove this node from /etc/hadoop/exclude.

Then run sudo -u hdfs hadoop dfsadmin -refreshNodes so that the namenode rereads the file.
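Editing the exclude file by hand is easy to get wrong on a large cluster, so a small helper can be used instead. This is a rough sketch: remove_from_exclude is a hypothetical helper, and it assumes the exclude file lists one host per line, optionally with a :port suffix. It only transforms text, so writing the result back to /etc/hadoop/exclude and running the refreshNodes command are still separate manual steps.

```python
def remove_from_exclude(text, host):
    """Return the exclude-file text with every entry for `host` dropped.

    Assumes one entry per line; an optional ":port" suffix is ignored
    on both the entry and the host being removed.
    """
    target = host.strip().split(":")[0]
    kept = []
    for line in text.splitlines():
        entry = line.strip().split(":")[0]
        if entry == target:
            continue  # drop the node that should be allowed to register again
        kept.append(line)
    # Keep a trailing newline when anything is left in the file.
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    current = "node-114-39\nnode-114-40\nnode-114-41\n"
    print(remove_from_exclude(current, "node-114-40:50010"), end="")
```

Matching on the host part alone lets the name be pasted straight from the log line (node-114-40:50010) even if the exclude file lists the bare hostname.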