
Problem: sbin/start-dfs.sh fails to start ZooKeeper, zkfc, and the datanode

Problem 1: the ZooKeeper service reports a successful start, but its port is not listening and the process is not actually running, even though the configuration is correct; the cause was not obvious. Running zookeeperd reports success:

    JMX enabled by default
    Using config: /data/programfiles/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

But checking with zkServer.sh status returns:

    JMX enabled by default
    Using config: /data/programfiles/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.

Possible causes: 1. The port is already occupied, so ZooKeeper cannot start. 2. The myid file is wrong: missing, or its contents were cleared.

Fix: 1. Change the port number and try again. 2. Recreate the myid file and verify its contents are correct.
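Both checks can be scripted. A minimal sketch, assuming the client port is the default 2181 and using /tmp/zkdata as a stand-in for the dataDir configured in zoo.cfg (substitute your real dataDir):

```shell
# Check whether the client port is already occupied (2181 assumed):
ss -tln | grep 2181 || echo "port 2181 is free"

# The myid file lives directly under dataDir and must contain the N from
# this host's server.N line in zoo.cfg. /tmp/zkdata is a demo path only.
ZK_DATADIR=/tmp/zkdata
mkdir -p "$ZK_DATADIR"
echo 1 > "$ZK_DATADIR/myid"      # e.g. this host is server.1 in zoo.cfg
cat "$ZK_DATADIR/myid"
```

On a real node, point ZK_DATADIR at the dataDir from zoo.cfg, then restart ZooKeeper after fixing myid.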

Problem 2: sbin/start-dfs.sh cannot start zkfc. The zkfc log shows:

    2018-09-20 22:41:00,073 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server s1.hadoop/192.168.197.130:2181. Will not attempt to authenticate using SASL (unknown error)
    2018-09-20 22:41:00,074 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
    2018-09-20 22:41:00,375 ERROR org.apache.hadoop.ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
    2018-09-20 22:41:01,176 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server nn2.hadoop/192.168.197.133:2181. Will not attempt to authenticate using SASL (unknown error)
    2018-09-20 22:41:01,280 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
    2018-09-20 22:41:01,280 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
    2018-09-20 22:41:01,302 FATAL org.apache.hadoop.ha.ZKFailoverController: Unable to start failover controller. Unable to connect to ZooKeeper quorum at nn1.hadoop:2181,nn2.hadoop:2181,s1.hadoop:2181. Please check the configured value for ha.zookeeper.quorum and ensure that ZooKeeper is running.

3:19 2018/9/23 Cause: ZooKeeper had not been started, so the connection was refused and zkfc could not start. The startup order is: zookeeper --> start-dfs.sh. Note in particular: running start-dfs.sh by itself cannot start zkfc!

Fix: start ZooKeeper first, then run start-dfs.sh.
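As shell commands, the required order looks like this. It is an operational sketch only: the ZooKeeper path is taken from the logs above, and $HADOOP_HOME is assumed to point at your Hadoop install.

```shell
# 1. On every quorum node (nn1.hadoop, nn2.hadoop, s1.hadoop), start ZooKeeper:
/data/programfiles/zookeeper-3.4.5/bin/zkServer.sh start
/data/programfiles/zookeeper-3.4.5/bin/zkServer.sh status   # expect "Mode: leader" or "Mode: follower"

# 2. Only after the quorum is up, start HDFS from the namenode
#    (start-dfs.sh then also brings up zkfc):
$HADOOP_HOME/sbin/start-dfs.sh
```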

Breakthrough analysis: searching for the log messages turned up nothing, so I refocused on the core of the problem, starting zkfc, and reviewed the zkfc-related configuration. That raised the question of the startup order of ZooKeeper and HDFS: do they start together, or in a fixed sequence? Searching "does ZooKeeper need to be started before zkfc?" turned up the article below, which confirmed the suspicion. Reference: HA 模式下的 Hadoop+ZooKeeper+HBase 啟動順序 https://blog.csdn.net/u011414200/article/details/50437356

Problem 3: the datanode fails to start. hadoop-daemons.sh start datanode reports started, but jps shows no datanode process!

Cause 1: 21:11 2018/9/19 After starting the journalnode, the datanode log under /usr/local/hadoop/logs/ shows:

    java.io.IOException: the path component: '/data' is owned by a user who is not root and not you.  Your effective user id is 0; the path is owned by user id 1001, and its permissions are 0755.  Please fix this or select a different socket path.

    java.io.IOException: the path component: '/data' is world-writable.  Its permissions are 0777.  Please fix this or select a different socket path.

Cause: the ownership and permissions of the data directory are set incorrectly, hence these errors. Fix: set the owner of data to hadoop:hadoop and its permissions to 755.
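The fix can be sketched as follows. /tmp/data_demo is a scratch stand-in for the real /data, and the commented-out chown line assumes the datanode runs as the hadoop user:

```shell
DIR=/tmp/data_demo               # stand-in for /data on the real host
mkdir -p "$DIR"
# chown hadoop:hadoop "$DIR"     # real fix, run as root: owner = datanode user
chmod 755 "$DIR"                 # world-writable (0777) is rejected by the datanode
stat -c '%a' "$DIR"              # prints 755
```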

Cause 2:

    2018-09-20 11:46:08,798 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/data/dfs/
    java.io.IOException: Incompatible clusterIDs in /data/dfs: namenode clusterID = CID-9500c2d3-5b6b-40b9-bcb1-50ccabe17c7e; datanode clusterID = CID-dab21165-cfc9-4145-b412-299eacd0db67

Cause: the namenode clusterID and the datanode clusterID differ, so the two are incompatible; the datanode clusterID must be changed to match the namenode's. Fix: in /data/dfs/current/VERSION, change the datanode clusterID to the namenode clusterID.
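The edit can be done with grep and sed. A sketch on scratch copies of the two VERSION files, using the clusterIDs from the log above (the real paths depend on dfs.namenode.name.dir and dfs.datanode.data.dir; the scratch paths below are stand-ins):

```shell
# Scratch copies standing in for the real VERSION files:
mkdir -p /tmp/nn_current /tmp/dn_current
echo 'clusterID=CID-9500c2d3-5b6b-40b9-bcb1-50ccabe17c7e' > /tmp/nn_current/VERSION
echo 'clusterID=CID-dab21165-cfc9-4145-b412-299eacd0db67' > /tmp/dn_current/VERSION

# Copy the namenode's clusterID into the datanode's VERSION file:
NN_CID=$(grep '^clusterID=' /tmp/nn_current/VERSION | cut -d= -f2)
sed -i "s/^clusterID=.*/clusterID=${NN_CID}/" /tmp/dn_current/VERSION
grep '^clusterID=' /tmp/dn_current/VERSION   # now matches the namenode
```

On the cluster, run the same grep/sed against the real VERSION files, then restart the datanode.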