
ZooKeeper cluster fails to start correctly

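Today a girl I met online asked me to help with a problem. She trusted me enough to hand over her server account, so I spent an evening working on it.

First, the configuration file: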

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
server.0=47.94.204.115:2888:3888
server.1=47.94.192.253:2888:3888
server.2=47.94.199.37:2888:3888

Next, the startup log, which was full of exceptions like this one:

2017-07-05 23:40:14,814 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@588] - Cannot open channel to 1 at election address /47.94.192.253:3888
java.net.ConnectException: 拒絕連線 (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
        at java.lang.Thread.run(Thread.java:745)


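And the cluster simply would not come up.

Now for the fix, which took a few twists and turns.

First I logged in to the server 47.94.192.253 and ran netstat -nalp | grep java. The ports were as follows: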


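Port 2181 is the one ZooKeeper clients connect to, so process 32143 is the ZooKeeper server. It was also listening on port 37271, yet ZooKeeper was not configured with that port; it was configured with 2888 and 3888. Normally a follower listens on port 3888, which is used for leader-election traffic. I could not tell why this was happening. Every time I restarted the process, that other port number changed. At this point the immediate problem was clear: the server process was not listening on the configured port 3888 but on a random port, so the other servers could not reach it, which is why the exception appeared.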
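The same check can be made from the other nodes without logging in to each machine. Below is a minimal sketch in Java, assuming the addresses and election port from the zoo.cfg above; the class and method names (PortProbe, canConnect) are made up for illustration and are not part of ZooKeeper.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused" (nothing listening), timeouts and
            // unresolvable hosts all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Peer addresses and election port taken from the zoo.cfg above.
        String[] peers = {"47.94.204.115", "47.94.192.253", "47.94.199.37"};
        for (String peer : peers) {
            System.out.println(peer + ":3888 reachable = " + canConnect(peer, 3888, 2000));
        }
    }
}
```

A peer that reports false here is one that the other nodes will log "Cannot open channel" for.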

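To really fix this, I had to find out why it was listening on a random port. I opened the log file again and found this exception near the beginning: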

2017-07-05 23:40:14,695 [myid:] - INFO  [main:QuorumPeerConfig@134] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
2017-07-05 23:40:14,713 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.192.253 to address: /47.94.192.253
2017-07-05 23:40:14,713 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.204.115 to address: /47.94.204.115
2017-07-05 23:40:14,714 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: 47.94.199.37 to address: /47.94.199.37
2017-07-05 23:40:14,714 [myid:] - INFO  [main:QuorumPeerConfig@396] - Defaulting to majority quorums
2017-07-05 23:40:14,721 [myid:0] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-07-05 23:40:14,725 [myid:0] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-07-05 23:40:14,725 [myid:0] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-07-05 23:40:14,741 [myid:0] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2017-07-05 23:40:14,751 [myid:0] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-07-05 23:40:14,776 [myid:0] - INFO  [main:QuorumPeer@1134] - minSessionTimeout set to -1
2017-07-05 23:40:14,776 [myid:0] - INFO  [main:QuorumPeer@1145] - maxSessionTimeout set to -1
2017-07-05 23:40:14,777 [myid:0] - INFO  [main:QuorumPeer@1419] - QuorumPeer communication is not secured!
2017-07-05 23:40:14,778 [myid:0] - INFO  [main:QuorumPeer@1448] - quorum.cnxn.threads.size set to 20
2017-07-05 23:40:14,793 [myid:0] - INFO  [ListenerThread:QuorumCnxManager$Listener@739] - My election bind port: /47.94.204.115:3888
2017-07-05 23:40:14,794 [myid:0] - ERROR [/47.94.204.115:3888:QuorumCnxManager$Listener@763] - Exception while listening
java.net.BindException: 無法指定被請求的地址 (Bind failed)
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        at java.net.ServerSocket.bind(ServerSocket.java:375)
        at java.net.ServerSocket.bind(ServerSocket.java:329)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:742)
2017-07-05 23:40:14,807 [myid:0] - INFO  [QuorumPeer[myid=0]/0.0.0.0:2181:QuorumPeer@865] - LOOKING
2017-07-05 23:40:14,808 [myid:0] - INFO  [QuorumPeer[myid=0]/0.0.0.0:2181:FastLeaderElection@818] - New election. My id =  0, proposed zxid=0x2
2017-07-05 23:40:14,810 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@600] - Notification: 1 (message format version), 0 (n.leader), 0x2 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state)
2017-07-05 23:40:14,814 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@588] - Cannot open channel to 1 at election address /47.94.192.253:3888
java.net.ConnectException: 拒絕連線 (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
        at java.lang.Thread.run(Thread.java:745)

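Earlier in the log there is a bind exception (the message 無法指定被請求的地址 means "Cannot assign requested address"). In general there are two very common causes for a bind failure:

1. The port is already in use.

2. The IP address does not belong to a network interface on this machine.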
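The two causes can also be told apart programmatically. Here is a rough sketch in Java, using only standard java.net calls; the class BindDiagnosis, its Result enum and the diagnose method are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.ServerSocket;
import java.net.SocketException;

final class BindDiagnosis {
    enum Result { OK, PORT_UNAVAILABLE, ADDRESS_NOT_LOCAL }

    /** True if the address is the wildcard or is assigned to a local network interface. */
    static boolean isLocalAddress(InetAddress address) {
        if (address == null) {
            return false;
        }
        if (address.isAnyLocalAddress()) {
            return true; // 0.0.0.0 means "all local interfaces"
        }
        try {
            return NetworkInterface.getByInetAddress(address) != null;
        } catch (SocketException e) {
            return false;
        }
    }

    /** Tries to explain why binding a listening socket to target would fail. */
    static Result diagnose(InetSocketAddress target) {
        if (!isLocalAddress(target.getAddress())) {
            return Result.ADDRESS_NOT_LOCAL; // cause 2
        }
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(target);
            return Result.OK;
        } catch (IOException e) {
            // The address is local but the bind still failed: most likely
            // cause 1 (port in use), or another problem such as missing privileges.
            return Result.PORT_UNAVAILABLE;
        }
    }

    public static void main(String[] args) {
        // Election address of server.0 from the zoo.cfg above.
        System.out.println(diagnose(new InetSocketAddress("47.94.204.115", 3888)));
    }
}
```

Run on the affected machine, a result of ADDRESS_NOT_LOCAL would point to the second cause.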

As seen a moment ago, port 3888 was not in use, so the cause had to be the second one.

Running ifconfig gave the following result:



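Sure enough, it was the second cause: no network interface with that address exists on the machine. Some readers may ask why it is possible to log in over SSH using this IP address. The reason is simple. This is a cloud server, and cloud servers are virtualized: the public IP belongs to an interface on the physical gateway, and the virtual machine itself has no such interface.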

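Now the real cause was found. The fix is to make the server process listen on 0.0.0.0, that is, on all network interfaces.

How to do that? I looked through the official site and could not find this setting described. So I pulled down the ZooKeeper source code and went to line 742 of QuorumCnxManager.java.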


Just before that line there is a listenOnAllIPs parameter; if it is true, the problem goes away. So I traced it upward and found where it is set in QuorumPeerConfig.java.


It was now obvious: the configuration file accepts a quorumListenOnAllIPs parameter, and it needs to be set to true.
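How such a flag changes the bind address can be sketched as follows. This is not ZooKeeper's actual source, only an illustration of the idea under my own naming (ElectionBindAddress and choose are hypothetical): with the flag on, only the port of the configured election address is kept and the socket is bound to the wildcard address.

```java
import java.net.InetSocketAddress;

final class ElectionBindAddress {
    /**
     * Picks the address a listener should bind to.
     * listenOnAllIPs = true  -> wildcard address (0.0.0.0) with the configured port
     * listenOnAllIPs = false -> exactly the configured address
     */
    static InetSocketAddress choose(InetSocketAddress configured, boolean listenOnAllIPs) {
        if (listenOnAllIPs) {
            // The single-argument constructor uses the wildcard address.
            return new InetSocketAddress(configured.getPort());
        }
        return configured;
    }

    public static void main(String[] args) {
        // Election address of server.0 from the zoo.cfg above.
        InetSocketAddress configured = new InetSocketAddress("47.94.204.115", 3888);
        System.out.println("flag off: " + choose(configured, false));
        System.out.println("flag on:  " + choose(configured, true));
    }
}
```

Binding to the wildcard address works even when the public IP is not assigned to any interface inside the virtual machine, which is exactly the situation on this cloud server.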


With that, the problem was solved.


The server was now listening on port 3888. After adding this setting on every node, the cluster started correctly.
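For reference, the resulting zoo.cfg would look roughly like this on each node, assuming the rest of the file stays as shown at the top of this post. Only the quorumListenOnAllIPs line is new, and the comment is mine.

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
# Listen for peer connections on all local interfaces (0.0.0.0) instead of
# only the address given in this node's server.N line.
quorumListenOnAllIPs=true
server.0=47.94.204.115:2888:3888
server.1=47.94.192.253:2888:3888
server.2=47.94.199.37:2888:3888
```

Each node needs a restart after the change for the new setting to take effect.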