Resolving ZooKeeper-Related Errors
阿新 • Published: 2018-12-25
1. Error 1
1.1 Description
The logs of the ZooKeeper servers (both FOLLOWER and LEADER) show the following error:
2016-05-14 15:33:01,818 [myid:2] - ERROR [CommitProcessor:2:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
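1.2 Cause analysis
When the ZooKeeper server sent a response, the client's socket connection had already been closed. Its SelectionKey was therefore cancelled, and the call to interestOps() inside sendBuffer() threw CancelledKeyException.
1.3 Resolution
Check sk.isValid() before the ZooKeeper server sends a response. This is actually a bug, and it is fixed in ZooKeeper 3.4.8.
1.4 Notes
This error already occurred before we rolled out the "obtain MQ addresses via ZooKeeper" solution.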
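The guard described above can be sketched with plain java.nio. This is a minimal illustration, not ZooKeeper's actual NIOServerCnxn code. The class ValidKeyGuard and its method enableWrite are hypothetical names, and a Pipe sink channel stands in for the client socket.

The sketch shows two things:
- Calling interestOps() on a cancelled key throws CancelledKeyException, which is the same SelectionKeyImpl.ensureValid frame seen in the log.
- Checking isValid() first lets the server skip the reply instead of throwing.

```java
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class ValidKeyGuard {

    // Hypothetical helper: register interest in OP_WRITE only while the key is still valid.
    // Returns false instead of throwing when the connection has already been closed.
    static boolean enableWrite(SelectionKey sk) {
        if (sk == null || !sk.isValid()) {
            return false;
        }
        try {
            sk.interestOps(sk.interestOps() | SelectionKey.OP_WRITE);
            return true;
        } catch (CancelledKeyException e) {
            // Another thread may cancel the key between isValid() and interestOps().
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            Pipe.SinkChannel sink = pipe.sink(); // stands in for the client socket channel
            sink.configureBlocking(false);
            SelectionKey sk = sink.register(selector, 0);

            System.out.println("valid key: " + enableWrite(sk));

            sk.cancel(); // simulates the connection being closed before the reply is sent
            try {
                sk.interestOps(SelectionKey.OP_WRITE); // unguarded call, as in the stack trace
            } catch (CancelledKeyException e) {
                System.out.println("unguarded call threw CancelledKeyException");
            }
            System.out.println("cancelled key: " + enableWrite(sk));

            pipe.source().close();
            sink.close();
        }
    }
}
```

The extra try/catch is there because isValid() alone cannot rule out a race with a thread that closes the connection concurrently. Upgrading to a ZooKeeper release that contains the fix is still the actual remedy.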
2. Error 2
2.1 Description
The log of a ZooKeeper server acting as FOLLOWER shows the following error. After it occurs, that FOLLOWER stops working for a period of time:
2016-05-15 04:04:40,569 [myid:1] - WARN [SyncThread:1:[email protected]] - fsync-ing the write ahead log in SyncThread:1 took 2243ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-05-14 15:32:50,764 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2016-05-14 15:32:50,764 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
The corresponding log on the ZooKeeper server acting as LEADER shows the following errors:
2016-05-14 15:32:42,605 [myid:3] - WARN [SyncThread:3:[email protected]] - fsync-ing the write ahead log in SyncThread:3 took 3041ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-05-14 15:32:50,764 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing connection to peer due to transaction timeout.
2016-05-14 15:32:50,764 [myid:3] - WARN [LearnerHandler-/10.110.20.23:39390:LearnerHandler@646] - ******* GOODBYE /10.110.20.23:39390 ********
2016-05-14 15:32:50,764 [myid:3] - WARN [LearnerHandler-/10.110.20.23:39390:LearnerHandler@658] - Ignoring unexpected exception
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
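2.2 Cause analysis
While the FOLLOWER is syncing with the LEADER, fsync of the write-ahead (transaction) log takes too long: 2243 ms and 3041 ms in the warnings above. As a result the LEADER hits its transaction timeout and closes the connection ("Closing connection to peer due to transaction timeout."). The FOLLOWER then gets an EOFException while reading from the LEADER and shuts itself down.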
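As far as we can tell from the 3.4.x source, the LEADER's LearnerHandler logs "Closing connection to peer due to transaction timeout." in one specific case: the oldest proposal the FOLLOWER has not yet acknowledged is older than tickTime × syncLimit milliseconds. The sketch below models only that comparison and simplifies the real bookkeeping.

SyncLimitSketch and withinSyncLimit are hypothetical names. The numbers are illustrative and are not the configuration of the cluster in the logs.

```java
public class SyncLimitSketch {

    // Hypothetical model of the leader-side check: the oldest un-acknowledged proposal
    // must be acknowledged within tickTime * syncLimit milliseconds.
    static boolean withinSyncLimit(long proposalSentMs, long nowMs, int tickTimeMs, int syncLimit) {
        long delayMs = nowMs - proposalSentMs;
        return delayMs < (long) tickTimeMs * syncLimit;
    }

    public static void main(String[] args) {
        int tickTime = 2000; // hypothetical values, not taken from the logs above
        int syncLimit = 2;   // window = 2000 ms * 2 = 4000 ms
        long sent = 0L;

        // Follower acknowledges quickly: inside the 4000 ms window.
        System.out.println(withinSyncLimit(sent, 1_500L, tickTime, syncLimit));
        // Follower stalls (e.g. a slow fsync plus other delays): window exceeded,
        // which is where the leader would close the connection.
        System.out.println(withinSyncLimit(sent, 4_500L, tickTime, syncLimit));
        // A larger syncLimit widens the window and tolerates the same stall.
        System.out.println(withinSyncLimit(sent, 4_500L, tickTime, 5));
    }
}
```

The three calls print true, false, and true. The point is that a slow disk on either side stretches the acknowledgement delay, while a larger tickTime or syncLimit stretches the window it is compared against.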
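2.3 Resolution
Increase tickTime, or initLimit and syncLimit, or both. initLimit and syncLimit are counted in ticks, so either change widens the time a FOLLOWER is allowed: initLimit × tickTime for the initial sync with the LEADER, and syncLimit × tickTime for staying in sync afterwards.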
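Because the limits are expressed in ticks, it helps to compute the effective windows before and after a change. zoo.cfg consists of key=value lines, which java.util.Properties can read. The sketch below assumes exactly that.

The "before" values match ZooKeeper's sample zoo.cfg (tickTime=2000, initLimit=10, syncLimit=5). The "after" values are hypothetical and are not a recommendation. ZkTimeouts and windows() are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ZkTimeouts {

    // Effective windows in milliseconds (initLimit and syncLimit are counted in ticks).
    static final class Windows {
        final long initMs;
        final long syncMs;

        Windows(long initMs, long syncMs) {
            this.initMs = initMs;
            this.syncMs = syncMs;
        }
    }

    // Reads a required numeric setting, failing loudly if it is absent.
    static long getLong(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("missing " + key);
        }
        return Long.parseLong(v.trim());
    }

    // Parses zoo.cfg-style text and returns initLimit * tickTime and syncLimit * tickTime.
    static Windows windows(String zooCfg) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(zooCfg));
        long tick = getLong(p, "tickTime");
        return new Windows(getLong(p, "initLimit") * tick, getLong(p, "syncLimit") * tick);
    }

    public static void main(String[] args) throws IOException {
        String before = "tickTime=2000\ninitLimit=10\nsyncLimit=5\n";  // sample zoo.cfg values
        String after = "tickTime=3000\ninitLimit=20\nsyncLimit=10\n";  // hypothetical increase

        for (String cfg : new String[] {before, after}) {
            Windows w = windows(cfg);
            System.out.printf("init window: %d ms, sync window: %d ms%n", w.initMs, w.syncMs);
        }
    }
}
```

With these inputs the sync window grows from 10000 ms to 30000 ms. Larger windows make the cluster slower to drop a genuinely dead FOLLOWER, so it is reasonable to size them against the fsync latencies actually observed in the logs rather than raising them blindly.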
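2.4 Notes
This error also occurred before we rolled out the "obtain MQ addresses via ZooKeeper" solution, just less often. After the rollout, the volume of data synchronized between ZooKeeper servers grew and their load increased. The higher load ultimately made the error above occur frequently.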