ZooKeeper session timeout when running HBase: KeeperErrorCode = Session expired for /hbase/master

ZooKeeper exception reported by HBase while running at work:
2013-06-28 18:26:59,946 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x33f7eb9d0650002-0x33f7eb9d0650002-0x33f7eb9d0650002 Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
2013-06-28 18:26:59,946 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x33f7eb9d0650002-0x33f7eb9d0650002-0x33f7eb9d0650002 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
2013-06-28 18:26:59,947 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x33f7eb9d0650002-0x33f7eb9d0650002-0x33f7eb9d0650002 Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)

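The cause and fix below are excerpted from another source and have not yet been verified.

Cause:

GC-related parameters in HBase:

Before the change (default):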

export HBASE_OPTS="$HBASE_OPTS -ea -verbose:gc -Xloggc:$HBASE_LOG_DIR/hbase.gc.log -XX:ErrorFile=$HBASE_LOG_DIR/hs_err_pid.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

After the change (on the developers' advice):

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xloggc:$HBASE_LOG_DIR/hbase.gc.log -XX:ErrorFile=$HBASE_LOG_DIR/hs_err_pid.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70"

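-XX:+UseConcMarkSweepGC: uses the concurrent (CMS) collector for the old generation. (Present in both the old and the new settings.)

Old: -XX:+CMSIncrementalMode: enables incremental mode, which is intended for single-CPU machines.

New: -XX:+UseParNewGC: uses a parallel collector for the young generation. It can be used together with CMS.

-XX:CMSInitiatingOccupancyFraction=70: I think this is the parameter that had the greatest effect. The ultimate goal is to reduce full GCs, because a full GC blocks all other threads, including the one that keeps the ZooKeeper session alive, so a long enough pause lets the session expire.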

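By default, CMS is triggered when old-generation occupancy reaches about 90%; this percentage is set with -XX:CMSInitiatingOccupancyFraction=N. A concurrent mode failure occurs in the following scenario:
When the old generation reaches 90% occupancy, CMS starts a concurrent collection while the young generation keeps rapidly promoting objects into the old generation. If the old generation fills up before CMS finishes concurrent marking, things go badly: with no memory left, CMS has to abandon marking and trigger a stop-the-world pause of the whole JVM (all threads suspended), then clean up all garbage with a single-threaded collection, that is, a full GC. Our bulk job begins with frequent table deletions and creations, which consume a large amount of the master's young-generation memory, so objects are promoted quickly, the scenario above occurs, and a full GC follows.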
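
The race described above can be illustrated with some rough arithmetic. The sketch below is only a simplified model, and every number in it is hypothetical (a 1024 MB old generation, a promotion rate of 50 MB/s, and a CMS cycle that needs 5 seconds); the function names are made up for this example and are not part of HBase or the JVM. It estimates how many seconds of headroom remain after CMS starts and compares that with the assumed cycle time.

```shell
#!/usr/bin/env bash
# Simplified model with hypothetical numbers, not measurements from a real cluster.

# headroom_seconds OLD_GEN_MB TRIGGER_PERCENT PROMOTION_MB_PER_SEC
# Prints how many whole seconds pass before the old generation fills up,
# counting from the moment CMS starts at TRIGGER_PERCENT occupancy.
headroom_seconds() {
  local old_gen_mb=$1 trigger_pct=$2 promo_rate=$3
  local free_mb=$(( old_gen_mb * (100 - trigger_pct) / 100 ))
  echo $(( free_mb / promo_rate ))
}

# cms_outcome OLD_GEN_MB TRIGGER_PERCENT PROMOTION_MB_PER_SEC CYCLE_SECONDS
# Compares the headroom with the (assumed) duration of one CMS cycle.
cms_outcome() {
  local old_gen_mb=$1 trigger_pct=$2 promo_rate=$3 cycle_secs=$4
  local secs
  secs=$(headroom_seconds "$old_gen_mb" "$trigger_pct" "$promo_rate")
  if (( secs < cycle_secs )); then
    echo "concurrent mode failure likely"
  else
    echo "CMS cycle likely finishes in time"
  fi
}

cms_outcome 1024 90 50 5   # about 102 MB free, roughly 2 s of headroom
cms_outcome 1024 70 50 5   # about 307 MB free, roughly 6 s of headroom
```

Under these assumed numbers, starting at 90% leaves less time than the cycle needs, while starting at 70% leaves enough. Real behaviour depends on the actual heap size, promotion rate, and CMS cycle duration.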

Fix: CMSInitiatingOccupancyFraction=70 means CMS starts once the old generation is about 70% full, so full GCs no longer occur (or occur only rarely).
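
Since this fix has not been verified here, one way to check it might be to count how often the GC log reports a concurrent mode failure before and after the change. The sketch below assumes the log is written to $HBASE_LOG_DIR/hbase.gc.log, as configured by -Xloggc above, and that the JVM records these events with the text "concurrent mode failure"; the function name is made up for this example.

```shell
#!/usr/bin/env bash
# Counts the lines in a GC log that mention a CMS concurrent mode failure.
# Prints 0 when the log contains no such line.
count_concurrent_mode_failures() {
  local gc_log=$1
  # grep -c exits non-zero when nothing matches, so ignore its exit status.
  grep -c -- "concurrent mode failure" "$gc_log" || true
}

# Assumed location, taken from the -Xloggc setting in HBASE_OPTS.
gc_log="${HBASE_LOG_DIR:-.}/hbase.gc.log"
if [ -f "$gc_log" ]; then
  echo "concurrent mode failures: $(count_concurrent_mode_failures "$gc_log")"
fi
```

If the count stays at or near zero after switching to CMSInitiatingOccupancyFraction=70 under the same bulk workload, that would support the explanation above.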