
Production troubleshooting: NotServingRegionException (Region ... is not online) when writing data to HBase


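Today we hit a problem in production: CPU usage on one server stayed persistently high. Investigation traced it to one of our Java application processes, which kept logging the following exception while writing data to HBase: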

org.apache.hadoop.hbase.NotServingRegionException: Region iot_flow_cdr_201811,4379692584601-2101152593-20181115072326-355,1536703383699.82804f639798d0502dd64e6e47d75d84. is not online on shqz-ps-iot3-cdr-dn01,60020,1524812940505
      at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2921)
      at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1053)
      at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2096)
      at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
      at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
      at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
      at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
      at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
      at java.lang.Thread.run(Thread.java:745)
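
The first line of the exception names both the affected region (its name starts with the table name) and the RegionServer that rejected the request (host, port, start code). When the log is flooded with such lines, a small filter can summarize which tables and servers are involved. The following is a minimal sketch, not part of the original investigation; the function name `summarize_nsre` and the log file name `app.log` are hypothetical.

```shell
# Print "<table> <regionserver-host>" for every NotServingRegionException
# line read from stdin. Assumes the message format shown above:
#   Region <table>,<start key>,<region id>.<encoded name>. is not online on <host>,<port>,<startcode>
summarize_nsre() {
  sed -n 's/^.*NotServingRegionException: Region \([^,]*\),.* is not online on \([^,]*\),.*$/\1 \2/p'
}

# Example usage (app.log is a placeholder for the application's log file):
#   summarize_nsre < app.log | sort | uniq -c
```

In this incident, every line would reduce to the table iot_flow_cdr_201811 on the host shqz-ps-iot3-cdr-dn01, which is why the next checks focus on that table.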

The troubleshooting steps were as follows:

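  1. Check whether the HBase request rate is too high: look at the RegionServer request count in the HBase web UI.
    [Figure: RegionServer request metrics in the HBase web UI]
    Requests Per Second was not high, so this cause was ruled out.
  2. Check whether the table iot_flow_cdr_201811 is healthy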

(1) Check whether the table has consistency problems

hbase hbck -details iot_flow_cdr_201811

This did report inconsistencies:

8 inconsistencies detected
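
The hbck report is long, so it can help to pull out just the inconsistency count, for example to compare the value before and after a repair. The following is a minimal sketch that assumes the summary line has the form shown above ("N inconsistencies detected"); the function name `count_inconsistencies` is hypothetical.

```shell
# Read hbck output on stdin and print the number of inconsistencies.
# Prints 0 when no "inconsistencies detected" summary line is present.
count_inconsistencies() {
  awk '/inconsistencies detected/ { n = $1 } END { print n + 0 }'
}

# Example usage:
#   hbase hbck -details iot_flow_cdr_201811 | count_inconsistencies
```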

(2) Try to repair the problem

hbase hbck -repair iot_flow_cdr_201811

Running this command produced the following error:

18/11/15 11:28:15 WARN util.HBaseFsck: Got AccessDeniedException when preCheckPermission 
org.apache.hadoop.hbase.security.AccessDeniedException: Permission denied: action=WRITE path=hdfs://nameservice1/hbase/.hbase-snapshot user=root
        at org.apache.hadoop.hbase.util.FSUtils.checkAccess(FSUtils.java:1797)
        at org.apache.hadoop.hbase.util.HBaseFsck.preCheckPermission(HBaseFsck.java:1932)
        at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:4734)
        at org.apache.hadoop.hbase.util.HBaseFsck$HBaseFsckTool.run(HBaseFsck.java:4562)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:4550)
Current user root does not have write perms to hdfs://nameservice1/hbase/.hbase-snapshot. Please rerun hbck as hdfs user hbase

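As the message indicates, the failure is a permission error (Permission denied): the current user, root, has no write access to hdfs://nameservice1/hbase/.hbase-snapshot.
We then ran the command as the hbase user: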

sudo -u hbase hbase hbck -repair iot_flow_cdr_201811
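
Because hbck refuses to repair unless it runs as the HBase superuser, a small helper can build the right command line for whichever account is logged in. The following is a minimal sketch under assumptions: the superuser account is named hbase (as the error message above suggests), and the function name `hbck_repair_cmd` is hypothetical. It only prints the command, so it can be reviewed before being run.

```shell
# Print the hbck repair command for the given table, prefixed with
# "sudo -u hbase" when the current user is not already hbase.
hbck_repair_cmd() {
  table="$1"
  if [ "$(id -un)" = "hbase" ]; then
    echo "hbase hbck -repair $table"
  else
    echo "sudo -u hbase hbase hbck -repair $table"
  fi
}

# Example usage:
#   hbck_repair_cmd iot_flow_cdr_201811
```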

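This time the command succeeded, and once it finished the inconsistencies had been repaired.
Finally, we restarted the application and watched the logs: the program ran normally, the NotServingRegionException no longer appeared, and the server's CPU usage returned to normal.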
