Hadoop & HBase bad-sector checking and handling: the Donghu site
阿新 • Published: 2019-01-24
Today I ran into a problem: the HBase client failed when writing to HBase, with the errors below.
The HBase backend reported: ERROR: Region { meta => ***, hdfs => hdfs://***, deployed => } not deployed on any region server.
2016-01-20 15:52:31,079 AsyncProcess$AsyncRequestFutureImpl.resubmit:1144 INFO #14508512, table=tr_image, attempt=26/35 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region tr_image,A21ML90210111\x00\x00\x01Q,1451765854574.21820d2ed2a501a99300f2c74367d954. is not online on host110,16020,1453007077717
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2740)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:859)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1795)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31313)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
 on host110,16020,1453007077717, tracking started null, retrying after=20022ms, replay=1ops
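Searching online suggested that the meta table (HBase's metadata) might contain bad information. So let's check the state of HBase with the command "hbase hbck". The key output is as follows: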
Judging from the log, the regions recorded in meta can no longer be found on any region server:
598d0b620b41, negotiated timeout = 40000
2016-01-20 15:54:02,964 INFO [main] zookeeper.ZooKeeper: Session: 0x152598d0b620b41 closed
2016-01-20 15:54:02,965 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-01-20 15:54:02,965 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x152598d0b620b40
2016-01-20 15:54:02,967 INFO [main] zookeeper.ZooKeeper: Session: 0x152598d0b620b40 closed
2016-01-20 15:54:02,967 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
ERROR: Region { meta => tr_image,A21ML90210111\x00\x00\x01Q,1451765854574.21820d2ed2a501a99300f2c74367d954., hdfs => hdfs://cluster1/hbase/data/default/tr_image/21820d2ed2a501a99300f2c74367d954, deployed => } not deployed on any region server.
ERROR: Region { meta => tr_image,AQ9E560210571\x00\x00\x01Q4,1452975417206.2ec3471d3f10eed3087842233b5ec5a1., hdfs => hdfs://cluster1/hbase/data/default/tr_image/2ec3471d3f10eed3087842233b5ec5a1, deployed => } not deployed on any region server.
ERROR: Region { meta => tr_image,AX1G770210431\x00\x00\x01Q$\x9C,1451991127089.91a2e0eac438482edb75685a9f5d3efa., hdfs => hdfs://cluster1/hbase/data/default/tr_image/91a2e0eac438482edb75685a9f5d3efa, deployed => } not deployed on any region server.
2016-01-20 15:54:03,158 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2016-01-20 15:54:03,159 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2016-01-20 15:54:03,159 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
2016-01-20 15:54:03,159 INFO [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.
ERROR: There is a hole in the region chain between A21ML90210111\x00\x00\x01Q and A21RL60210181\x00\x00\x01QG. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between AQ9E560210571\x00\x00\x01Q4 and AQ9T950210401\x00\x00\x01P\xB6. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between AX1G770210431\x00\x00\x01Q$\x9C and AX1H320210571\x00\x00\x01P\xB4. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
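The hbck output above mixes INFO noise with the errors that actually matter. As a minimal sketch (not part of the original troubleshooting), the Python below pulls out the regions reported as not deployed and the holes in the region chain. The function name `summarize_hbck` and both regular expressions are assumptions modeled on the log lines shown here; other HBase versions may word these messages differently.

```python
import re

# Assumed pattern for lines like:
#   ERROR: Region { meta => <region name>, hdfs => <path>, deployed => } not deployed on any region server.
NOT_DEPLOYED = re.compile(
    r"ERROR: Region \{ meta => (?P<meta>.+?), hdfs => (?P<hdfs>\S+), "
    r"deployed => [^}]*\} not deployed on any region server\."
)

# Assumed pattern for lines like:
#   ERROR: There is a hole in the region chain between <start key> and <end key>. You need to create ...
HOLE = re.compile(
    r"ERROR: There is a hole in the region chain between "
    r"(?P<start>.+?) and (?P<end>.+?)\.\s+You need to create"
)


def summarize_hbck(output):
    """Collect undeployed regions and region-chain holes from hbck text output."""
    undeployed = []
    for m in NOT_DEPLOYED.finditer(output):
        undeployed.append({
            "region": m.group("meta"),
            # The encoded region name is the last component of the HDFS path.
            "encoded": m.group("hdfs").rsplit("/", 1)[-1],
        })
    holes = [(m.group("start"), m.group("end")) for m in HOLE.finditer(output)]
    return {"undeployed": undeployed, "holes": holes}
```

Feeding it the saved output of "hbase hbck" gives a short list of encoded region names to look for under the table directory in HDFS, which is easier to scan than the raw log.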
A Baidu search turned up an article at this link:
So I gave it a try and ran a command to repair meta: "hbase hbck -fixMeta -fixAssignments"
The command's output included this line:
util.HBaseFsck: Sleeping 10000ms before re-checking after fix...
Is it going to work???
I checked the HBase client again, and the error was gone.
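The sequence used in this post (check with "hbase hbck", repair with "-fixMeta -fixAssignments", then check again) can be sketched as a small Python wrapper. This is only an illustration under assumptions: the helper names `hbck_command`, `count_errors`, `run` and `check_and_repair` are hypothetical, counting "ERROR:" occurrences is a rough heuristic, and the two fix flags are the ones from this post, which belong to the original hbck tool and may not be available in newer HBase releases.

```python
import subprocess


def hbck_command(fix=False):
    """Build the hbck command line; the fix flags are the ones used in this post."""
    cmd = ["hbase", "hbck"]
    if fix:
        cmd += ["-fixMeta", "-fixAssignments"]
    return cmd


def count_errors(output):
    """Rough heuristic: count 'ERROR:' markers in hbck output."""
    return output.count("ERROR:")


def run(cmd):
    """Run a command and return stdout and stderr combined as text."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.stdout


def check_and_repair(runner=run):
    """Return True if hbck reports no errors, attempting at most one repair."""
    if count_errors(runner(hbck_command())) == 0:
        return True  # nothing to fix
    runner(hbck_command(fix=True))  # repair meta and region assignments
    return count_errors(runner(hbck_command())) == 0  # re-check after the fix
```

Passing the runner as a parameter keeps the logic testable without a cluster. On a real cluster, review the output of the first check before letting any repair run, since the fix options modify meta.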