
Exception When Reading Data from HDFS


When one or more datanodes are shut down while data is continuously being written to HDFS, some blocks can end up with an HDFS lease that is never released. Until that lease expires or is recovered, other applications (MapReduce, Spark, Hive, etc.) cannot read those blocks and fail with an exception like the following:

17/07/28 14:13:40 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 40.0 (TID 2777, dcnode5): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1594711030-10.29.180.177-1497607441986:blk_1073842908_103050; getBlockSize()=24352; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.241.104.148:50010,DS-2a2e3731-7889-4572-ac03-1645cb9681f5,DISK], DatanodeInfoWithStorage[10.28.142.158:50010,DS-40c6a66e-4f6f-4061-8a54-ac1a8874e3e1,DISK], DatanodeInfoWithStorage[10.28.142.143:50010,DS-41399e02-856b-4761-af41-c916986bd400,DISK], DatanodeInfoWithStorage[10.28.142.37:50010,DS-4bb951a2-6963-4f24-ac80-4df64e0b5d99,DISK]]}
    at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:427)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1565)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:305)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:305)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:778)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

For more detail on HDFS leases, see: http://www.cnblogs.com/cssdongl/p/6699919.html

These blocks are not actually corrupt; their lease simply has not been released, which is what keeps other programs from reading or writing them. We can either recover the lease or, more crudely, delete the affected files outright.
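For reference, on Hadoop 2.7 and later the lease on a single file can usually be recovered from the command line with hdfs debug recoverLease; the path below is only a placeholder, so treat this as a sketch rather than a prescription:

hdfs debug recoverLease -path /path/to/stuck/file -retries 3    # placeholder path; requires Hadoop 2.7+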

Even so, lease recovery has to be done file by file and is the more cumbersome route; here I only describe how to find the affected files and delete them:

First, find out which files under a given HDFS directory have blocks that cannot be read or written because of lease problems. (Note: the example uses the HDFS root directory "/"; replace it with the actual path.)

hadoop fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//"
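The same pipeline split into stages with comments (functionally identical to the one-liner above):

hadoop fsck / -openforwrite |      # list files still open for write; the dot-only lines are fsck progress output
  egrep -v '^\.+$' |               # drop those dot-only progress lines
  egrep "MISSING|OPENFORWRITE" |   # keep only entries flagged MISSING or OPENFORWRITE
  grep -o "/[^ ]*" |               # extract the HDFS path at the start of each entry
  sed -e "s/:$//"                  # strip the trailing colon that fsck appends to the path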

Then delete the files containing these blocks:

hadoop fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" | xargs -i hadoop fs -rmr {}
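If you would rather review the list before anything is removed, a slightly more cautious variant (the temporary file name /tmp/openforwrite_files.txt is just an example) is to save the paths locally first and feed them to the delete only after checking them:

hadoop fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" > /tmp/openforwrite_files.txt
# review /tmp/openforwrite_files.txt, then:
cat /tmp/openforwrite_files.txt | xargs -i hadoop fs -rmr {}

Note that on newer Hadoop releases hadoop fs -rmr is deprecated in favor of hadoop fs -rm -r, and if HDFS trash is enabled the files are moved to the trash directory rather than removed immediately.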
