
Summary of Common Hadoop Errors and Solutions

Error 1: java.io.IOException: Incompatible clusterIDs (often appears after the namenode has been reformatted)
2014-04-29 14:32:53,877 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
java.io.IOException: Incompatible clusterIDs in /data/dfs/data: namenode clusterID = CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb; datanode clusterID = CID-ff0faa40-2940-4838-b321-98272eb0dee3
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:722)
2014-04-29 14:32:53,885 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
2014-04-29 14:32:53,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421)
2014-04-29 14:32:55,897 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
Cause: every namenode format creates a new clusterID, while the data directory still holds the ID from the previous format. Formatting clears the namenode's data but not the datanodes' data, so the datanodes fail at startup. What you need to do is clear everything under the data directories before each format.

Solution: stop the cluster and delete everything under the problem node's data directory (the dfs.data.dir directory configured in hdfs-site.xml), then reformat the namenode.

An even easier alternative: stop the cluster first, then edit /dfs/data/current/VERSION on the datanode and change its clusterID to match the namenode's.
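A minimal shell sketch of that second approach. The storage paths and the clusterID value are taken from the log above as assumptions; substitute the directories configured in your own hdfs-site.xml and the clusterID reported by your own namenode:

    # On the namenode: read the authoritative clusterID (dfs.namenode.name.dir assumed to be /data/dfs/name)
    grep clusterID /data/dfs/name/current/VERSION

    # On the affected datanode (run from $HADOOP_HOME): stop it, rewrite its clusterID, start it again
    sbin/hadoop-daemon.sh stop datanode
    sed -i 's/^clusterID=.*/clusterID=CID-d1448b9e-da0f-499e-b1d4-78cb18ecdebb/' /data/dfs/data/current/VERSION
    sbin/hadoop-daemon.sh start datanode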

Error 2: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container
14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to: Application application_1398704073313_0021 failed 2 times due to Error launching appattempt_1398704073313_0021_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1398762692768 found 1398711306590
        at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
. Failing the application.
14/04/29 02:45:07 INFO mapreduce.Job: Counters: 0

Cause: the clocks of the namenode and the datanodes are not synchronized, so the container token has already expired by the time it is used.

Solution: synchronize every datanode's clock with the namenode. Run ntpdate time.nist.gov on each server and confirm that the synchronization succeeds.
It is best to also add a line to /etc/crontab on every server:
0 2 * * * root ntpdate time.nist.gov && hwclock -w
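If you prefer to do the one-off sync from a single machine, a rough sketch follows. The hostname list is an assumption: hadoop-master and hadoop-datanode1 appear in the logs above, the rest depend on your cluster.

    # Sync every node, then write the hardware clock
    for host in hadoop-master hadoop-datanode1 hadoop-datanode2; do
        ssh $host 'ntpdate time.nist.gov && hwclock -w'
    done
    # Query-only check of the remaining offset on the local node
    ntpdate -q time.nist.gov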

Error: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing READ_BLOCK operation  src: /192.168.1.191:48854 dest: /192.168.1.191:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)
Cause: I/O timeout.

Solution:
Modify the Hadoop configuration file hdfs-site.xml and add settings for the two properties dfs.datanode.socket.write.timeout and dfs.socket.timeout:
    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>6000000</value>
    </property>

    <property>
        <name>dfs.socket.timeout</name>
        <value>6000000</value>
    </property>
Note: the timeout values are in milliseconds; 0 means no limit.
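To double-check which value a client actually picks up after the change, something like the following can be used (hdfs getconf reads the configuration on the machine where it runs):

    hdfs getconf -confKey dfs.socket.timeout
    hdfs getconf -confKey dfs.datanode.socket.write.timeout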

Error: DataXceiver error processing WRITE_BLOCK operation
2014-05-06 15:21:30,378 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing WRITE_BLOCK operation  src: /192.168.1.193:34147 dest: /192.168.1.191:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)
Cause: the file operation outlived its lease; in effect, the file was deleted while the data stream operation was still in progress.

Solution:
Modify hdfs-site.xml (this property name is for 2.x; in 1.x it should be dfs.datanode.max.xcievers):
<property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>8192</value>
</property>
Copy the file to every datanode and restart the datanodes.
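A rough sketch of that distribution step; the hostnames and the Hadoop install path are assumptions:

    # Push the updated hdfs-site.xml and bounce each datanode
    for host in hadoop-datanode1 hadoop-datanode2; do
        scp /usr/local/hadoop/etc/hadoop/hdfs-site.xml $host:/usr/local/hadoop/etc/hadoop/
        ssh $host '/usr/local/hadoop/sbin/hadoop-daemon.sh stop datanode; /usr/local/hadoop/sbin/hadoop-daemon.sh start datanode'
    done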

Error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
2014-05-07 12:21:41,820 WARN [Thread-115] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
        at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:860)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:925)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)

Cause: the write cannot proceed. My environment has three datanodes and the replication factor is set to 3, so a write goes through a pipeline of all three machines. The default replace-datanode-on-failure policy is DEFAULT: when the cluster has three or more datanodes, the client tries to find another datanode to take over the copy. Since there are only three machines, as soon as one datanode has a problem there is no spare node, and the write can never succeed.

Solution: add or modify the following two properties in hdfs-site.xml:
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
dfs.client.block.write.replace-datanode-on-failure.enable controls whether the client applies a replacement policy at all when a write fails; the default of true is fine.
As for dfs.client.block.write.replace-datanode-on-failure.policy, DEFAULT tries to swap in a new datanode when there are three or more replicas, while with two replicas it does not replace the datanode and simply continues writing. On a cluster of only three datanodes, a single unresponsive node is enough to break the write, so the replacement can be switched off (NEVER).
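Since both are client-side settings, it should also be possible to pass them per command via the generic -D option instead of editing hdfs-site.xml; in the sketch below the local file and target path are just placeholders:

    hadoop fs -D dfs.client.block.write.replace-datanode-on-failure.policy=NEVER \
              -D dfs.client.block.write.replace-datanode-on-failure.enable=true \
              -put localfile.txt /user/hadoop/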

Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for ...
14/05/08 18:24:59 INFO mapreduce.Job: Task Id : attempt_1399539856880_0016_m_000029_2, Status : FAILED
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1399539856880_0016_m_000029_2_spill_0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

Container killed by the ApplicationMaster.

Cause: two possibilities: either hadoop.tmp.dir or the data directory has run out of space.

Solution: I checked my DFS status and the data directories were less than 40% used, so I suspected that hadoop.tmp.dir was out of space and the job's temporary files could not be created. Looking at core-site.xml, hadoop.tmp.dir was not configured at all, so the default /tmp directory was being used, where data is also lost whenever the server reboots, so it needed to be changed. Add:
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
Then reformat the namenode (hadoop namenode -format) and restart the cluster.
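A quick way to confirm which local filesystem is actually the full one; the paths are assumptions, with /tmp being the default hadoop.tmp.dir location and /data/tmp the new one:

    df -h /tmp /data/tmp               # free space on the old and new tmp locations
    du -sh /tmp/hadoop-* 2>/dev/null   # size of Hadoop's scratch data under the old default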

2014-06-19 10:00:32,181 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for ...
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1447)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1997)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.MROutputFiles.getSpillFileForWrite(MROutputFiles.java:146)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)

Cause: the local disk, not HDFS, ran out of space (I was debugging the program in MyEclipse and the local tmp directory had filled up).
Solution: clean up files or add disk space.

2014-06-23 10:21:01,479 INFO [IPC Server handler 3 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1403488126955_0002_m_000000_0 is : 0.30801716
2014-06-23 10:21:01,512 FATAL [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1403488126955_0002_m_000000_0 - exited : java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1063)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:180)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1403488126955_0002_m_000000_0_spill_53.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
2014-06-23 10:21:01,513 INFO [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed
        [same stack trace as above]
2014-06-23 10:21:01,514 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed
        [same stack trace as above]
2014-06-23 10:21:01,516 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403488126955_0002_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
The error plainly says the disk is out of space, but frustratingly, when I logged in to each node, disk usage was under 40% and there was plenty of room left.

It took a long time to figure out: one map task produced a lot of output while running, and before the failure its node's disk usage climbed steadily until it hit 100% and the error was thrown. The task then failed, the space was released, and the task was reassigned to another node. Because the space had already been freed, the disk showed plenty of free space by the time I checked, even though an out-of-space error had been reported.

The lesson from this problem: monitoring during job execution matters.
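A throwaway sketch of the kind of monitoring that would have caught this, sampling local disk usage on a node while the job runs (the interval, paths and log file are assumptions):

    # Append a disk-usage snapshot every 30 seconds while the job is running
    while true; do
        date '+%F %T' >> /tmp/disk-usage.log
        df -h /tmp /data >> /tmp/disk-usage.log
        sleep 30
    done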
