Summary of Common Hadoop Errors and Solutions
Error 1: java.io.IOException: Incompatible clusterIDs — commonly seen after the namenode has been reformatted
2014-04-29 14:32:53,877 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:722)
2014-04-29 14:32:53,885 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421) service to hadoop-master/192.168.1.181:9000
2014-04-29 14:32:53,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1480406410-192.168.1.181-1398701121586 (storage id DS-167510828-192.168.1.191-50010-1398750515421)
2014-04-29 14:32:55,897 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
Cause: each namenode format generates a new namenode ID (cluster ID), while the datanode data directories still hold the ID from the previous format. Formatting clears the namenode's data but not the datanodes', so the datanodes fail at startup. Everything under the data directories should be cleared before each format.

Solution: stop the cluster and delete everything under the affected node's data directory, i.e. the directory configured as dfs.data.dir in hdfs-site.xml, then reformat the namenode.

A less disruptive alternative: stop the cluster, then edit /dfs/data/current/VERSION on the datanode and change its clusterID to match the namenode's.
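A minimal sketch of the second approach, assuming the namenode metadata sits under /dfs/name and the datanode data under /dfs/data (adjust both paths to your dfs.name.dir / dfs.data.dir settings; the CID value below is only a placeholder):

# On the namenode: read the current clusterID
grep clusterID /dfs/name/current/VERSION
# e.g. clusterID=CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# On each affected datanode: overwrite the stale clusterID with the namenode's value,
# then restart the datanode
sed -i 's/^clusterID=.*/clusterID=CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/' /dfs/data/current/VERSION

Either way, the datanode only rejoins the cluster once its clusterID matches the namenode's.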
Error 2: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container
14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to: Application application_1398704073313_0021 failed 2 times due to Error launching appattempt_1398704073313_0021_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1398762692768 found 1398711306590
        at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
. Failing the application.
14/04/29 02:45:07 INFO mapreduce.Job: Counters: 0
Cause: the namenode and datanodes are not time-synchronized.

Solution: synchronize the datanodes with the namenode. On each server run: ntpdate time.nist.gov, and confirm the synchronization succeeded.
It is best to also add the following line to /etc/crontab on every server:
0 2 * * * root ntpdate time.nist.gov && hwclock -w
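A quick way to see how far the clocks have drifted before and after syncing (a sketch; assumes passwordless SSH and a plain-text hosts file listing the cluster nodes):

# Print each node's epoch time next to the local one; a large delta explains the expired token
for h in $(cat hosts); do
    echo -n "$h: "; ssh "$h" date +%s
done
date +%s    # local (namenode) time for comparison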
Error 3: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.1.191:48854 dest: /192.168.1.191:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)
Cause: I/O timeout.

Solution:
Edit hdfs-site.xml and add the two properties dfs.datanode.socket.write.timeout and dfs.socket.timeout:
<property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>6000000</value>
</property>
<property>
    <name>dfs.socket.timeout</name>
    <value>6000000</value>
</property>

Note: the timeout values are in milliseconds; 0 means no limit.
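To confirm the new values are actually being picked up, the effective configuration can be queried on a node after the restart (a sketch using the standard hdfs getconf utility, which reads the config files on the machine it runs on):

hdfs getconf -confKey dfs.datanode.socket.write.timeout
hdfs getconf -confKey dfs.socket.timeout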
Error 4: DataXceiver error processing WRITE_BLOCK operation
2014-05-06 15:21:30,378 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing WRITE_BLOCK operation src: /192.168.1.193:34147 dest: /192.168.1.191:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:435)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:693)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:569)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:722)
Cause: the file operation outlived its lease — in practice, the file was deleted while the data stream was still writing to it.

Solution:
Edit hdfs-site.xml (this is the 2.x property name; in 1.x it is dfs.datanode.max.xcievers):
<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
</property>
Copy the file to every datanode and restart the datanodes.
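A sketch of pushing the change out and restarting the datanodes one by one (assumes passwordless SSH, a standard 2.x layout under $HADOOP_HOME, and that the slaves file lists the datanode hosts):

for h in $(cat $HADOOP_HOME/etc/hadoop/slaves); do
    scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml "$h":$HADOOP_HOME/etc/hadoop/
    ssh "$h" "$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode; $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"
done

Rolling through the nodes one at a time keeps enough replicas online while each datanode restarts.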
Error 5: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
2014-05-07 12:21:41,820 WARN [Thread-115] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:514)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
        at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:548)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:599)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[192.168.1.191:50010, 192.168.1.192:50010], original=[192.168.1.191:50010, 192.168.1.192:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:860)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:925)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
Cause: the write cannot proceed. My environment has 3 datanodes and a replication factor of 3, so each write builds a pipeline across all 3 machines. The default replace-datanode-on-failure policy is DEFAULT: when the cluster has 3 or more datanodes, the client looks for another datanode to take over from a failed one. With only 3 machines there is no spare, so as soon as one datanode has a problem the write can never succeed.

Solution: edit hdfs-site.xml and add or modify the following two properties:
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
</property>
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
</property>

dfs.client.block.write.replace-datanode-on-failure.enable controls whether the client applies a replacement policy at all when a write fails; the default of true is fine.
dfs.client.block.write.replace-datanode-on-failure.policy set to DEFAULT tries to swap in a new node when there are 3 or more replicas, and simply keeps writing without replacement when there are only 2. On a 3-datanode cluster a single unresponsive node is enough to block writes, so it is reasonable to turn replacement off with NEVER.
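Because these are client-side settings, they can also be passed per job instead of (or in addition to) editing the cluster-wide file. A sketch, in which the jar name, driver class, and paths are placeholders and the driver is assumed to use Tool/GenericOptionsParser so that -D options are honoured:

hadoop jar my-job.jar com.example.MyDriver \
    -D dfs.client.block.write.replace-datanode-on-failure.policy=NEVER \
    /input /output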
Error 6: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for
14/05/08 18:24:59 INFO mapreduce.Job: Task Id : attempt_1399539856880_0016_m_000029_2, Status : FAILED
Error: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1399539856880_0016_m_000029_2_spill_0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1467)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:769)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

Container killed by the ApplicationMaster.
Cause: two possibilities — either hadoop.tmp.dir or the data directories have run out of space.

Solution: checking the dfs status showed data usage below 40%, so the likely culprit was hadoop.tmp.dir running out of space, leaving no room for the job's temporary files. core-site.xml had no hadoop.tmp.dir configured, so the default /tmp was being used; anything there is also lost whenever the server reboots, so it needs to change. Add:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
</property>
Then reformat: hadoop namenode -format
and restart.
Error 7: java.io.IOException: Spill failed (local run)
2014-06-19 10:00:32,181 INFO [org.apache.hadoop.mapred.MapTask] - Ignoring exception during close for [email protected]
java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1447)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1997)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.MROutputFiles.getSpillFileForWrite(MROutputFiles.java:146)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
Cause: the local disk is out of space — this is not an HDFS problem (I was debugging the program in MyEclipse and the local tmp directory had filled up).
Solution: free up or add local disk space.
Error 8: java.io.IOException: Spill failed (on the cluster)
2014-06-23 10:21:01,479 INFO [IPC Server handler 3 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1403488126955_0002_m_000000_0 is : 0.30801716
2014-06-23 10:21:01,512 FATAL [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1403488126955_0002_m_000000_0 - exited : java.io.IOException: Spill failed
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1540)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1063)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:180)
        at com.mediadc.hadoop.MediaIndex$SecondMapper.map(MediaIndex.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1403488126955_0002_m_000000_0_spill_53.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1573)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1510)
2014-06-23 10:21:01,513 INFO [IPC Server handler 2 on 45207] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed [same stack trace as above]
2014-06-23 10:21:01,514 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1403488126955_0002_m_000000_0: Error: java.io.IOException: Spill failed [same stack trace as above]
2014-06-23 10:21:01,516 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1403488126955_0002_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
The error clearly points to a lack of disk space, yet frustratingly, logging into each node showed disk usage below 40%, with plenty of room left.

It took a long time to work out what was happening: one map task produced a large amount of output, and while it ran the disk usage climbed steadily until it hit 100% and the task failed with this error. The failed task then released its space and was reassigned to another node. Because the space had already been freed, the disks looked half-empty afterwards even though the error complained about space.

The lesson here: monitoring while the job is running matters.
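A crude way to catch this while a job is running (a sketch; the 5-second interval and the /data/tmp path are assumptions — point it at whatever yarn.nodemanager.local-dirs / hadoop.tmp.dir resolve to on your nodes):

# Run on a suspect node while the job executes; records free space every 5 seconds
while true; do
    date '+%F %T'
    df -h /data/tmp
    sleep 5
done | tee disk-usage.log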