Out-of-memory errors in the YARN shuffle phase (error in shuffle in fetcher)




While running a Hadoop cluster (CDH 4.4, MRv2, i.e. the YARN framework), a job that processed a large dataset failed with the following error:

13/12/02 20:02:06 INFO mapreduce.Job: map 100% reduce 2%

13/12/02 20:02:18 INFO mapreduce.Job: Task Id : attempt_1385983958793_0001_r_000000_1, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)

Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:360)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:295)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:154)




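  1. Every attempt of the reduce task failed, and the task was restarted after each failure.
  2. After the reduce task had failed 4 times in total, the whole application exited, which suggests a configuration setting such as a maximum retry count.
  3. Map tasks and reduce tasks are isolated and do not interfere with each other, which also follows from how map and reduce tasks work.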

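Based on this, I first looked up the setting mapreduce.reduce.maxattempts in mapred-site.xml, which is the maximum number of attempts for a reduce task before it is considered failed. The default is 4; I raised it to 400 and tried again.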


<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.0.0-cdh4.4.0</version>


// Shuffle


// Get the location for the map output - either in-memory or on-disk
mapOutput = merger.reserve(mapId, decompressedLength, id);



shell> cd /data/1/mrlocal/yarn/local/usercache/hdfs/appcache/application_1385983958793_0001/output
shell> du -sh * | grep _r_
7.3G attempt_1385983958793_0001_r_000000_1
6.5G attempt_1385983958793_0001_r_000000_12
5.2G attempt_1385983958793_0001_r_000000_5
5.8G attempt_1385983958793_0001_r_000000_7


if (!canShuffleToMemory(requestedSize)) {
  // A segment at or above the single-shuffle limit is written to disk
  LOG.info(mapId + ": Shuffling to disk since " + requestedSize +
           " is greater than maxSingleShuffleLimit (" +
           maxSingleShuffleLimit + ")");
  return new OnDiskMapOutput<K,V>(mapId, reduceId, this, requestedSize,
                                  jobConf, mapOutputFile, fetcher, true);
}


private boolean canShuffleToMemory(long requestedSize) {
  return (requestedSize < maxSingleShuffleLimit);
}


this.maxSingleShuffleLimit =
  (long)(memoryLimit * singleShuffleMemoryLimitPercent);
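
As a rough illustration of the check above, here is a minimal Java sketch of how a fetched map output would be routed to memory or to disk. The class name ShuffleRoutingSketch, the destination helper, and the 700 MiB / 0.25 figures are assumptions made for this example, not values taken from the cluster; only the two formulas mirror the MergeManagerImpl excerpts.

```java
// Hypothetical stand-alone sketch; it does not use any Hadoop classes.
final class ShuffleRoutingSketch {

    // Mirrors: maxSingleShuffleLimit = (long)(memoryLimit * singleShuffleMemoryLimitPercent).
    // As in the excerpt, long * float is evaluated in float arithmetic.
    static long maxSingleShuffleLimit(long memoryLimit, float singleShuffleMemoryLimitPercent) {
        return (long) (memoryLimit * singleShuffleMemoryLimitPercent);
    }

    // Mirrors canShuffleToMemory: the comparison is strict.
    static boolean canShuffleToMemory(long requestedSize, long maxSingleShuffleLimit) {
        return requestedSize < maxSingleShuffleLimit;
    }

    // Illustrative helper (not in the original code): names the chosen target.
    static String destination(long requestedSize, long maxSingleShuffleLimit) {
        return canShuffleToMemory(requestedSize, maxSingleShuffleLimit) ? "memory" : "disk";
    }

    public static void main(String[] args) {
        long memoryLimit = 700L * 1024 * 1024; // assumed shuffle buffer size
        float percent = 0.25f;                 // assumed single-shuffle fraction
        long limit = maxSingleShuffleLimit(memoryLimit, percent);

        long smallSegment = 100L * 1024 * 1024;
        long largeSegment = 200L * 1024 * 1024;
        System.out.println("limit = " + limit + " bytes");
        System.out.println("100 MiB segment goes to " + destination(smallSegment, limit));
        System.out.println("200 MiB segment goes to " + destination(largeSegment, limit));
    }
}
```

Because the comparison is strict, a segment whose size equals the limit exactly is sent to disk in this sketch.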


// Allow unit tests to fix Runtime memory
this.memoryLimit =
  (long)(jobConf.getLong(MRJobConfig.REDUCE_MEMORY_TOTAL_BYTES,
      Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
    * maxInMemCopyUse);
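
To make the effect of the Integer.MAX_VALUE cap concrete, the following sketch reproduces only the default branch of that expression in plain Java. It leaves out the REDUCE_MEMORY_TOTAL_BYTES override, and the class name, the sample heap sizes and the fractions are hypothetical. maxInMemCopyUse is passed in as a parameter because the excerpt does not show where it is read from.

```java
// Hypothetical stand-alone sketch; it does not use any Hadoop classes.
final class ShuffleMemoryLimitSketch {

    // Default branch of the excerpt: the heap size is capped at Integer.MAX_VALUE
    // (about 2 GiB) before it is scaled by maxInMemCopyUse.
    // As in the excerpt, long * float is evaluated in float arithmetic,
    // so the result is only approximate for values float cannot represent exactly.
    static long memoryLimit(long maxHeapBytes, float maxInMemCopyUse) {
        long capped = Math.min(maxHeapBytes, Integer.MAX_VALUE);
        return (long) (capped * maxInMemCopyUse);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long eightGiB = 8L << 30;
        float fraction = 0.5f; // assumed value, chosen so the arithmetic is exact

        System.out.println("1 GiB heap -> " + memoryLimit(oneGiB, fraction) + " bytes");
        System.out.println("8 GiB heap -> " + memoryLimit(eightGiB, fraction) + " bytes");

        // The heap of the JVM running this sketch, for comparison.
        long jvmHeap = Runtime.getRuntime().maxMemory();
        System.out.println("this JVM   -> " + memoryLimit(jvmHeap, fraction) + " bytes");
    }
}
```

Under these assumptions, a heap larger than about 2 GiB does not increase the shuffle buffer in the default branch, since the cap is applied before the fraction.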

final float singleShuffleMemoryLimitPercent =

singleShuffleMemoryLimitPercent takes its value from the mapreduce.reduce.shuffle.memory.limit.percent setting, which the official documentation describes as:

Expert: Maximum percentage of the in-memory limit that a single shuffle can consume









