MapReduce memory problems under YARN
阿新 • Published: 2018-12-15
Background

While running jobs with Hadoop's streaming jar (hadoop-streaming.jar) I hit two problems.

Problem 1:
```
18/10/13 19:40:56 INFO input.FileInputFormat: Total input files to process : 701930
18/10/13 20:04:22 INFO retry.RetryInvocationHandler: java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: GC overhead limit exceeded, while invoking ClientNamenodeProtocolTranslatorPB.getBlockLocations over 2.master.mz/192.168.10.224:8020. Trying to failover immediately.
18/10/13 20:05:04 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/admonitor/.staging/job_1539157945372_30633
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.String.substring(String.java:1933)
	at java.util.Formatter.parse(Formatter.java:2567)
	at java.util.Formatter.format(Formatter.java:2501)
	at java.util.Formatter.format(Formatter.java:2455)
	at java.lang.String.format(String.java:2940)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:471)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
```
This is an OutOfMemoryError (GC overhead limit exceeded) in the *client* JVM while it calls getBlockLocations for a huge number of small input files. The fix is to raise the client heap via the HADOOP_CLIENT_OPTS environment variable; the options it carries apply to many client commands, such as fs, dfs, fsck, and distcp:

```shell
HADOOP_CLIENT_OPTS="-Xmx8192M" hadoop jar $stream_jar ...
```

(Note there must be no space around the `=`, or the shell treats the value as a command.)
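As a sketch (assuming a Bourne-style shell), the variable can also be exported once so that every subsequent client command, not just the job submission, gets the larger heap:

```shell
# Raise the heap of the *client* JVM (the one that enumerates the
# 700k input files), not the map/reduce containers.
export HADOOP_CLIENT_OPTS="-Xmx8192M"

# Any of these would now run with the 8 GB client heap:
#   hadoop fs -count /path/with/many/small/files
#   hadoop distcp src dst
#   hadoop jar $stream_jar ...
echo "client JVM opts: $HADOOP_CLIENT_OPTS"
```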
Problem 2:

```
Container [pid=100823,containerID=container_e39_1539157945372_36692_01_000527] is running 108359680B beyond the 'PHYSICAL' memory limit. Current usage: 1.1 GB of 1 GB physical memory used; 3.1 GB of 2.1 GB virtual memory used. Killing container.
```

Here a container exceeded its physical (and virtual) memory limit and was killed. The fix is to give the containers more memory; first determine whether the limit was hit in the map phase or the reduce phase, then raise the matching pair of settings:

```shell
-Dmapreduce.map.memory.mb=8192 \
-Dmapreduce.map.java.opts=-Xmx7168M \
-Dmapreduce.reduce.memory.mb=4096 \
-Dmapreduce.reduce.java.opts=-Xmx3072M \
```
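Putting the two fixes together, a full submission might look like the sketch below. The input/output paths, mapper/reducer names, and streaming-jar location are placeholders, not from the original post; the command is assembled into a string here only so the sketch can be inspected without a running cluster:

```shell
# Placeholder location; adjust to where hadoop-streaming.jar lives.
STREAM_JAR=${STREAM_JAR:-/usr/lib/hadoop-mapreduce/hadoop-streaming.jar}

# Client-side heap fix (problem 1) plus container memory fix (problem 2).
submit_cmd="HADOOP_CLIENT_OPTS=-Xmx8192M hadoop jar $STREAM_JAR \
  -Dmapreduce.map.memory.mb=8192 \
  -Dmapreduce.map.java.opts=-Xmx7168M \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts=-Xmx3072M \
  -input /data/small-files \
  -output /data/out \
  -mapper mapper.py \
  -reducer reducer.py"

echo "$submit_cmd"
```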
Memory parameters under YARN

Parameter descriptions:
| Name | Default | Description |
|---|---|---|
| yarn.nodemanager.resource.memory-mb | 8192 MB | Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (on Windows and Linux); otherwise the default is 8192 MB. |
| yarn.nodemanager.vmem-pmem-ratio | 2.1 | Virtual-to-physical memory ratio: a container may use up to this multiple of its physical allocation as virtual memory. |
| yarn.scheduler.minimum-allocation-mb | 1024 MB | Minimum memory a single container can request. |
| yarn.scheduler.maximum-allocation-mb | 8192 MB | Maximum memory a single container can request. |
| mapreduce.map.memory.mb | | Container size for map tasks. |
| mapreduce.reduce.memory.mb | | Container size for reduce tasks. |
| mapreduce.map.java.opts | | JVM options for the map-task container; the -Xmx must be smaller than memory.mb, typically about 0.75× of it. |
| mapreduce.reduce.java.opts | | Same, for reduce-task containers. |
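The 0.75× rule of thumb from the table can be sketched as a small helper (the function name is hypothetical; the fraction is guidance, not an exact requirement, and the post's own -Xmx7168M for an 8192 MB map container leaves slightly less headroom):

```shell
# Hypothetical helper: derive a -Xmx value (in MB) for a given
# container size, leaving ~25% headroom for non-heap JVM memory
# (metaspace, thread stacks, direct buffers).
heap_for_container() {
  container_mb=$1
  echo $(( container_mb * 3 / 4 ))
}

echo "map:    -Xmx$(heap_for_container 8192)M"   # 6144M for an 8192 MB container
echo "reduce: -Xmx$(heap_for_container 4096)M"   # 3072M for a 4096 MB container
```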