How to Determine and Configure YARN and MapReduce Memory
Reposted from: http://blog.csdn.net/youngqj/article/details/47315167
Manually Calculating YARN and MapReduce Memory
This section describes how to manually calculate YARN and MapReduce memory allocation settings based on the node hardware specifications.
YARN takes into account all of the available resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).
In a Hadoop cluster, it is vital to balance the usage of memory (RAM), processors (CPU cores) and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two Containers per disk and per core gives the best balance for cluster utilization.
When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:
- RAM (amount of memory)
- CORES (number of CPU cores)
- DISKS (number of disks)
The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved Memory is the RAM needed by system processes and other Hadoop processes, such as HBase.
Reserved Memory = Reserved for system memory + Reserved for HBase memory (if HBase is on the same node)
Use the following table to determine the Reserved Memory per node.
Reserved Memory Recommendations
Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory |
4 GB | 1 GB | 1 GB |
8 GB | 2 GB | 1 GB |
16 GB | 2 GB | 2 GB |
24 GB | 4 GB | 4 GB |
48 GB | 6 GB | 8 GB |
64 GB | 8 GB | 8 GB |
72 GB | 8 GB | 8 GB |
96 GB | 12 GB | 16 GB |
128 GB | 24 GB | 24 GB |
256 GB | 32 GB | 32 GB |
512 GB | 64 GB | 64 GB |
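The table lookup above can be sketched in Python. This is only an illustration: the function name `reserved_memory_gb` is hypothetical, and the handling of node sizes that fall between table rows (use the next larger row, which reserves more) is an assumption, since the table does not say what to do in that case.

```python
# Rows of the Reserved Memory Recommendations table, all values in GB:
# (total memory per node, reserved system memory, reserved HBase memory)
RESERVED_MEMORY_TABLE = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4), (48, 6, 8), (64, 8, 8),
    (72, 8, 8), (96, 12, 16), (128, 24, 24), (256, 32, 32), (512, 64, 64),
]


def reserved_memory_gb(total_ram_gb, with_hbase=False):
    """Return the reserved memory (GB) for a node of the given size.

    Assumption: a size between two rows uses the next larger row, and a
    size above the last row reuses the last row. The table itself does
    not define either case.
    """
    row = next(
        (r for r in RESERVED_MEMORY_TABLE if total_ram_gb <= r[0]),
        RESERVED_MEMORY_TABLE[-1],
    )
    _, system_gb, hbase_gb = row
    return system_gb + (hbase_gb if with_hbase else 0)


# A 48 GB node, first without and then with HBase on the same node.
print(reserved_memory_gb(48))
print(reserved_memory_gb(48, with_hbase=True))
```

Subtracting this result from the node's total RAM gives the total available RAM used in the calculations that follow.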
The next calculation is to determine the maximum number of Containers allowed per node. The following formula can be used:
# of Containers = minimum of (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)
Where MIN_CONTAINER_SIZE is the minimum Container size (in RAM). This value depends on the amount of RAM available: on nodes with less memory, the minimum Container size should also be smaller. The following table outlines the recommended values:
Total RAM per Node | Recommended Minimum Container Size |
Less than 4 GB | 256 MB |
Between 4 GB and 8 GB | 512 MB |
Between 8 GB and 24 GB | 1024 MB |
Above 24 GB | 2048 MB |
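As a rough sketch, the minimum Container size table and the Container count formula could be combined as follows. The function names are hypothetical. Two details are assumptions rather than part of the source: the boundary values 8 GB and 24 GB are assigned to the lower bracket, and the result of the formula is rounded down to a whole number of Containers.

```python
import math


def min_container_size_mb(total_ram_gb):
    """Recommended minimum Container size in MB for a node.

    Assumption: the upper bound of each bracket is inclusive, so 8 GB
    maps to 512 MB and 24 GB maps to 1024 MB.
    """
    if total_ram_gb < 4:
        return 256
    if total_ram_gb <= 8:
        return 512
    if total_ram_gb <= 24:
        return 1024
    return 2048


def max_containers(cores, disks, available_ram_mb, min_container_mb):
    """minimum of (2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE).

    Assumption: the minimum is rounded down, because a node can only
    run a whole number of Containers.
    """
    limit = min(2 * cores, 1.8 * disks, available_ram_mb / min_container_mb)
    return math.floor(limit)


# 12 cores, 12 disks, 48 GB total RAM with 6 GB reserved.
size_mb = min_container_size_mb(48)
print(size_mb, max_containers(12, 12, (48 - 6) * 1024, size_mb))
```

Here `available_ram_mb` is the total RAM minus the Reserved Memory, expressed in MB.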
The final calculation is to determine the amount of RAM per container:
RAM-per-Container = maximum of (MIN_CONTAINER_SIZE, (Total Available RAM) / Containers)
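A minimal sketch of this last formula, assuming all sizes are in MB and that the division is rounded down to a whole MB (the function name `ram_per_container_mb` is hypothetical):

```python
def ram_per_container_mb(available_ram_mb, containers, min_container_mb):
    """maximum of (MIN_CONTAINER_SIZE, available RAM / Containers), in MB.

    Assumption: integer division, so the result is a whole number of MB.
    """
    return max(min_container_mb, available_ram_mb // containers)


# Disk-bound node: 14 GB available, but only 3 Containers allowed.
# Each Container then gets more than the 1024 MB minimum.
print(ram_per_container_mb(14 * 1024, 3, 1024))
```

When RAM is the limiting resource, the result equals the minimum Container size. When cores or disks limit the Container count, each Container receives a larger share of RAM.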
With these calculations, the YARN and MapReduce configurations can be set:
Configuration File | Configuration Setting | Value Calculation |
yarn-site.xml | yarn.nodemanager.resource.memory-mb | = Containers * RAM-per-Container |
yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = RAM-per-Container |
yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = Containers * RAM-per-Container |
mapred-site.xml | mapreduce.map.memory.mb | = RAM-per-Container |
mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * RAM-per-Container |
mapred-site.xml | mapreduce.map.java.opts | = 0.8 * RAM-per-Container |
mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * RAM-per-Container |
mapred-site.xml | yarn.app.mapreduce.am.resource.mb | = 2 * RAM-per-Container |
mapred-site.xml | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * RAM-per-Container |
Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.
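The memory settings are plain numbers of MB, while the `java.opts` and `command-opts` settings are JVM option strings, so the 0.8 factor is usually expressed as a maximum heap size (`-Xmx`). The sketch below shows one way the values might be rendered as Hadoop `<property>` entries. The helper names `heap_opts` and `property_xml` are hypothetical, and truncating the heap size to a whole MB is an assumption.

```python
def heap_opts(container_mb, heap_fraction=0.8):
    """JVM heap option for a Container of the given size in MB.

    Assumption: the heap size is truncated to a whole number of MB.
    """
    return f"-Xmx{int(container_mb * heap_fraction)}m"


def property_xml(name, value):
    """Render one <property> entry for yarn-site.xml or mapred-site.xml."""
    return (
        "  <property>\n"
        f"    <name>{name}</name>\n"
        f"    <value>{value}</value>\n"
        "  </property>"
    )


# Map task settings for a RAM-per-Container of 2048 MB.
ram_mb = 2048
print(property_xml("mapreduce.map.memory.mb", ram_mb))
print(property_xml("mapreduce.map.java.opts", heap_opts(ram_mb)))
```

The printed entries go inside the `<configuration>` element of the corresponding file.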
Example
Cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.
Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase
Min Container size = 2 GB
If there is no HBase:
# of Containers = minimum of (2*12, 1.8*12, (48-6)/2) = minimum of (24, 21.6, 21) = 21
RAM-per-Container = maximum of (2, (48-6)/21) = maximum of (2, 2) = 2
Configuration | Value Calculation |
yarn.nodemanager.resource.memory-mb | = 21 * 2 = 42*1024 MB |
yarn.scheduler.minimum-allocation-mb | = 2*1024 MB |
yarn.scheduler.maximum-allocation-mb | = 21 * 2 = 42*1024 MB |
mapreduce.map.memory.mb | = 2*1024 MB |
mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB |
mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB |
mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB |
yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
If HBase is included:
# of Containers = minimum of (2*12, 1.8*12, (48-6-8)/2) = minimum of (24, 21.6, 17) = 17
RAM-per-Container = maximum of (2, (48-6-8)/17) = maximum of (2, 2) = 2
Configuration | Value Calculation |
yarn.nodemanager.resource.memory-mb | = 17 * 2 = 34*1024 MB |
yarn.scheduler.minimum-allocation-mb | = 2*1024 MB |
yarn.scheduler.maximum-allocation-mb | = 17 * 2 = 34*1024 MB |
mapreduce.map.memory.mb | = 2*1024 MB |
mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB |
mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB |
mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB |
yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB |
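Both scenarios can be reproduced with a short script. This is a sketch under stated assumptions, not an official tool: the function name `yarn_mapreduce_settings` is hypothetical, the Container count and heap sizes are rounded down, and the `opts` values are rendered as `-Xmx` strings because those settings take JVM options.

```python
import math


def yarn_mapreduce_settings(cores, disks, total_ram_gb, reserved_gb,
                            min_container_mb):
    """Compute the settings from the configuration table (sizes in MB)."""
    available_mb = (total_ram_gb - reserved_gb) * 1024
    # Assumption: round the Container count down to a whole number.
    containers = math.floor(
        min(2 * cores, 1.8 * disks, available_mb / min_container_mb)
    )
    ram_mb = max(min_container_mb, available_mb // containers)

    def heap(mb):
        # Assumption: 80% of the Container, truncated to whole MB.
        return f"-Xmx{int(0.8 * mb)}m"

    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_mb,
        "yarn.scheduler.minimum-allocation-mb": ram_mb,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_mb,
        "mapreduce.map.memory.mb": ram_mb,
        "mapreduce.reduce.memory.mb": 2 * ram_mb,
        "mapreduce.map.java.opts": heap(ram_mb),
        "mapreduce.reduce.java.opts": heap(2 * ram_mb),
        "yarn.app.mapreduce.am.resource.mb": 2 * ram_mb,
        "yarn.app.mapreduce.am.command-opts": heap(2 * ram_mb),
    }


# 12 cores, 12 disks, 48 GB RAM, 2 GB minimum Container size.
for label, reserved_gb in (("no HBase", 6), ("with HBase", 6 + 8)):
    print(label)
    settings = yarn_mapreduce_settings(12, 12, 48, reserved_gb, 2048)
    for name, value in settings.items():
        print(f"  {name} = {value}")
```

Without HBase the NodeManager memory comes out as 43008 MB (42*1024), and with HBase as 34816 MB (34*1024), matching the two tables above.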