Distributed Resource Scheduling Framework: YARN
阿新 · Published: 2018-11-06
1 Background: Why YARN
- Problems with MapReduce 1.x: the JobTracker was a single point of failure, and the load it carried made the cluster hard to scale.
- In Hadoop 1.x, MapReduce used a Master/Slave architecture: one JobTracker (JT) coordinating many TaskTrackers.
- JobTracker: responsible for both resource management and job scheduling.
- TaskTracker: periodically reports its node's health, resource usage, and task status to the JT; accepts commands from the JT, such as launching tasks.
- YARN: lets different computing frameworks share the data on one HDFS cluster and benefit from unified, cluster-wide resource scheduling.
2 YARN Architecture
http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/hadoop-yarn/hadoop-yarn-site/YARN.html
- ResourceManager (RM): only one RM serves the cluster at any given time. It manages and schedules cluster resources centrally and handles client requests, such as submitting or killing a job. It also monitors the NMs: if an NM dies, the RM notifies the affected AMs so the tasks that were running on that node can be handled.
- NodeManager (NM): one per node, many per cluster. It manages its own node's resources, periodically reports resource usage to the RM, and receives and executes commands from the RM (e.g. launch a Container) as well as commands from AMs.
- ApplicationMaster (AM): one per application (an MR job, a Spark application). It manages the application: requests resources (cores, memory) from the RM and assigns them to its internal tasks, and communicates with NMs to start/stop tasks. Tasks run inside Containers, and the AM itself also runs inside a Container.
- Container: an abstraction of a task's execution environment, encapsulating resources such as CPU and memory.
- Client: submits jobs and checks their progress.
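The interaction between these components can be sketched as a toy simulation. This is purely illustrative Python, not the Hadoop API; all class and method names here are invented for the sketch. It models the flow described above: the RM allocates containers across NMs, the AM itself runs in a container, and the AM then requests containers for its tasks.

```python
# Toy model of the YARN submission flow (illustrative only, not the Hadoop API).

class Container:
    """Abstraction of an execution environment with CPU and memory."""
    def __init__(self, vcores, memory_mb):
        self.vcores, self.memory_mb = vcores, memory_mb

class NodeManager:
    """Per-node agent: launches containers on behalf of the RM/AM."""
    def __init__(self, vcores, memory_mb):
        self.free_vcores, self.free_memory_mb = vcores, memory_mb

    def launch_container(self, vcores, memory_mb):
        if vcores > self.free_vcores or memory_mb > self.free_memory_mb:
            return None  # not enough room on this node
        self.free_vcores -= vcores
        self.free_memory_mb -= memory_mb
        return Container(vcores, memory_mb)

class ResourceManager:
    """Cluster-wide scheduler: one active RM serves the whole cluster."""
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, vcores, memory_mb):
        # First-fit over NMs; the real RM uses pluggable schedulers.
        for nm in self.node_managers:
            container = nm.launch_container(vcores, memory_mb)
            if container is not None:
                return container
        return None

class ApplicationMaster:
    """Per-application manager: requests task containers from the RM."""
    def __init__(self, rm):
        self.rm = rm
        self.task_containers = []

    def run_tasks(self, n_tasks, vcores=1, memory_mb=1024):
        for _ in range(n_tasks):
            container = self.rm.allocate(vcores, memory_mb)
            if container is not None:
                self.task_containers.append(container)
        return len(self.task_containers)

# Client submits a job: the RM first launches the AM in a container,
# then the AM requests containers for its tasks.
rm = ResourceManager([NodeManager(4, 8192), NodeManager(4, 8192)])
am_container = rm.allocate(1, 2048)  # the AM itself runs in a container
am = ApplicationMaster(rm)
launched = am.run_tasks(3)           # e.g. 2 map tasks + 1 reduce task
```

The key point the sketch mirrors is the separation of concerns that MapReduce 1.x lacked: the RM only schedules resources, while per-application logic lives in the AM.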
3 Setting Up YARN
3.1 mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
3.2 yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
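Note that the run log below shows the client connecting to the ResourceManager at 0.0.0.0:8032, the default bind address. On a multi-node cluster you would typically also set the RM hostname in yarn-site.xml; the value `node1` below is taken from this tutorial's environment:

```xml
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
</property>
```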
3.3 Starting YARN
[hadoop@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-node1.out
node1: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-node1.out
Open http://node1:8088 in a browser to view the YARN web UI.
4 Submitting a MapReduce Job to YARN
Hadoop ships with example jobs under /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce2:
hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar
[hadoop@node1 mapreduce2]$ hadoop jar
RunJar jarFile [mainClass] args...
[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@node1 mapreduce2]$
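The wordcount program in the list above is the canonical MapReduce job. Its map → shuffle → reduce flow can be sketched in plain Python (illustrative only, not the Hadoop API) to show what the framework does between the phases:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input record.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hello yarn", "hello hadoop"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(mapped))
# counts == {"hello": 2, "yarn": 1, "hadoop": 1}
```

On YARN, each phase runs as tasks in containers requested by the job's ApplicationMaster, which is what the `pi` run below demonstrates end to end.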
[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi
Usage: org.apache.hadoop.examples.QuasiMonteCarlo <nMaps> <nSamples>
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
[hadoop@node1 mapreduce2]$
[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
Number of Maps = 2
Samples per Map = 3
18/10/29 22:19:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
18/10/29 22:19:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/29 22:19:03 INFO input.FileInputFormat: Total input paths to process : 2
18/10/29 22:19:04 INFO mapreduce.JobSubmitter: number of splits:2
18/10/29 22:19:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540822729980_0001
18/10/29 22:19:04 INFO impl.YarnClientImpl: Submitted application application_1540822729980_0001
18/10/29 22:19:04 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1540822729980_0001/
18/10/29 22:19:04 INFO mapreduce.Job: Running job: job_1540822729980_0001
18/10/29 22:19:16 INFO mapreduce.Job: Job job_1540822729980_0001 running in uber mode : false
18/10/29 22:19:16 INFO mapreduce.Job: map 0% reduce 0%
18/10/29 22:19:26 INFO mapreduce.Job: map 50% reduce 0%
18/10/29 22:19:27 INFO mapreduce.Job: map 100% reduce 0%
18/10/29 22:19:32 INFO mapreduce.Job: map 100% reduce 100%
18/10/29 22:19:33 INFO mapreduce.Job: Job job_1540822729980_0001 completed successfully
18/10/29 22:19:33 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=335472
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=522
HDFS: Number of bytes written=215
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=15859
Total time spent by all reduces in occupied slots (ms)=4321
Total time spent by all map tasks (ms)=15859
Total time spent by all reduce tasks (ms)=4321
Total vcore-seconds taken by all map tasks=15859
Total vcore-seconds taken by all reduce tasks=4321
Total megabyte-seconds taken by all map tasks=16239616
Total megabyte-seconds taken by all reduce tasks=4424704
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=286
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=245
CPU time spent (ms)=1260
Physical memory (bytes) snapshot=458809344
Virtual memory (bytes) snapshot=8175378432
Total committed heap usage (bytes)=262033408
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 30.938 seconds
Estimated value of Pi is 4.00000000000000000000
[hadoop@node1 mapreduce2]$
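The coarse result of 4.0 is expected: the job sampled only 2 maps × 3 points, and the estimator is 4 × (points inside the quarter circle) / (total points), so with all six points landing inside it returns exactly 4. The Hadoop example uses a Halton quasi-random sequence; a simpler pseudo-random sketch of the same estimator (illustrative Python, not the Hadoop code) shows how accuracy improves with sample count:

```python
import random

def estimate_pi(n_samples, seed=0):
    # Sample points in the unit square; the fraction landing inside the
    # quarter circle x^2 + y^2 <= 1 approaches pi/4, so pi ~= 4 * inside/total.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(6))        # with only 6 samples the estimate is very coarse
print(estimate_pi(100_000))  # converges toward 3.14159...
```

Rerunning the Hadoop job with larger arguments, e.g. `pi 10 10000`, gives a correspondingly better estimate.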