Configuring Hadoop Rack Awareness
周海漢 2013.7.24
http://abloz.com
Suppose the network has three layers of switching: a top-level switch d1 connects a number of switches rk1, rk2, rk3, rk4, …, and each of those switches serves one rack:
d1(rk1(hs11,hs12,…),rk2(hs21,hs22,…), rk3(hs31,hs32,…),rk4(hs41,hs42,…),…)
The host-to-rack mapping can be implemented with a program or script. For example, with Python, generate a topology.py.
Then point core-site.xml at it.
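A minimal sketch of that entry, using the property name and script path quoted in the script's own docstring below (adjust the path to wherever you actually put the script):

<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/topology.py</value>
</property>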
The Python rack script:
$ cat topology.py
#!/usr/bin/env python
'''
This script is used by hadoop to determine network/rack topology.
It should be specified in core-site.xml via the
topology.script.file.name property:

    topology.script.file.name
    /home/hadoop/hadoop-1.1.2/conf/topology.py

To generate the dict:
    for i in range(xx):
        #print '"hs%d":"/rk%d/hs%d",' % (i, (i-1)/10, i)
        print '"hs%d":"/rk%d",' % (i, (i-1)/10)

Andy 2013.7.23
'''
import sys
from string import join

DEFAULT_RACK = '/rk0'

RACK_MAP = {
    "hs11": "/rk1", "hs12": "/rk1", "hs13": "/rk1", "hs14": "/rk1", "hs15": "/rk1",
    "hs16": "/rk1", "hs17": "/rk1", "hs18": "/rk1", "hs19": "/rk1", "hs20": "/rk1",
    "hs21": "/rk2", "hs22": "/rk2", "hs23": "/rk2", "hs24": "/rk2", "hs25": "/rk2",
    "hs26": "/rk2", "hs27": "/rk2", "hs28": "/rk2", "hs29": "/rk2", "hs30": "/rk2",
    "hs31": "/rk3", "hs32": "/rk3", "hs33": "/rk3", "hs34": "/rk3", "hs35": "/rk3",
    "hs36": "/rk3", "hs37": "/rk3", "hs38": "/rk3", "hs39": "/rk3", "hs40": "/rk3",
    "hs41": "/rk4", "hs42": "/rk4", "hs43": "/rk4", "hs44": "/rk4", "hs45": "/rk4",
    "hs46": "/rk4",
    ...
    "10.10.20.11": "/rk1", "10.10.20.12": "/rk1", "10.10.20.13": "/rk1",
    "10.10.20.14": "/rk1", "10.10.20.15": "/rk1", "10.10.20.16": "/rk1",
    "10.10.20.17": "/rk1", "10.10.20.18": "/rk1", "10.10.20.19": "/rk1",
    "10.10.20.20": "/rk1", "10.10.20.21": "/rk2", "10.10.20.22": "/rk2",
    "10.10.20.23": "/rk2", "10.10.20.24": "/rk2", "10.10.20.25": "/rk2",
    "10.10.20.26": "/rk2", "10.10.20.27": "/rk2", "10.10.20.28": "/rk2",
    "10.10.20.29": "/rk2", "10.10.20.30": "/rk2", "10.10.20.31": "/rk3",
    "10.10.20.32": "/rk3", "10.10.20.33": "/rk3", "10.10.20.34": "/rk3",
    "10.10.20.35": "/rk3", "10.10.20.36": "/rk3", "10.10.20.37": "/rk3",
    "10.10.20.38": "/rk3", "10.10.20.39": "/rk3", "10.10.20.40": "/rk3",
    "10.10.20.41": "/rk4", "10.10.20.42": "/rk4", "10.10.20.43": "/rk4",
    "10.10.20.44": "/rk4", "10.10.20.45": "/rk4", "10.10.20.46": "/rk4",
    ...
}

# Print the rack path for each host/IP argument, space-separated;
# hosts not in the map fall back to DEFAULT_RACK.
if len(sys.argv) == 1:
    print DEFAULT_RACK
else:
    print join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]], " ")
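A quick sanity check is to invoke the script the way Hadoop does, passing one or more hostnames or IPs as arguments (unknown-host is a made-up name here, included only to show the DEFAULT_RACK fallback):

$ python topology.py hs11 10.10.20.35 unknown-host
/rk1 /rk3 /rk0
$ python topology.py
/rk0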
Originally, my version of this script returned entries of the form
"hs11": "/rk1/hs11",
and running a MapReduce job then failed with the following error:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307241502_0003, Tracking URL = http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0003
Kill Command = /home/hadoop/hadoop-1.1.2/libexec/../bin/hadoop job -kill job_201307241502_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2013-07-24 18:38:11,854 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201307241502_0003 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0003
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Opening http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0002 showed:
Job initialization failed:
java.lang.NullPointerException
	at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2751)
	at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:578)
	at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:750)
	at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
It turns out that when configuring rack awareness, the script should not append the node name or hostname to the path it returns; the system adds that leaf automatically. After changing the script to return only the rack path, as in the topology.py above, jobs ran correctly.
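One way to confirm the mapping took effect is to check the rack reported for each datanode with hadoop dfsadmin -report. This is only a sketch: the exact report layout varies by version, and the addresses and racks shown below are illustrative, assuming the datanodes resolve per the map above:

$ hadoop dfsadmin -report | grep -E '^(Name|Rack):'
Name: 10.10.20.11:50010
Rack: /rk1
Name: 10.10.20.21:50010
Rack: /rk2
...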
Unless noted as a reprint, all posts here are original. This site follows the Creative Commons (CC) license; please credit the source when reprinting.