1. 程式人生 > >Hadoop 常見問題集錦

Hadoop 常見問題集錦

頂,這樣的貼子非常好,要置頂。附件是由Hadoop技術交流群中若冰的同學提供的相關資料:
(12.58 KB)
Hadoop新增節點的方法
自己實際新增節點過程:
1. 先在slave上配置好環境,包括ssh,jdk,相關config,lib,bin等的拷貝;
2. 將新的datanode的host加到叢集namenode及其他datanode中去;
3. 將新的datanode的ip加到master的conf/slaves中;
4. 重啟cluster,在cluster中看到新的datanode節點;
5. 執行bin/start-balancer.sh,這個會很耗時間
備註:
1. 如果不balance,那麼cluster會把新的資料都存放在新的node上,這樣會降低mr的工作效率;
2. 也可呼叫bin/start-balancer.sh 命令執行,也可加引數 -threshold 5
   threshold 是平衡閾值,預設是10%,值越低各節點越平衡,但消耗時間也更長。
3. balancer也可以在有mr job的cluster上執行,預設dfs.balance.bandwidthPerSec很低,為1M/s。在沒有mr job時,可以提高該設定加快負載均衡時間。

其他備註:
1. 必須確保slave的firewall已關閉;
2. 確保新的slave的ip已經新增到master及其他slaves的/etc/hosts中,反之也要將master及其他slave的ip新增到新的slave的/etc/hosts中
mapper及reducer個數

url地址: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
HowManyMapsAndReduces
Partitioning your job into maps and reduces
Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps/ 1,000,000 reduces where the framework runs out of resources for the overhead.
Number of Maps
The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes awhile, so it is best if the maps take at least a minute to execute.
Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the [WWW] InputFormat determines the number of maps.
The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transfering map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).
自己的理解:
mapper個數的設定:跟input file 有關係,也跟filesplits有關係,filesplits的上線為dfs.block.size,下線可以通過mapred.min.split.size設定,最後還是由InputFormat決定。

較好的建議:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.
< property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
< /property>

單個node新加硬碟

1.修改需要新加硬碟的node的dfs.data.dir,用逗號分隔新、舊檔案目錄
2.重啟dfs

同步hadoop 程式碼
hadoop-env.sh
# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

用命令合併HDFS小檔案
hadoop fs -getmerge <src> <dest>

重啟reduce job方法
Introduced recovery of jobs when JobTracker restarts. This facility is off by default.
Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".
還未驗證過。

IO寫操作出現問題

0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/
172.16.100.165:50010 remote=/172.16.100.165:50930]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)

It seems there are many reasons that it can timeout, the example given in
HADOOP-3831 is a slow reading client.

解決辦法:在hadoop-site.xml中設定dfs.datanode.socket.write.timeout=0試試;
My understanding is that this issue should be fixed in Hadoop 0.19.1 so that
we should leave the standard timeout. However until then this can help
resolve issues like the one you're seeing.

HDFS退服節點的方法
目前版本的dfsadmin的幫助資訊是沒寫清楚的,已經file了一個bug了,正確的方法如下:
1. 將 dfs.hosts 置為當前的 slaves,檔名用完整路徑,注意,列表中的節點主機名要用大名,即 uname -n 可以得到的那個。
2. 將 slaves 中要被退服的節點的全名列表放在另一個檔案裡,如 slaves.ex,使用 dfs.host.exclude 引數指向這個檔案的完整路徑
3. 執行命令 bin/hadoop dfsadmin -refreshNodes
4. web介面或 bin/hadoop dfsadmin -report 可以看到退服節點的狀態是 Decomission in progress,直到需要複製的資料複製完成為止
5. 完成之後,從 slaves 裡(指 dfs.hosts 指向的檔案)去掉已經退服的節點

附帶說一下 -refreshNodes 命令的另外三種用途:
2. 新增允許的節點到列表中(新增主機名到 dfs.hosts 裡來)
3. 直接去掉節點,不做資料副本備份(在 dfs.hosts 裡去掉主機名)
4. 退服的逆操作——停止 exclude 裡面和 dfs.hosts 裡面都有的,正在進行 decomission 的節點的退服,也就是把 Decomission in progress 的節點重新變為 Normal (在 web 介面叫 in service)

hadoop 學習借鑑
1. 解決hadoop OutOfMemoryError問題:
<property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx800M -server</value>
< /property>
With the right JVM size in your hadoop-site.xml , you will have to copy this
to all mapred nodes and restart the cluster.
或者:hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M

2. Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
when i use nutch1.0,get this error:
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
這個也很好解決:
可以刪除conf/log4j.properties,然後可以看到詳細的錯誤報告
我這兒出現的是out of memory
解決辦法是在給執行主類org.apache.nutch.crawl.Crawl加上引數:-Xms64m -Xmx512m
你的或許不是這個問題,但是能看到詳細的錯誤報告問題就好解決了

distribute cache使用
類似一個全域性變數,但是由於這個變數較大,所以不能設定在config檔案中,轉而使用distribute cache
具體使用方法:(詳見《the definitive guide》,P240)
1. 在命令列呼叫時:呼叫-files,引入需要查詢的檔案(可以是local file, HDFS file(使用hdfs://xxx?)), 或者 -archives (JAR,ZIP, tar等)
% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile /
  -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output
2. 程式中呼叫:
   public void configure(JobConf conf) {
      metadata = new NcdcStationMetadata();
      try {
        metadata.initialize(new File("stations-fixed-width.txt"));
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
   }
另外一種間接的使用方法:在hadoop-0.19.0中好像沒有
呼叫addCacheFile()或者addCacheArchive()新增檔案,
使用getLocalCacheFiles() 或 getLocalCacheArchives() 獲得檔案

hadoop的job顯示web
There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at [WWW] http://job.tracker.addr:50030/ and [WWW] http://name.node.addr:50070/.

hadoop監控
OnlyXP(52388483) 131702
用nagios作告警,ganglia作監控圖表即可

status of 255 error
錯誤型別:
java.io.IOException: Task process exit with nonzero status of 255.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

錯誤原因:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher value. By default, their values are 24 hours. These might be the reason for failure, though I'm not sure

split size
FileInputFormat input splits: (詳見 《the definitive guide》P190)
mapred.min.split.size: default=1, the smallest valide size in bytes for a file split.
mapred.max.split.size: default=Long.MAX_VALUE, the largest valid size.
dfs.block.size: default = 64M, 系統中設定為128M。
如果設定 minimum split size > block size, 會增加塊的數量。(猜想從其他節點拿去資料的時候,會合並block,導致block數量增多)
如果設定maximum split size < block size, 會進一步拆分block。

split size = max(minimumSize, min(maximumSize, blockSize));
其中 minimumSize < blockSize < maximumSize.

sort by value
hadoop 不提供直接的sort by value方法,因為這樣會降低mapreduce效能。
但可以用組合的辦法來實現,具體實現方法見《the definitive guide》, P250
基本思想:
1. 組合key/value作為新的key;
2. 過載partitioner,根據old key來分割;
conf.setPartitionerClass(FirstPartitioner.class);
3. 自定義keyComparator:先根據old key排序,再根據old value排序;
conf.setOutputKeyComparatorClass(KeyComparator.class);
4. 過載GroupComparator, 也根據old key 來組合;  conf.setOutputValueGroupingComparator(GroupComparator.class);

small input files的處理
對於一系列的small files作為input file,會降低hadoop效率。
有3種方法可以將small file合併處理:
1. 將一系列的small files合併成一個sequneceFile,加快mapreduce速度。
詳見WholeFileInputFormat及SmallFilesToSequenceFileConverter,《the definitive guide》, P194
2. 使用CombineFileInputFormat整合FileinputFormat,但是未實現過;
3. 使用hadoop archives(類似打包),減少小檔案在namenode中的metadata記憶體消耗。(這個方法不一定可行,所以不建議使用)
   方法:
   將/my/files目錄及其子目錄歸檔成files.har,然後放在/my目錄下
   bin/hadoop archive -archiveName files.har /my/files /my
   
   檢視files in the archive:
   bin/hadoop fs -lsr har://my/files.har

skip bad records
JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);

For skipping failed tasks try : mapred.max.map.failures.percent

restart 單個datanode
如果一個datanode 出現問題,解決之後需要重新加入cluster而不重啟cluster,方法如下:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start jobtracker

reduce exceed 100%
"Reduce Task Progress shows > 100% when the total size of map outputs (for a
single reducer) is high "
造成原因:
在reduce的merge過程中,check progress有誤差,導致status > 100%,在統計過程中就會出現以下錯誤:java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)
        at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)

jira地址:

counters
3中counters:
1. built-in counters: Map input bytes, Map output records...
2. enum counters
   呼叫方式:
  enum Temperature {
    MISSING,
    MALFORMED
  }

reporter.incrCounter(Temperature.MISSING, 1)
   結果顯示:
09/04/20 06:33:36 INFO mapred.JobClient:   Air Temperature Recor
09/04/20 06:33:36 INFO mapred.JobClient:     Malformed=3
09/04/20 06:33:36 INFO mapred.JobClient:     Missing=66136856
3. dynamic countes:
   呼叫方式:
   reporter.incrCounter("TemperatureQuality", parser.getQuality(),1);
   
   結果顯示:
09/04/20 06:33:36 INFO mapred.JobClient:   TemperatureQuality
09/04/20 06:33:36 INFO mapred.JobClient:     2=1246032
09/04/20 06:33:36 INFO mapred.JobClient:     1=973422173
09/04/20 06:33:36 INFO mapred.JobClient:     0=1